statistics - Concepts
Explore concepts tagged with "statistics"
Total concepts: 52
Concepts
- Luck vs Skill - The challenge of distinguishing genuine ability from random variation in outcomes, critical for accurate performance evaluation and learning.
- Simpson's Paradox - A phenomenon where trends in aggregated data reverse when data is separated into subgroups.
- Six Sigma - A data-driven methodology for eliminating defects and reducing process variation to achieve near-perfect quality.
- Bayes' Theorem - A mathematical framework for updating beliefs based on new evidence.
- Long Tail Distribution - A distribution where many low-frequency items collectively represent significant aggregate value.
- Statistical Significance - An assessment of whether observed results would be unlikely under chance alone, suggesting a real effect.
- Base Rate Neglect - The tendency to ignore general statistical information in favor of specific case details when making judgments.
- Confidence Interval - A range of values that likely contains the true population parameter with a specified probability.
- Effect Size - A measure of the magnitude or practical importance of a finding, independent of sample size.
- Causal Inference - The process of determining whether and how one variable or event actually causes changes in another, going beyond mere correlation.
- Variance - A measure of the spread of values, calculated as the average squared deviation from the mean.
- Random Walk - A mathematical model describing a path consisting of successive random steps, used to model stock prices, particle diffusion, and many natural and social phenomena.
- Insensitivity to Sample Size - The cognitive bias where people fail to adequately account for sample size when assessing the reliability of statistical information, treating small and large samples as equally informative.
- Statistical Inference - The process of using data analysis and probability theory to draw conclusions about a population from a sample.
- Statistical Thinking - The habit of reasoning about the world through probabilities, distributions, and variation rather than deterministic cause-and-effect narratives.
- Sample Size - The number of observations in a study, critical for the reliability and precision of findings.
- File Drawer Problem - The tendency for studies with null or negative results to remain unpublished in researchers' file drawers, creating a systematically incomplete evidence base.
- Publication Bias - The tendency for research with positive or statistically significant results to be published more often than studies with null or negative findings, distorting the evidence base.
- Central Limit Theorem - The principle that averages of random samples tend toward a normal distribution regardless of the underlying distribution.
- Statistical Distributions - Mathematical functions describing the probability of different outcomes, forming the foundation of statistical analysis and decision-making.
- Clustering Illusion - Seeing patterns in random data, such as 'hot streaks' in random sequences.
- Texas Sharpshooter Fallacy - A logical fallacy where differences in data are ignored while similarities are overemphasized, like shooting a barn and then drawing targets around the bullet holes.
- Fat Tails - Probability distributions where extreme events occur more frequently than normal distributions predict.
- Reference Class Forecasting - An estimation method that bases predictions on actual outcomes of similar past projects rather than the specifics of the current plan.
- Signal vs Noise - Distinguishing meaningful patterns from random variation or irrelevant information.
- Standard Deviation - A measure of how spread out values are from the mean.
- Illusory Correlation - Perceiving a relationship between variables when none exists.
- Look-Elsewhere Effect - Statistical phenomenon where random fluctuations appear significant when examining many possibilities or locations in data.
- Selection Bias - Distortion in analysis caused by non-random sampling or systematic exclusion of data.
- Price's Law - The observation that the square root of the number of contributors to a field produces roughly half of the total output.
- Power Law - A statistical distribution where small occurrences are extremely common and large occurrences extremely rare.
- Correlation vs Causation - The critical distinction between two things occurring together and one actually causing the other.
- Stochastic Processes - Mathematical models describing collections of random variables that evolve over time, used to model uncertainty in systems from finance to physics.
- Bayesian Decision Theory - A normative framework for making optimal decisions under uncertainty by combining prior beliefs with observed evidence through probability theory and utility functions.
- Small Sample Fallacy - The error of drawing strong conclusions from insufficient data.
- Markov Chains - Mathematical systems that model sequences of events where the probability of each event depends only on the state of the previous event, not the full history.
- Monte Carlo Methods - Computational algorithms that use repeated random sampling to estimate numerical results, model complex systems, and solve problems that are deterministically intractable.
- Law of Large Numbers - The principle that averages of random samples converge to expected values as sample size increases.
- Representativeness Heuristic - Judging probability by similarity to prototypes rather than by actual statistical likelihood.
- Three-Point Estimation - An estimation technique that uses optimistic, most likely, and pessimistic values to calculate a weighted expected effort.
- Regression Fallacy - The error of attributing a natural regression to the mean to a specific cause, mistaking statistical inevitability for the effect of an intervention.
- Ergodicity - The question of whether time averages equal ensemble averages, a distinction crucial for risk and decision-making.
- Normal Distribution - The bell curve pattern where most values cluster around the mean with symmetric tails.
- Differential Privacy - A mathematical framework providing provable privacy guarantees by adding calibrated noise to data or query results.
- Mean, Median, and Mode - Three different measures of central tendency, each useful in different contexts.
- Hot-Hand Fallacy - Believing that success in a random sequence of events raises the chance of further success.
- Regression to the Mean - Extreme outcomes tend to be followed by more moderate ones.
- Replication Crisis - The widespread failure of scientific studies to reproduce their original findings when repeated by other researchers.
- Type I and Type II Errors - False positives (detecting an effect that isn't there) and false negatives (missing an effect that exists).
- Zipf's Law - An empirical law stating that the frequency of any item is inversely proportional to its rank in the frequency table.
- Failure Rate - The proportion of attempts that result in failure, used to calibrate expectations and strategies.
- Base Rate - The underlying probability of an event before considering specific evidence or conditions.
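Several of the entries above (Bayes' Theorem, Base Rate, Base Rate Neglect) can be made concrete with a short worked example. The sketch below uses illustrative screening-test numbers (99% sensitivity, 1% false-positive rate, 0.1% base rate); these figures are assumptions for illustration, not taken from this page:

```python
# Worked example of Bayes' Theorem, illustrating base rate neglect:
# an accurate test for a rare condition still yields mostly false positives.
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' Theorem."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Despite the "99% accurate" test, a positive result implies only ~9%
# probability of the condition, because the base rate is so low.
p = posterior(prior=0.001, sensitivity=0.99, false_positive_rate=0.01)
print(f"P(condition | positive) = {p:.3f}")  # → 0.090
```

Neglecting the 0.1% base rate and reading the answer as "99%" is exactly the error the Base Rate Neglect entry describes.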
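The Law of Large Numbers entry can likewise be sketched in a few lines: the mean of simulated fair die rolls drifts toward the expected value 3.5 as the sample grows. The die model and the seed are arbitrary choices for illustration:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def running_mean(n):
    """Mean of n simulated fair die rolls (expected value 3.5)."""
    rolls = [random.randint(1, 6) for _ in range(n)]
    return sum(rolls) / n

# Larger samples land closer to 3.5, with smaller fluctuations.
for n in (10, 1_000, 100_000):
    print(n, round(running_mean(n), 3))
```

The same simulation, viewed across many repeated runs, also illustrates the Central Limit Theorem: the sample means themselves are approximately normally distributed around 3.5.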
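Regression to the Mean and Luck vs Skill can be illustrated together with a small simulation. The skill-plus-luck model below is a common toy assumption, not a claim from this page: performers selected for an extreme first result score closer to average the second time, because their luck does not repeat.

```python
import random

random.seed(0)  # fixed seed for reproducibility

N = 10_000
# Each performer has a fixed skill; each trial adds independent luck.
skill = [random.gauss(0, 1) for _ in range(N)]
trial1 = [s + random.gauss(0, 1) for s in skill]
trial2 = [s + random.gauss(0, 1) for s in skill]

# Select the top 1% on trial 1 and compare the same group across trials.
top = sorted(range(N), key=lambda i: trial1[i], reverse=True)[: N // 100]
avg1 = sum(trial1[i] for i in top) / len(top)
avg2 = sum(trial2[i] for i in top) / len(top)
print(f"top-1% mean on trial 1: {avg1:.2f}, same group on trial 2: {avg2:.2f}")
```

The drop from `avg1` to `avg2` requires no intervention or change in skill; attributing it to one is the Regression Fallacy listed above.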