Date: 2026-01-29

If you follow investment research, you’ve probably noticed an explosion of “factors” (what John Cochrane called the “factor zoo”): characteristics like value, momentum, or profitability that purportedly predict stock returns. Academic journals have published hundreds of these strategies, and asset managers have launched products based on them. But many of them could be false discoveries. In their December 2025 paper “What Threshold Should be Applied to Tests of Factor Models?” Campbell Harvey, Alessio Sancetta, and Yuqian Zhao both demonstrated that most published investment strategies fail proper statistical tests and proposed a simple fix.

What the Researchers Examined

Harvey, Sancetta, and Zhao tackled a fundamental problem in investment research: when you test hundreds of strategies, some will look good purely by luck. Think about it this way: if you flip a coin 100 times, you’ll occasionally get streaks that look like patterns. Similarly, when researchers test hundreds of factors, some will appear “statistically significant” even when there’s nothing real behind them. The traditional statistical threshold—a t-statistic of 2.0—was designed for testing one hypothesis, not hundreds.
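To make the luck problem concrete, here is a minimal sketch in Python. It is not taken from the paper: the number of factors, the sample length, and the return volatility are all made-up assumptions. It generates factors whose true return is exactly zero and counts how many still clear the traditional cutoff.

```python
import random
import statistics


def t_stat(returns):
    """t-statistic for the null hypothesis that the mean return is zero."""
    n = len(returns)
    return statistics.fmean(returns) / (statistics.stdev(returns) / n ** 0.5)


def count_lucky_factors(n_factors=500, n_months=240, cutoff=2.0, seed=0):
    """Simulate factors with zero true return; count how many clear the cutoff.

    All parameter values are illustrative assumptions, not the paper's data.
    """
    rng = random.Random(seed)
    lucky = 0
    for _ in range(n_factors):
        # Pure noise: monthly returns whose true mean is exactly zero.
        returns = [rng.gauss(0.0, 0.05) for _ in range(n_months)]
        if abs(t_stat(returns)) > cutoff:
            lucky += 1
    return lucky


if __name__ == "__main__":
    for cutoff in (2.0, 3.0, 3.4):
        lucky = count_lucky_factors(cutoff=cutoff)
        print(f"|t| > {cutoff}: {lucky} of 500 pure-noise factors look significant")
```

With a cutoff of 2.0, roughly one noise factor in twenty should pass by chance alone, so testing hundreds of them all but guarantees some “discoveries.” Raising the cutoff thins them out quickly.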

The researchers examined this problem from multiple angles:

They analyzed a massive dataset of 29,314 randomly generated trading strategies.

They scrutinized 153 published factors from academic research.

They reconciled conflicting claims in the literature about whether there’s a “replication crisis” in finance.

Their data sample covered the period from July 1952 to December 2022.

The Key Controversy

The finance profession is deeply divided on this issue. Some researchers argue that p-values need dramatic adjustment—suggesting t-statistic cutoffs of 3.0 or higher. Others claim that traditional thresholds remain valid and that most published discoveries are genuine.

The stakes are high. If the skeptics are right, billions of dollars have been invested based on false patterns. If the optimists are right, valuable investment strategies are being dismissed.

What They Found: Three Critical Problems

1. Correlation Ruins Everything

Most statistical methods assume factors are independent. However, investment factors are highly correlated. The researchers show this correlation causes the distribution of test statistics to shift and spread out. What looks like a significant t-statistic of 2.5 might actually be quite common under the null hypothesis when you account for correlation. Ignoring this leads to massive over-discovery.

In their analysis of the 29,314-factor dataset, they found that assuming independence suggested 41% of factors were real. Accounting for correlation reduced this to just 5%.
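A toy simulation shows why correlation matters in a single sample. The sketch below is not the paper's model. It assumes a hypothetical one-factor setup in which every test statistic is standard normal under the null but shares a common shock, so that any two statistics have correlation rho. It then compares the share of statistics beyond a cutoff of 2.0 across many simulated samples.

```python
import random


def exceed_fraction(n_tests, rho, cutoff, rng):
    """Share of null test statistics with |z| > cutoff in ONE simulated sample.

    Assumed toy model: z_i = sqrt(rho) * common + sqrt(1 - rho) * noise_i,
    so each z_i is N(0, 1) on its own and every pair has correlation rho.
    """
    common = rng.gauss(0.0, 1.0)
    hits = 0
    for _ in range(n_tests):
        z = rho ** 0.5 * common + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
        if abs(z) > cutoff:
            hits += 1
    return hits / n_tests


def simulate(rho, n_samples=200, n_tests=1000, cutoff=2.0, seed=1):
    """Repeat the experiment n_samples times and return each sample's share."""
    rng = random.Random(seed)
    return [exceed_fraction(n_tests, rho, cutoff, rng) for _ in range(n_samples)]


if __name__ == "__main__":
    for rho in (0.0, 0.5):
        fracs = simulate(rho)
        avg = sum(fracs) / len(fracs)
        print(f"rho={rho}: mean={avg:.3f}  min={min(fracs):.3f}  max={max(fracs):.3f}")
```

Averaged over many samples, the share beyond the cutoff stays near its nominal level in both cases. With correlation, however, individual samples swing far above and below that level, and a researcher only ever sees one of them.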

2. The Bootstrap Doesn’t Save You

Some researchers have claimed that bootstrap methods solve the correlation problem. However, the authors demonstrated that while bootstrapping on average produces correct distributions, any single sample (like the one researchers observe) can be heavily distorted by correlation. The bootstrap masks this problem by averaging across many simulations—but real researchers only see one realization of the data.

3. Sample Selection Bias Is Real

Published factors aren’t randomly selected; they’re the “winners” from countless tests. This creates severe selection bias. The researchers show that methods claiming to account for this often fail, particularly the influential Bayesian approach.

The Solution: A New Threshold

The paper’s practical contribution is simple: use a t-statistic cutoff of at least 3.0, and preferably 3.4 or higher. This recommendation comes from a clever innovation called the lower bound on the Local False Discovery Rate—a statistical measure that estimates the probability that a particular hypothesis is a false discovery, given its test statistic value. Unlike traditional methods that require knowing how many tests were run (usually unknowable), this approach:

Works without specifying the total number of tests.

Provides the probability that a specific result is false.

Requires only mild assumptions about the alternative hypothesis.

The researchers provided a simple table showing required thresholds for different scenarios. For example, if you believe 90% of tested factors are noise (a reasonable assumption), you need:

t > 3.16 to have at most a 20% chance that the factor is false.

t > 3.41 to have at most a 10% chance that the factor is false.

t > 3.64 to have at most a 5% chance that the factor is false.
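To get a feel for how a prior belief about noise turns a t-statistic into a probability of a false discovery, here is a rough sketch. It does not reproduce the paper's derivation. As a stand-in it uses a different, well-known calibration from Sellke, Bayarri, and Berger, which bounds the Bayes factor in favor of the null from below by -e * p * ln(p) for p-values under 1/e. The function names and the normal approximation for the p-value are assumptions made for illustration.

```python
import math


def two_sided_p(t):
    """Two-sided p-value for a t-statistic, using the normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def false_discovery_lower_bound(t, prior_null=0.9):
    """Lower bound on P(factor is noise | t) under the -e*p*ln(p) calibration.

    Illustrative stand-in only; this is NOT the bound derived in the paper.
    prior_null is the assumed share of tested factors that are pure noise.
    """
    p = two_sided_p(t)
    if p >= 1.0 / math.e:
        min_bayes_factor = 1.0  # the calibration only applies for p < 1/e
    else:
        min_bayes_factor = -math.e * p * math.log(p)
    prior_odds = prior_null / (1.0 - prior_null)
    posterior_odds = prior_odds * min_bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    for t in (2.0, 3.16, 3.41, 3.64):
        bound = false_discovery_lower_bound(t)
        print(f"t = {t:.2f}: chance the factor is false is at least {bound:.3f}")
```

Under these assumptions, the bound at t = 3.16, 3.41, and 3.64 comes out close to the 20%, 10%, and 5% levels listed above, while a t-statistic of 2.0 leaves a better-than-even chance that the factor is noise. The agreement is only approximate, and the paper's own bound should be used for actual thresholds.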

External Validation

The paper’s findings align with real-world evidence:

ETF performance: Studies show that factor-based ETFs with promising back tests often deliver null returns in live trading.

Post-publication decay: Factor returns typically shrink dramatically, or even vanish, after publication.

Professional practice: Major asset managers like Avantis, Bridgeway, and Dimensional focus on just a few core factors, not hundreds.

Key Investor Takeaways