It Can’t All Be Data Snooping?

December 7, 2018 • Posted in Big Ideas

Is it possible that all of the 300+ published factors that predict stock returns (such as size, value, profitability, investment, momentum…) derive from data snooping? In his October 2018 paper entitled “The Limits of Data Mining: A Thought Experiment”, Andrew Chen estimates how much data snooping would be required to “discover” all these factors by pure luck. Specifically, he calibrates a pure luck model built on the assumption that the probability of publishing a factor discovery increases with how convincing the discovery is, as measured by its t-statistic. Using this model, he estimates the number of unpublished factor studies required for the published set to be attributable to pure luck. He considers two sets of factor t-statistics: 156 from factor replications using equal-weighted long-short portfolios formed from the extreme fifths (quintiles) of stocks sorted on each factor; and a hand-collected set of 316 from published factor studies. Using the specified approach and these two sets of t-statistics, he finds that: (more…)
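To make the mechanics of such a thought experiment concrete, here is a minimal Python sketch of a pure luck world. It is not the paper's calibrated model: the logistic publication rule, its `midpoint` and `steepness` parameters, and the function names are all hypothetical choices made for illustration. Every tested factor has no true predictive power, so its t-statistic is drawn from a standard normal distribution, and the chance of publication rises with the absolute t-statistic. The sketch counts how many factor tests are needed before 316 of them (the size of the hand-collected set) get published.

```python
import math
import random


def publication_probability(abs_t, midpoint=2.0, steepness=3.0):
    """Hypothetical logistic rule: a larger |t| is more likely to be published.

    midpoint and steepness are illustrative assumptions, not calibrated values.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (abs_t - midpoint)))


def attempts_needed(n_published, seed=0, **rule):
    """Count pure-luck factor tests run until n_published of them are published."""
    rng = random.Random(seed)
    attempts = 0
    published = []
    while len(published) < n_published:
        attempts += 1
        # Pure luck: no factor truly predicts returns, so t is drawn from N(0, 1).
        t = rng.gauss(0.0, 1.0)
        if rng.random() < publication_probability(abs(t), **rule):
            published.append(abs(t))
    return attempts, published


attempts, published_t = attempts_needed(316)
mean_published_t = sum(published_t) / len(published_t)
print("factor tests run:", attempts)
print("unpublished tests:", attempts - len(published_t))
print("mean published |t|:", round(mean_published_t, 2))
```

In a sketch like this, the published t-statistics look far more impressive than the typical test because of selection alone, and the gap between tests run and tests published is the "file drawer" that a pure luck explanation requires. Changing the assumed publication rule changes that count substantially, which is why calibrating the rule to observed t-statistics matters.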

