Interaction of Model and Data Complexities
June 4, 2025 - Investing Expertise
Should stock return model complexity guide the breadth of input data? In their May 2025 paper entitled “Model Complexity and the Performance of Global Versus Regional Models”, Minghui Chen, Matthias Hanauer and Tobias Kalsbach assess the predictive performance of global versus regional inputs for stock return models based on linear and machine learning algorithms: ordinary least squares regression (OLS); elastic net (ENET); random forest (RF); gradient-boosted regression trees (GBRT); and neural networks (NN). Monthly model inputs include 36 firm-level characteristics and associated stock trading data in U.S. dollars for 24 developed market countries, suppressing effects of megacaps and excluding microcaps (the smallest stocks per country comprising 3% of overall market capitalization). They segment country markets into four regions: North America, Europe, Japan and Asia Pacific. Model training employs an expanding window (initially six years, extended year by year), followed by a six-year validation interval and a one-year test interval. For each model, each month, they re-form a portfolio that is long (short) the fifth, or quintile, of stocks with the highest (lowest) predicted returns. Using the specified monthly firm/stock inputs during July 1990 through December 2021, they find that: