A reader asked: “I would really appreciate your review of “S&P 500 Returns Revisited”. It seems crazy…but crazy enough to work?” This March 2010 paper, one of 46 currently in the Social Science Research Network by one or both of Ivan Kitov and Oleg Kitov, presents an abstract as follows:

“The predictions of the S&P 500 returns made in 2007 have been tested and the underlying models amended. The period between 2003 and 2008 should be described by the dependence of the S&P 500 stock market index on real GDP because the population pyramid was highly inaccurate. The 2008 trough and 2009 rally are well predicted by the original model, however. The rally will end in March/April 2010 and the S&P 500 level will be decreasing into 2011. This prediction should validate the model.”

This paper aims to predict the behavior of the S&P 500 Index based on very specific short-term demographic variations, purported to indicate changes in economic activity as measured by real Gross Domestic Product (GDP) via GDP per capita. The abstract states that the demographic model does not work well between 2003 and 2008 (because of inaccurate population data), but that a correct prediction from this model for the balance of 2010 into 2011 “should validate the model.” *Some observations about this study are:*

Page 1: “…when the trajectory of real (and nominal) economic growth is exactly known one can also predict aggregate stock market indices.”

The simple tests in “GDP Growth and Stock Market Returns” do not support belief in this assertion. Different predictability specifications, more complex tests or clarifications of “exactly” may yield some support.

Page 2: “Third source of real GDP growth is responsible for short-term fluctuations around the trend… This source is related to the change in a specific age population. In the United States this age is nine years as well as in the UK. European countries and Japan are characterized by the age of eighteen years.”

The fact that changes in populations of quite different ages relate most strongly to real GDP growth in different countries suggests data snooping. Cultural differences are possible in terms of what age (if any) drives consumption and productive capacity of a country. It is not obvious that any particular age should be key. It is less obvious that, if there is a key age, it should always be nine for a particular country.

Page 3: “Our aim is to describe the S&P 500 returns using monthly estimates of the nine-year-old population in the USA. Due to the problems with the overall consistency the monthly population estimates were smoothed over neighboring calendar months and over the same months of adjacent calendar years. …the best representation of the monthly population estimates may vary between the age groups.”

The use of complex smoothing rules, especially rules that may vary by age, suggests data snooping.

Page 4: “…the S&P 500 returns are represented by a running sum of monthly returns over twelve consecutive months. The monthly returns are calculated from the closing levels of the S&P 500 index. A natural time step is one month. The summation allows obtaining a smoother curve than that provided by the annual S&P 500 returns with one month step.”

The smoothing here suggests data snooping and also complicates evaluation of any trading strategy based on the model presented.
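As a minimal sketch of why this construction complicates evaluation, the following Python computes a trailing twelve-month sum of monthly returns at a one-month step. The function name and the sample values are illustrative assumptions, not taken from the paper.

```python
def trailing_sum(monthly_returns, window=12):
    """Sum of the most recent `window` monthly returns, stepped one month at a time."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(monthly_returns[end - window:end])
        for end in range(window, len(monthly_returns) + 1)
    ]


# Hypothetical monthly returns in percent (illustrative values only).
returns = [1, -2, 3, 0, 1, 2, -1, 4, -3, 2, 1, 0, 5, -4]
smoothed = trailing_sum(returns)

# Each step drops the oldest month and adds the newest one;
# the other eleven months in the window are unchanged.
step_changes = [later - earlier for earlier, later in zip(smoothed, smoothed[1:])]

print(smoothed)
print(step_changes)
```

Because consecutive values share eleven of their twelve months, the smoothed series moves slowly by construction, and a good fit to it does not translate directly into a tradable monthly signal.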

Pages 4-5: “The population estimates are smoothed according to the following procedure. The trial-and-error method demonstrated that the best monthly estimates of the number of nine-year-olds for our purposes are those averaged over five adjacent years of age, with the nine-years-olds in the center. For example, the number of nine-year-olds in June 1995…is estimated as the average value of 7-, 8-, 9-, 10-, and 11-year-olds in June 1995… The introduced approximation of the number of nine-year-olds…is justified also by the excellent prediction of the S&P 500 returns for the period where the monthly estimates are available, except the years after 2003.”

The use of trial and error to find the “best” estimates suggests data snooping, and the exception for years after 2003 undermines belief in the key relationship so derived.
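The five-age averaging described in the quote can be sketched as follows. The population counts are hypothetical placeholders (not Census Bureau figures), and the function name is an assumption made for illustration.

```python
from statistics import mean


def five_age_average(population_by_age, center_age=9, half_width=2):
    """Average the population counts over ages center_age - half_width
    through center_age + half_width for a single reference month."""
    ages = range(center_age - half_width, center_age + half_width + 1)
    return mean(population_by_age[age] for age in ages)


# Hypothetical counts (in thousands) for one month; illustrative values only.
june_1995 = {7: 3900, 8: 3850, 9: 3700, 10: 3750, 11: 3800}

raw_nine_year_olds = june_1995[9]
smoothed_estimate = five_age_average(june_1995)

print(raw_nine_year_olds, smoothed_estimate)
```

In this sketch the smoothed value differs from the raw count for age nine, which highlights that the predictor is effectively a 7-to-11 age-band average. The center age and the band width are both free parameters that a trial-and-error search can tune.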

Page 6: “After April 2000, only the postcensal population estimates are available. There are several vintages of these estimates available for previous years, however, which use most recent information on the past estimates of the rate of deaths and migration. Therefore, no postcensal estimate is final and further revisions are likely for all monthly estimates.”

This discussion appears to indicate that there are quality issues (substantive revisions) in the demographic data and that there may be look-ahead bias in some of the backtesting.
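One way to guard against this kind of look-ahead bias is to backtest only with the data vintage that was actually available on each forecast date. The sketch below assumes a hypothetical list of (release date, estimate) pairs for a single reference month. The dates and values are invented for illustration and do not reflect actual Census Bureau releases.

```python
from datetime import date

# Hypothetical vintages of one monthly population estimate:
# (date the vintage was released, estimated count in thousands).
vintages = [
    (date(2001, 12, 1), 4010),
    (date(2003, 4, 1), 3950),
    (date(2005, 12, 1), 3975),
]


def estimate_as_of(vintages, as_of):
    """Return the estimate from the latest vintage released on or before
    `as_of`, or None if no vintage had been released by then."""
    available = [item for item in vintages if item[0] <= as_of]
    if not available:
        return None
    latest = max(available, key=lambda item: item[0])
    return latest[1]


# A forecast made in mid-2002 could only have used the first vintage.
print(estimate_as_of(vintages, date(2002, 6, 1)))
# A backtest that silently uses the latest revision sees a different number.
print(estimate_as_of(vintages, date(2006, 1, 1)))
```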

Pages 7-8: “There is a good opportunity, however, to obtain a relatively accurate prediction of the returns at time horizons between one and nine years using population estimates for younger ages as a proxy to the number of nine-year-olds.”

Long forecast intervals squeeze the sample available to test relationships. For example, backtesting a nine-year forecast would currently use independent variables available only through 2000.

Pages 8-9: “…the averaging over five adjacent ages has been tested for the monthly estimates of older ages. Unfortunately, it gave results inferior to those for the period between 1990 and 2003. …We have tried several alternative smoothing techniques. The best one is based on the averaging of neighboring months for the same age.”

Trying “several alternative smoothing techniques” suggests data snooping.
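A small simulation illustrates why searching across smoothing rules is a concern. In the sketch below, both the "returns" and the "predictor" are independent random noise, so any apparent relationship is spurious by construction. The window choices, sample length and seed are arbitrary assumptions, and the moving average stands in for whatever smoothing techniques the paper actually tried.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def moving_average(xs, window):
    """Trailing moving average; the output has len(xs) - window + 1 values."""
    return [sum(xs[i - window + 1:i + 1]) / window for i in range(window - 1, len(xs))]


rng = random.Random(0)  # fixed seed so the run is repeatable
n_months = 60
returns = [rng.gauss(0, 1) for _ in range(n_months)]
noise = [rng.gauss(0, 1) for _ in range(n_months)]  # unrelated "predictor"

# Try several smoothing windows and record the strength of each fit.
fits = {}
for window in (1, 2, 3, 5, 7, 9, 12):
    smoothed = moving_average(noise, window)
    aligned_returns = returns[window - 1:]  # same dates as the smoothed series
    fits[window] = abs(pearson(smoothed, aligned_returns))

best_window = max(fits, key=fits.get)
print(best_window, round(fits[best_window], 3), round(fits[1], 3))
```

The best fit across the candidate windows can never be weaker than the fit from any single pre-specified window, so reporting only the winner overstates the evidence unless the search itself is accounted for.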

Pages 10-11: “The period of such an excellent description finished in April 2003, however, after a comprehensive revision to population estimates of the 2000 census. We expect that the next comprehensive revision in 2013 (after the 2010 census) will provide more accurate population estimates for the period between 2003 and 2013.”

This discussion appears to state that demographic data quality is an issue and that look-ahead bias is important to backtest accuracy. This explanation leaves unclear why (per the abstract) the demographic model started working again for 2008-2009.

Page 14: “…we failed to predict the S&P 500 return beyond 2003 [based on estimated changes in the population of age nine]. …it is reasonable to assume that the 9-year-old population was not well estimated by the US Census Bureau after 2003.”

These points indicate a demographic data quality issue critical to the model, or a problem with the model.

Page 15: “There is a concern related to the accuracy of population and real GDP measurement in 2006. …the predicted [GDP] curve fell to -0.075 in the third quarter of 2006. There was no significant decrease in the S&P 500 returns during the same period. A possible reason for the discrepancy is that the real GDP was underestimated. This issue should be resolved in the next comprehensive revision to the GDP.”

This discussion indicates that the alternate forecasting variable (GDP per capita instead of population of age nine, used because the demographic model does not work between 2003 and 2008) does not work in 2006 and that look-ahead bias may be important for making the backtest work.

Page 15: “The number of nine-year-olds, in one form or another, demonstrates the predictive power far beyond that of the naive model.”

The qualification “in one form or another” suggests data snooping by shifting from one form to another form when the first form does not work.

Page 16: “We have to admit that the period between 2003 and 2008 is not well described when the monthly population estimates [for age nine] are used. …At the same time the number of three-year-olds…as reported by the Census Bureau, might be a useful predictor for the period after 2008.”

This discussion indicates a tendency toward data snooping.

The variable construction and smoothing techniques used in the study point to explicit or implicit measurement/estimation intervals on the order of a year, indicating that many of the backtest periods in the paper are short. Small samples tend to elevate the import of data snooping bias.
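To make the sample-size point concrete, the sketch below counts how many twelve-month windows a monthly series supplies, both overlapping (one-month step) and non-overlapping. The 13-year span is an assumption chosen to roughly match the 1990 to 2003 period the paper highlights.

```python
def window_counts(n_months, window=12):
    """Return (overlapping, non_overlapping) counts of full windows
    available in a monthly series of length n_months."""
    overlapping = max(0, n_months - window + 1)
    non_overlapping = n_months // window
    return overlapping, non_overlapping


# Assumed span: 13 years of monthly data (hypothetical, for illustration).
overlapping, non_overlapping = window_counts(13 * 12)
print(overlapping, non_overlapping)
```

A chart built from overlapping windows shows well over a hundred points, but only about a dozen of them are non-overlapping, which is a thin basis for choosing among multiple ages and smoothing rules.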

The study does not test any ways to exploit the stock market predictability it claims to find.

In summary, *potential issues regarding sample size, data quality/consistency, data snooping bias and look-ahead bias undermine belief in this study’s conclusion that changes in the population for age nine in the U.S. accurately predict U.S. stock market returns. Also, smoothing rules applied in stock index variable construction obscure exploitability.*