Objective research and reviews to aid investing decisions
Reader Richard Beddard, editor of Interactive Investor, flagged a series of three studies by Keith Anderson and Chris Brooks on approaches to enhancing the value premium via empirical analysis of the price-earnings ratio (P/E) calculated with lagged earnings. The first study seeks to optimize value indication based on the extent and weighting of historical earnings used in the P/E calculation. The second study seeks to concentrate the value premium by decomposing P/E into components related to market, firm size, industry and company-specific factors. The third study combines the findings of the first two and examines the returns for the extreme tails of the enhanced P/E distribution. All three studies use earnings and stock return data for a broad range of UK companies (excluding the smallest) for the period 1975-2004. Summaries of the three studies follow.
In the May 2005 paper entitled "The Long-Term Price-Earnings Ratio", the authors seek to maximize the value premium indicated by P/E based on the amount and weighting of historical earnings applied in calculating it. They find that:
The following chart, taken from the paper, shows the cumulative value on a log scale during 1975-2004 of initial £1,000 investments in the 10% of stocks with the lowest P/Es (value) and the 10% with the highest P/Es (glamour, or growth), rebalanced annually. Value based on P/Es calculated with last year's earnings plus those from eight years ago (EP1+EPM8) outperforms value based on P/Es calculated with just last year's earnings (EP1). Likewise, glamour defined by (EP1+EPM8) underperforms glamour defined by EP1 alone. Over 29 years, the (EP1+EPM8) value portfolio grows to almost twice the value of the EP1 value portfolio, while the (EP1+EPM8) glamour portfolio grows to only two-thirds the value of the EP1 glamour portfolio. The average annual value premium is 12.7% for (EP1+EPM8) versus about 6% for EP1.
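To make the EP1+EPM8 idea concrete, here is a minimal Python sketch of a long-term earnings yield and the annual decile sort. It works with earnings yield (E/P), the inverse of P/E, so the value decile is the one with the highest E/P. Equal weighting of last year's EPS and EPS from eight years ago, and the function names, are our assumptions for illustration; the paper examines other amounts and weightings of historical earnings as well.

```python
import numpy as np

def long_term_earnings_yield(price, eps_lag1, eps_lag8):
    """E/P using last year's EPS plus EPS from eight years ago (equal weights assumed)."""
    eps_sum = np.asarray(eps_lag1, dtype=float) + np.asarray(eps_lag8, dtype=float)
    return eps_sum / np.asarray(price, dtype=float)

def decile_labels(earnings_yield):
    """Decile 0 = lowest E/P (highest P/E, glamour); decile 9 = highest E/P (lowest P/E, value)."""
    ey = np.asarray(earnings_yield, dtype=float)
    ranks = ey.argsort().argsort()  # rank 0 .. n-1; ties broken by position
    return (ranks * 10) // len(ey)
```

In a backtest along the lines described above, the decile 9 (value) and decile 0 (glamour) groups would be re-formed each year and held for one year.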

In summary, careful parsing of earnings history can enhance the use of historical P/E as an indicator of the value premium.
In the May 2005 paper entitled "Decomposing the Price-Earnings Ratio", the authors test the power of an adjusted P/E to capture the value premium. The adjusted P/E is derived from: (1) the contemporaneous market P/E (reflecting shifts in aggregate investor confidence); (2) firm size (small companies tend to have lower P/Es); (3) the industry in which the company operates (average P/Es vary widely across industries); and (4) idiosyncratic effects of unique company information. They find that:
The following chart, taken from the paper, illustrates the value-discrimination power of adjusted P/E in terms of Sharpe ratio (using the 3-month Treasury bill yield as the risk-free rate). The Sharpe ratio for the 10% of stocks with the lowest adjusted P/Es (value) is about four times that of the 10% with the highest adjusted P/Es (glamour, or growth), and Sharpe ratios generally increase as adjusted P/E decreases. Although the variability of returns is somewhat higher for low P/E stocks, the standard deviation does not rise as quickly as the returns.
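The Sharpe ratio shown in the chart can be computed as below. This is a minimal sketch assuming annual portfolio returns and a matching series of annual 3-month Treasury bill yields, both as decimals. It divides by the standard deviation of excess returns; some studies divide by the standard deviation of raw returns instead, and we have not confirmed which convention the paper uses.

```python
import numpy as np

def sharpe_ratio(portfolio_returns, tbill_yields):
    """Mean excess return over the T-bill yield divided by the sample
    standard deviation (ddof=1) of those excess returns."""
    excess = np.asarray(portfolio_returns, dtype=float) - np.asarray(tbill_yields, dtype=float)
    return excess.mean() / excess.std(ddof=1)

# Hypothetical three-year example with a constant 5% T-bill yield.
print(sharpe_ratio([0.10, 0.20, 0.00], [0.05, 0.05, 0.05]))  # approximately 0.5
```

Because the ratio scales mean excess return by volatility, a value decile can carry somewhat higher volatility and still post a much higher Sharpe ratio, as the chart indicates.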

In summary, adjusting P/E to account for market, industry and style factors enhances its power as a value indicator.
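One way to picture the decomposition is as a pooled regression of log P/E on the market P/E, firm size and industry, with the residual serving as the idiosyncratic component (the part of a company's P/E not explained by the other three factors). The sketch below is a hypothetical specification for illustration (log variables, one dummy per industry, ordinary least squares over all company-years); the authors' actual regression and weights may differ.

```python
import numpy as np

def idiosyncratic_log_pe(log_pe, log_market_pe, log_size, industry):
    """Residual log P/E after removing market, size and industry effects.

    Hypothetical specification: pooled OLS over company-years with one dummy
    per industry (the dummies act as industry-specific intercepts).
    """
    y = np.asarray(log_pe, dtype=float)
    labels = sorted(set(industry))
    dummies = np.array([[1.0 if ind == k else 0.0 for k in labels] for ind in industry])
    X = np.column_stack([np.asarray(log_market_pe, dtype=float),
                         np.asarray(log_size, dtype=float),
                         dummies])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta  # idiosyncratic (firm-specific) component
```

Under this sketch, the most negative residuals would mark stocks that look cheap even after allowing for the market's mood, their size and their industry.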
In the April 2005 paper entitled "Extreme Returns from Extreme Value Stocks: Enhancing the Value Premium", the authors extend the above findings to test whether the very best (worst) returns are concentrated among a very few stocks with the very lowest (highest) P/Es. For this analysis, they use eight years of company earnings history to measure long-term earnings power. Also, they strip away the predictable influences of the overall market, firm size and industry to focus on the idiosyncratic (firm-specific) component of each company’s P/E. They find that:
The following chart, taken from the paper, shows the cumulative value on a log scale during 1975-2004 of initial £1,000 investments in: (1) the six extreme value stocks, rebalanced annually; (2) the six extreme glamour stocks, rebalanced every eight years; (3) the arbitrage portfolio (+/- £1,000) that is long the extreme value stocks and short the extreme glamour stocks; and (4) an equally weighted market average for all UK stocks with eight years of positive earnings.
The standard deviations of the three extreme portfolios are similar. The arbitrage portfolio does not hedge against overall market shocks.
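The portfolio mechanics described above can be sketched as follows: pick the six lowest and six highest adjusted P/Es, compound each portfolio's annual returns from £1,000, and treat the arbitrage portfolio's return as the value return minus the glamour return. The function names are hypothetical, and the long/short treatment is a simplification that ignores short-sale costs, financing and rebates.

```python
import numpy as np

def extreme_picks(adjusted_pe, n=6):
    """Indices of the n lowest (extreme value) and n highest (extreme glamour) adjusted P/Es."""
    order = np.argsort(np.asarray(adjusted_pe, dtype=float))
    return order[:n], order[-n:]

def cumulative_value(annual_returns, start=1000.0):
    """Portfolio value after each year, compounding an initial stake (in pounds)."""
    return start * np.cumprod(1.0 + np.asarray(annual_returns, dtype=float))

def long_short_returns(value_returns, glamour_returns):
    """Simplified +/- £1,000 arbitrage return: long-leg return minus short-leg return."""
    return np.asarray(value_returns, dtype=float) - np.asarray(glamour_returns, dtype=float)

# Hypothetical annual returns for two years.
value = [0.30, 0.10]
glamour = [0.05, -0.10]
print(cumulative_value(long_short_returns(value, glamour)))  # approximately 1250, then 1500
```

Note that this difference-of-returns view hedges only to the extent the two legs move together, which is consistent with the observation that the arbitrage portfolio does not hedge against overall market shocks.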

In summary, an investment strategy based on extreme P/Es calculated with an extended earnings history and concentrated by stripping out market, industry and style factors offers extreme returns.
We wonder whether a formal derating of indicated returns due to data mining bias is in order, as sketched in our blog entry of 12/11/06, which reviews David Aronson's book Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals. Also, might P/Es that include earnings forecasts be better value indicators than those based only on historical earnings?
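For a rough feel of why a derating might be warranted, the simulation below estimates how much the best of several strategy variants with no true edge appears to earn purely by chance. This is a crude illustration under assumed inputs (independent, normally distributed annual excess returns; the number of variants tried and the volatility are placeholders), not the formal procedures Aronson describes.

```python
import numpy as np

def best_of_n_luck(n_variants, n_years, annual_vol, n_sims=2000, seed=0):
    """Average mean annual excess return of the best of n_variants strategies,
    each with zero true edge, estimated by Monte Carlo simulation."""
    rng = np.random.default_rng(seed)
    returns = rng.normal(0.0, annual_vol, size=(n_sims, n_variants, n_years))
    best_means = returns.mean(axis=2).max(axis=1)  # best variant in each simulated history
    return best_means.mean()

# Placeholder inputs: 29 years of data, 20% annual volatility.
for n in (1, 10, 50):
    print(n, round(best_of_n_luck(n, 29, 0.20), 3))
```

The more variants tried, the larger the return the winner shows by luck alone; a formal derating would judge the winning variant against a distribution like this rather than against zero.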
For additional summary-level information on these studies, see Dr. Anderson's "Improving the P/E" web pages.
For related research, see Blog Synthesis: The Value Premium.
A reader who is a strategist at a European equity hedge fund makes the following two observations:
1. "In 'The Long-Term Price-Earnings Ratio', the authors conclude that low P/E companies are superior performers, but they exclude from their sample all companies that have any negative earnings between 1975 and 2003 (see page 9 of the paper). It is no surprise to see that within this biased set of 'survivors,' low P/E companies turn out to have been good buys."
2. "In 'Decomposing the Price-Earnings Ratio', the authors present only in-sample tests. They acknowledge the issue, as follows: 'These results can fairly be criticised as suffering from a look-ahead bias, in that the regression weights could only have been known in May 2004, but we use them to calculate annual returns for the whole dataset from 1975. We used a rolling ten-year sub-sample to check whether the results would be affected by the use of trailing windows of historical data to calculate the regression weights. We found that the returns are slightly degraded, but since the impact is not marked, to avoid repetition we do not report these results.' It seems that these rolling ten-year checks are not pure out-of-sample tests."
Regarding the first criticism:
On page 9 of 'The Long-Term Price-Earnings Ratio', the authors state that: "…the number of companies for each EPn calculation gradually reduces, as the EPS figure becomes unavailable for years further into the past, from 40,000 initially, to 16,000 that have a full eight years of positive earnings history." In other words, for any given year, the companies in the sample have at least eight years (not 29 years) of positive earnings. On page 13, they state: "In this section, we used only the 16,000-company/year returns that have positive normalized earnings for each of the past eight years." So, for example, a company in the sample at the beginning of 1995 has had positive earnings in each year since at least 1987. If that company then reports negative earnings for 1995, it drops out of the selection sample for 1996 (though it could still penalize the portfolio formed at the beginning of 1995).
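The sample rule described above can be written as a simple filter. This is a minimal sketch assuming annual EPS is stored in a dictionary keyed by fiscal year, with a missing year treated as ineligible; the function name is hypothetical.

```python
def eligible(eps_by_year, formation_year, history=8):
    """True if EPS was positive in each of the `history` years before formation_year.

    For a portfolio formed at the start of 1995, this checks 1987 through 1994.
    Missing years count as failures.
    """
    years = range(formation_year - history, formation_year)
    return all(eps_by_year.get(year, 0.0) > 0 for year in years)

eps = {year: 1.0 for year in range(1987, 1996)}
eps[1995] = -0.5  # a loss in 1995
print(eligible(eps, 1995))  # True: 1987-1994 all positive
print(eligible(eps, 1996))  # False: the 1995 loss excludes it from the 1996 selection
```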
However, survivorship bias does appear to come into play when they use the broader sample to compare the predictive power of last year's earnings with that of earnings from prior years. They may, in effect, be measuring the impact of earnings consistency or earnings cycles rather than of earnings per se. In other words, restricting consideration to companies with at least eight years of positive earnings may matter more than calculating P/E with earnings from eight years ago. If so, it is puzzling that earnings from two through five years in the past do not show a positive forecasting contribution attributable to earnings consistency.
The authors restrict both of the other studies to the 16,000 company/year sample. Anyone seeking to replicate their results would likewise have to restrict consideration to companies with at least eight years of past positive earnings.
Regarding the second criticism:
The limited (and unreported) out-of-sample testing relates to the idea of more formal accounting for data mining bias noted at the end of the original entry above. Out-of-sample testing is increasingly important as the risk of data mining bias grows.
The checks based on a rolling ten-year history to calculate alternative regression weights seem weak. Moreover, the use of a ten-year window in the one study, and of an eight-year rolling earnings history throughout all three studies, implies a number of independent observations much smaller than the 29 years of data (1975-2004), which is already a small sample. As they note, they are unable to construct a larger study.
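A rolling check of the kind the authors describe might look like the sketch below: for each year, regression weights are estimated only from the preceding ten years and then applied to that year. This is a hypothetical setup (a design matrix X and target y per company-year). Such a check arguably makes only the weights out of sample; the choice of regressors and of the eight-year earnings horizon was presumably still made with the full 1975-2004 data. For scale, 29 years hold only three non-overlapping eight-year windows.

```python
import numpy as np

def rolling_out_of_sample_fits(years, X, y, window=10):
    """For each year t preceded by `window` full years of data, fit OLS weights on
    years t-window .. t-1 only and return fitted values for year t's observations."""
    years = np.asarray(years)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    fits = {}
    for t in np.unique(years):
        train = (years >= t - window) & (years < t)
        if len(np.unique(years[train])) < window:
            continue  # not enough trailing history yet
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        fits[int(t)] = X[years == t] @ beta
    return fits
```

Comparing portfolios formed from these fitted values with those formed from full-sample weights would quantify the degradation the authors mention but do not report.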
U.S. data may support a more extensive study, or at least a reasonably robust out-of-sample test.