Objective research to aid investing decisions

Chapter 2: Making the Strategy Logical

Making an investment/trading strategy logical essentially means making it testable and implementable, with inputs, outputs and rules clearly defined, properly sequenced and inclusive of all material factors. Clearly defined inputs, outputs and rules enable verification and extension. Definitions that require subjective interpretation are not clear. Properly sequenced inputs, outputs and rules fit the real world, representing an analysis and implementation scenario available to an investor in real time. Some strategies are more forgiving of tight sequencing than others. Including all material factors means accounting for all significant contributions to (capital gains, dividends, interest) and debits from (costs of data, trading frictions, cost of shorting, cost of leverage) investment outcome. The materiality of factors varies with strategy specifics.

How can investors make sure their strategies are logical?

2.1 Timeline Logic

Figure 2-1a illustrates a logical sequence of events from input measurement through signal execution applicable to basic strategies. The elements of the sequence are:

  • First is the independent variable(s) measurement interval. The independent variable may be an economic indicator (such as the inflation rate), sentiment indicator (such as an investor poll), technical market indicator (such as a moving average) or some other variable hypothesized to predict returns for some asset.
    • The measurement interval may be close to zero, as for a technical indicator calculated with real-time asset prices.
    • If the variable comes from another source, such as the government for economic indicators and firms for accounting data, the investor has no control over the smallest measurement interval.
    • As discussed in Chapter 1, the measurement interval and the measurement frequency may differ. For example, investors typically measure 200-day moving averages daily and 10-month moving averages monthly.
    • For complex strategies, there may be multiple independent variables with different measurement intervals.
  • Next is the independent variable release delay. Generally, investors cannot trade on an indicator until after it is publicly available.
    • The release delay may be close to zero, as for a technical indicator calculated with real-time asset prices.
    • The delay for economic indicators may be days, weeks or months. For example, the U.S. government typically releases the Consumer Price Index for a month about two weeks into the next month.
    • The delay for firm fundamentals (such as assets, earnings and cash flow) is typically one to two months. The delay for aggregated index fundamentals is around three months.
  • With the input(s) available, next is data collection and processing to generate a signal.
    • The investor can make this interval short by a combination of strategy simplicity and automation.
    • If the investor can reliably estimate the signal with early data (such as price data five minutes before the market close as an estimate of closing value), the investor may be able to assume zero collection/processing time.
    • Data collection and processing time may be significant for strategies involving data for thousands of assets and multiple inputs (such as returns, volatilities and correlations).
  • Next is signal execution delay to accommodate order entry. The investor can make this delay very small by keeping orders simple or by automating order generation.
  • Last is the dependent variable (investment return) measurement interval. The investor controls this interval via strategy design.
    • The interval may be of fixed length or of variable length, with exit dependent on an additional, nested signal (such as a stop-loss or price crossing a moving average).
    • As discussed in Chapter 1, the arrival rate of new information is a natural return measurement interval.
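The sequence above can be sketched as simple date arithmetic. This is a minimal illustration, not a prescribed implementation: the function name `earliest_entry` and the example delays (14 days, one hour, five minutes) are hypothetical choices made for the sketch.

```python
from datetime import datetime, timedelta

def earliest_entry(measurement_end, release_delay, processing_time, execution_delay):
    """Return the earliest time a position based on the signal can be open."""
    released = measurement_end + release_delay    # input becomes publicly available
    signal_ready = released + processing_time     # data collected and signal computed
    return signal_ready + execution_delay         # order entered

# Hypothetical example: an indicator measured through month end and
# released 14 days later, with one hour of processing and a five-minute order delay.
entry = earliest_entry(
    measurement_end=datetime(2019, 8, 31, 16, 0),
    release_delay=timedelta(days=14),
    processing_time=timedelta(hours=1),
    execution_delay=timedelta(minutes=5),
)
print(entry)  # 2019-09-14 17:05:00
```

The return measurement interval should start no earlier than the time this function returns. Setting any of the three delays to zero models the real-time approximations described in the list.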

Some strategies are very complex, with several input variables, partial entries and exits and multiple concurrent open positions. Such strategies can be very difficult to model at a portfolio level (as discussed further in Chapters 6 and 8).

Figure 2-1a: Simple Strategy Logic


Some considerations in applying the strategy timeline are:

  • Accounting for independent variable release delay is likely most important for widely followed variables (such as employment growth and inflation rate), the release of which may move markets.
  • As noted, one logical dependent variable (return) measurement interval is the measurement frequency of the independent variable. This frequency represents the arrival rate of new information. For example, if the independent variable is the annual U.S. inflation rate, measured monthly by the Bureau of Labor Statistics (BLS), a logical return measurement interval is one month. When the time between measurements is shorter than the measurement interval (as here, with an annual rate measured monthly), successive measurements overlap, which complicates (biases) statistical analysis.
  • Another logical return measurement interval is the independent variable measurement length. For example, for the annual U.S. inflation rate, a logical return measurement interval is one year. For this interval, measurements do not overlap. This setup offers the simplest statistical analysis but may result in small samples, undermining statistical confidence.
  • Another reasonable return measurement interval is news response time, a relatively short interval of a few minutes to a few days to measure the response of investors in aggregate to an announcement, such as the monthly release of new inflation data by BLS.
  • Logic suggests that compressing the data collection/processing time and execution delay and, if possible, independent variable release delay, should be advantageous to signal exploitation. The faster an investor can move, the greater the edge over other investors. As noted, if input data is available in real time and computations are simple, setting these intervals to zero may be a reasonable approximation. Robustness tests that delay execution offer assurance.
  • Return data sources often make it convenient to use measurement intervals with conventional break points like calendar weeks, months and years, but independent variable releases sometimes conflict with these break points. Exact alignment of input releases and return measurement interval starting points can be complicated.
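The tradeoff between overlapping measurements and sample size can be made concrete with a small sketch. The function names and the 120-month sample length are hypothetical; the point is only to count windows and check for overlap.

```python
def observation_windows(n_periods, interval, step):
    """Return (start, end) index pairs for windows of length `interval`
    begun every `step` periods within a sample of `n_periods` periods."""
    return [(s, s + interval) for s in range(0, n_periods - interval + 1, step)]

def windows_overlap(windows):
    """True if any window starts before the preceding window ends."""
    return any(nxt[0] < cur[1] for cur, nxt in zip(windows, windows[1:]))

# Hypothetical 120-month sample with a 12-month measurement interval.
monthly_step = observation_windows(120, interval=12, step=1)   # measured monthly
annual_step = observation_windows(120, interval=12, step=12)   # measured yearly

print(len(monthly_step), windows_overlap(monthly_step))  # 109 True
print(len(annual_step), windows_overlap(annual_step))    # 10 False
```

Monthly measurement of a 12-month interval yields many observations, but they overlap. Annual measurement removes the overlap but leaves only 10 observations in this example.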

Some studies use the same start and stop points for measurements of the independent variable and the dependent variable (return). These studies look for “explanatory” rather than “predictive” power. They may find strong contemporaneous relationships and present them as insightful, but the relationships are not exploitable.

For example, Figure 2-1b is a scatter plot depicting the relationship between monthly S&P 500 Index return and level of the Chicago Board Options Exchange Market Volatility Index (VIX) at the end of the same month. VIX measures the expected volatility of the stock index over the next month as indicated by index option prices, and investors often refer to it as the “fear index.” Data for both the S&P 500 Index and VIX are from Yahoo!Finance. The best-fit line has a negative slope (a high VIX means a bad return), with coefficient of determination (R²) 0.14, indicating that variation in VIX explains 14% of the variation in S&P 500 Index return. This level of relationship is reasonably strong, but not exploitable because an investor does not know a new level of VIX until the associated return is already realized.

Figure 2-1b: Contemporaneous Relationship between S&P 500 Index Return and VIX


Figure 2-1c is a scatter plot depicting the relationship between monthly S&P 500 Index return and level of VIX at the end of the preceding month (the start of the return measurement interval). With a short execution delay, an investor could exploit this relationship. However, R² for this setup is 0.00, indicating no relationship between the two variables. While there is an obvious contemporaneous relationship, there is no simple and exploitable VIX-return lead-lag relationship.
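The difference between the contemporaneous and the logically sequenced setups comes down to how the two series are aligned before computing R². A minimal sketch follows, assuming two equal-length monthly lists; the function names and the sample numbers are hypothetical and are not the data behind the figures.

```python
def r_squared(x, y):
    """Coefficient of determination for a one-variable linear fit of y on x.
    Raises ZeroDivisionError if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def contemporaneous_and_lagged_r2(indicator, returns):
    """indicator[t]: level at the end of month t; returns[t]: return during month t."""
    same_month = r_squared(indicator, returns)            # explanatory, not tradable
    prior_month = r_squared(indicator[:-1], returns[1:])  # indicator known before the return
    return same_month, prior_month

# Hypothetical month-end indicator levels and same-month returns.
levels = [14.0, 19.0, 16.0, 25.0, 18.0, 13.0, 21.0, 15.0]
rets = [0.02, -0.03, 0.01, -0.06, 0.03, 0.02, -0.04, 0.03]
print(contemporaneous_and_lagged_r2(levels, rets))
```

Only the second value describes a relationship an investor could act on, because each indicator level is paired with the return for the following month.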

Figure 2-1c: Logically Sequenced Relationship between S&P 500 Index Return and VIX


Some studies are imprecise about accounting for independent variable release delay. As an example, Figure 2-1d relates S&P 500 Index return during the next calendar week to weekly change in the Chicago Fed National Financial Conditions Index (NFCI). NFCI measures risk, liquidity and leverage in money markets, debt/equity markets and banking systems for weeks ending Friday. An increase (decrease) in NFCI indicates tightening (loosening) financial conditions. The Federal Reserve Bank of Chicago normally issues NFCI on Wednesday morning at 8:30 a.m. ET (before the U.S. stock market opens). When there is a federal holiday on Monday, Tuesday or Wednesday, release is on Thursday at 8:30 a.m. ET.

NFCI data are from the Federal Reserve Economic Data (FRED) maintained by the Federal Reserve Bank of St. Louis. S&P 500 Index data are from Yahoo!Finance.

The returns used in Figure 2-1d are Friday close to Friday close for the week following that measured by NFCI. The value of R² for the relationship is 0.026, indicating that weekly change in NFCI explains 2.6% of the variation in S&P 500 Index returns during the calendar week after the NFCI measurement interval. However, these returns do not account for the delay in release from Friday (close of business) to the next Wednesday or Thursday morning, and a realistic trading strategy therefore cannot fully capture them.

How does accounting for release delay affect the relationship?

Figure 2-1d: Future Returns Not Accounting for Release Delay


Figure 2-1e relates weekly S&P 500 Index return to prior-week change in NFCI based on NFCI release dates. Specifically, the weekly change in NFCI is from NFCI release date one to NFCI release date two. The associated S&P 500 Index return measurement is from the market open on NFCI release date two to the market open on NFCI release date three (this specification makes generating these returns cumbersome). A trading strategy could capture these returns. The value of R² for the relationship is 0.022, indicating that weekly change in NFCI explains 2.2% of the variation in S&P 500 Index returns, about 15% weaker than the relationship in Figure 2-1d. The sample period is long, and this difference could be material in a full sample backtest, with the more precise accounting for the release delay performing worse.
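The release-date pairing described above can be sketched as follows. This is an illustration under stated assumptions: the function name and the two input lists (one indicator value and one market open price per release date, in date order) are hypothetical, and the code does not reproduce the data handling behind the figure.

```python
def release_aligned_pairs(release_values, release_opens):
    """Pair each indicator change with the return an investor could capture.

    release_values[i]: indicator value published on release date i
    release_opens[i]:  asset price at the market open on release date i

    The change from release i-1 to release i is paired with the return
    from the open on release date i to the open on release date i+1.
    """
    pairs = []
    for i in range(1, len(release_values) - 1):
        change = release_values[i] - release_values[i - 1]
        future_return = release_opens[i + 1] / release_opens[i] - 1.0
        pairs.append((change, future_return))
    return pairs

# Hypothetical values for four consecutive release dates.
print(release_aligned_pairs([0.1, 0.3, 0.2, 0.5], [100.0, 110.0, 99.0, 108.9]))
```

Four release dates yield only two usable pairs, because the first date has no prior value and the last has no subsequent open.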

Figure 2-1e: Future Returns Accounting for Release Delay


2.2 Unbiased Data

There may be subtle disruption of data availability because of retroactive data series revisions due to changes in seasonal adjustment factors (for government economic variables) or periodic renormalization (for example, to maintain a target average for the entire series). Use of percentage changes in variables mitigates this concern, as does use of “as-released” or vintage data. Vintage data for some widely used series are available from Archival Federal Reserve Economic Data (ALFRED). If comprised of entire series, vintage datasets can be very large and difficult to manipulate (a different series for each release date). Revised data may contain more knowledge and be more suitable for forward-looking analysis (model training, or parameter setting) than vintage data, but it is risky to use revised data for backtesting. The data in a revised dataset is not what an investor experienced in real time during the sample period.

As a specific example, the NFCI series used in Figures 2-1d and 2-1e above incorporates weekly renormalization of the entire series to maintain a zero mean, so the “historical” values of the variable may change each week. The sign of NFCI for a given week may even change over time. Backtests with a current NFCI series therefore employ information not known during the backtest. Use of weekly changes in NFCI (unaffected by renormalization), rather than weekly levels, avoids this issue.
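A small sketch shows why changes survive a zero-mean renormalization while levels do not. It models renormalization only as subtraction of the full-series mean, as described above; if a renormalization also rescaled the series, changes would no longer be invariant. The raw values and function names are hypothetical.

```python
def renormalize(levels):
    """Shift a series so that its mean is zero (mean-only renormalization)."""
    mean = sum(levels) / len(levels)
    return [v - mean for v in levels]

def changes(levels):
    """Period-to-period differences of a series."""
    return [b - a for a, b in zip(levels, levels[1:])]

raw = [0.4, 0.1, 0.3, -0.6]          # hypothetical underlying weekly values
vintage_1 = renormalize(raw[:3])     # series as published after week 3
vintage_2 = renormalize(raw)         # series as published after week 4

# The week-2 level is negative in the first vintage and positive in the second.
print(vintage_1[1], vintage_2[1])

# Weekly changes for the shared weeks agree across vintages (up to rounding).
print(changes(vintage_1), changes(vintage_2)[:2])
```

A backtest using levels from `vintage_2` for week 2 would employ a value (and here a sign) that an investor did not see at the time.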

Data specifications may introduce bias. For example:

Suppose a long-term analysis of individual stocks limits consideration to those with a history of at least 20 years. Such a specification incorporates survivorship bias by excluding (many) companies that fail or merge or are acquired within 20 years. Without an accurate prediction beforehand about which companies will survive for at least 20 years, this data specification likely inflates expected returns.

Benchmarks determined from average hedge fund performance, as estimated from one of several databases to which hedge funds report voluntarily, may be substantially biased. If the database manager allows newly reporting hedge funds to initiate reporting with several years of data, this early data likely overstates expected performance (backfill bias), because only hedge funds with a good (lucky?) start are likely to initiate reporting. Also, hedge funds experiencing poor returns may stop reporting, depriving the database of some poor performers (“delisting” bias).

Disasters (such as war or hyperinflation) may render data so difficult to incorporate (unknowable) into a long-term series that data sources ignore them as outliers, thereby offering data series that ignore potential impacts of future disasters.

Use of index levels as tradable prices ignores the costs of constructing and maintaining a liquid fund to track the index, thereby overstating potential returns (more on this widespread practice in Chapter 4).

Dividend-adjusted stock prices implicitly assume that investors receiving dividends immediately reinvest them into the stocks delivering them with no investment frictions. However, actual reinvestment of dividends by many investors would affect stock prices and reinvestment of dividends generally bears transaction fees and bid-ask spreads.

2.3 Linear or Non-linear Relationship?

Many studies employ linear regressions to quantify the relationship between an independent variable and future returns (as in Figures 2-1b through 2-1e above). However, many economic and financial data series exhibit non-linear tails, meaning that extreme values deviate materially from a best-fit line. A common approach for detecting tail effects is to: (1) order the data set based on the independent variable; (2) divide it into five (quintile) or ten (decile) subsamples of equal size; and, (3) calculate simple statistics for all the subsamples.

Figure 2-3 illustrates this approach using the data set from Figure 2-1e. The steps are: (1) order the 2,121 weekly observations from lowest to highest change in NFCI; (2) divide the sample into deciles of about 212 observations each; and, (3) calculate the average value of the change in NFCI last week and the release-to-next release S&P 500 Index return for each decile. Results indicate that both change in NFCI and associated future returns have pronounced tails (deciles 1 and 10), suggesting strategies for exploiting the relationship.
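The three steps can be sketched in a few lines. The function name `quantile_averages` is hypothetical, and the sketch splits the ordered sample by position, so subsample sizes differ by at most one when the sample size is not a multiple of the number of groups.

```python
def quantile_averages(x, y, n_groups=10):
    """Order (x, y) pairs by x, split them into n_groups subsamples of
    near-equal size, and return the average x and average y of each."""
    if len(x) != len(y) or len(x) < n_groups:
        raise ValueError("need equal-length inputs with at least n_groups observations")
    pairs = sorted(zip(x, y), key=lambda p: p[0])   # step 1: order by the independent variable
    n = len(pairs)
    averages = []
    for g in range(n_groups):                       # step 2: split into subsamples
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        avg_x = sum(p[0] for p in group) / len(group)
        avg_y = sum(p[1] for p in group) / len(group)
        averages.append((avg_x, avg_y))             # step 3: simple statistics per subsample
    return averages

# Hypothetical usage: 20 observations in 10 groups of two.
x = list(range(20))
y = [2 * v for v in x]
print(quantile_averages(x, y)[0])   # (0.5, 1.0)
```

Passing `n_groups=5` gives quintiles instead of deciles. Comparing the first and last groups with the middle groups is a quick check for tail effects.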

Figure 2-3: Tail Effects in Independent Variable and Returns


2.4 Benchmarking

Strategy benchmarking approaches vary considerably across studies. Relevant questions are:

  1. What would the investor do with the capital if not allocated to the strategy being tested?
  2. What simple passive benchmark is most like the strategy?
  3. What benchmark most effectively isolates the key alpha-generating hypothesis of the strategy?

Focusing on the first question, many studies use the S&P 500 Index or its tracking exchange-traded fund (ETF) SPDR S&P 500 (SPY) as a simple, widely used equity investment that approximates the capitalization-weighted U.S. stock market. The commonly quoted S&P 500 Index does not include dividends, but the readily available dividend-adjusted SPY does. However, as noted above, dividend-adjusted price series assume frictionless and instantaneous dividend reinvestment.

A capitalization-weighted benchmark may be misleadingly conservative. Suppose a stock-picking strategy gives equal weight to each of several U.S. stock holdings, with equal weighting periodically restored via rebalancing. To avoid conflation of strategy performance with a simple size effect, an equally weighted benchmark, such as Guggenheim S&P 500 Equal Weight ETF (RSP), is appropriate. More complex stock portfolio benchmarks may strip out factors other than size. For example, in assessing the stocks selected by a strategy, a benchmark may correct for:

  • The market betas of selected stocks, which measure the degrees to which they tend to move with or against the overall (capitalization-weighted) stock market.
  • The degree to which selected stocks orient toward value or growth (represented by book value-to-market capitalization ratio).
  • Price momentum of selected stocks, measured by their returns over the past year.
  • Liquidity of selected stocks, as an indication of how costly trading them might be.

The usual correction method is linear regression, with any outperformance that survives correction (is unexplained by the correction factors) called alpha.
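As a simplified illustration of correction by regression, the sketch below estimates alpha against a single factor (for example, market return). This is a deliberate reduction: the corrections listed above involve several factors and would require a multiple regression. The function name and sample numbers are hypothetical, and returns would usually be measured in excess of the return on cash.

```python
def alpha_beta(strategy_returns, factor_returns):
    """Ordinary least squares fit of strategy returns on one factor.
    Returns (alpha, beta): the intercept and slope of the best-fit line."""
    n = len(strategy_returns)
    mean_s = sum(strategy_returns) / n
    mean_f = sum(factor_returns) / n
    cov = sum((s - mean_s) * (f - mean_f)
              for s, f in zip(strategy_returns, factor_returns))
    var = sum((f - mean_f) ** 2 for f in factor_returns)
    beta = cov / var                  # exposure to the factor
    alpha = mean_s - beta * mean_f    # average return the factor does not explain
    return alpha, beta

# Hypothetical monthly returns.
market = [0.02, -0.01, 0.03, 0.00, -0.02]
strategy = [0.035, -0.010, 0.060, 0.005, -0.015]
print(alpha_beta(strategy, market))
```

A positive alpha in a short sample is weak evidence by itself; the estimate carries sampling error that this sketch does not compute.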

Suppose a somewhat complex strategy seeks to time the overall U.S. stock market to avoid bear markets. Two reasonable benchmarks are:

  1. To ensure the complex strategy is worth the effort, an easily implemented competitor that holds a broad stock market fund (cash) when the fund is above (below) its 10-month simple moving average.
  2. To ensure the complex strategy outperforms luck, a “strategy” that enters and exits a broad stock market fund based on periodic random signals.
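One way to build the random-signal benchmark is to simulate many random timing paths and see what share of them the tested strategy beats. The sketch below is an assumption-laden illustration: the function names, the 50% chance of holding the fund each period, and the fixed seed are all hypothetical choices, and trading frictions are ignored.

```python
import random

def random_timing_terminal_values(asset_returns, cash_returns,
                                  n_trials=1000, p_in=0.5, seed=0):
    """Terminal value of $1 for each of n_trials random timing paths.
    Each period, a path holds the asset with probability p_in, else cash."""
    rng = random.Random(seed)   # fixed seed so results are repeatable
    results = []
    for _ in range(n_trials):
        value = 1.0
        for asset_ret, cash_ret in zip(asset_returns, cash_returns):
            value *= 1.0 + (asset_ret if rng.random() < p_in else cash_ret)
        results.append(value)
    return results

def share_beaten(strategy_terminal_value, random_values):
    """Fraction of random paths that end below the tested strategy."""
    return sum(v < strategy_terminal_value for v in random_values) / len(random_values)

# Hypothetical usage with 12 months of returns.
paths = random_timing_terminal_values([0.01] * 12, [0.0] * 12, n_trials=200)
print(share_beaten(1.08, paths))
```

Setting `p_in` to the fraction of time the tested strategy actually holds the fund makes the comparison fairer than a fixed 50%.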

Figure 2-4 illustrates the performance of four simple potential benchmarks for a U.S. stock portfolio as discussed above:

  1. The S&P 500 Index is widely reported but excludes dividends, which contribute materially to long-term portfolios.
  2. Dividend-adjusted SPY is an ETF that tracks the S&P 500 Index and includes dividends with periodic reinvestment (but does not account for any costs of dividend reinvestment).
  3. Dividend-adjusted RSP is an equally weighted version of capitalization-weighted SPY, arguably appropriate as a benchmark for strategies that weight U.S. stock positions equally (but again does not account for any costs of dividend reinvestment).
  4. SMA10-SPY is a simple U.S. stock market timing strategy that holds SPY (cash) when the prior-month S&P 500 Index is above (below) its 10-month simple moving average. Calculations approximate the return on cash as the yield on 3-month U.S. Treasury bills, but do not account for any costs of (infrequent) trading or dividend reinvestment. If a complex U.S. stock market timing strategy cannot beat this one, it may not be worth pursuing.
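The SMA10 rule in the fourth benchmark can be sketched as below. The function name and input layout are hypothetical, a tie between the index level and its average is treated as a cash signal (an assumption, since the rule as stated does not cover ties), and trading frictions are ignored as in the description.

```python
def sma_timing_returns(closes, asset_returns, cash_returns, window=10):
    """Monthly returns for a moving-average timing rule.

    closes[t]:        month-end index level for month t
    asset_returns[t]: fund return during month t
    cash_returns[t]:  cash return during month t

    Month t is spent in the fund if the close at the end of month t-1
    was above the average of the `window` closes ending at month t-1.
    """
    results = []
    for t in range(window, len(closes)):
        sma = sum(closes[t - window:t]) / window   # uses only data through month t-1
        in_market = closes[t - 1] > sma
        results.append(asset_returns[t] if in_market else cash_returns[t])
    return results

# Hypothetical usage: a steadily rising index keeps the rule in the fund.
closes = list(range(1, 13))
print(sma_timing_returns(closes, [0.02] * 12, [0.001] * 12))  # [0.02, 0.02]
```

The first `window` months produce no return because the average is not yet available, which shortens the usable sample.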

Data for S&P 500 Index, SPY, RSP and the 3-month U.S. Treasury bill yield are from Yahoo!Finance.

In general, conservative and easily implementable benchmarks help avoid wasting time on development of marginal strategies.

Figure 2-4: Simple U.S. Equity Benchmarks


2.5 Untradeable Spot Prices

Some studies employ spot market prices to estimate returns to trading strategies practically implementable only via derivatives (such as options or futures). Spot markets and options/futures markets (in which participants are trying to predict spot price behaviors) may behave differently, so modeling options/futures strategies based on spot prices may mislead. To the extent possible, investors should model and test with price series of assets to be included in a real portfolio.

Figure 2-5 illustrates the difference in daily series behavior between the spot VIX and an exchange-traded note (ETN) designed to track the total return of rolling positions in VIX short-term futures, the iPath S&P 500 VIX ST Futures ETN (VXX). Specifically, the figure depicts:

  • The correlation between daily change in VIX and next-day change in VIX both before and after the introduction of VXX in early 2009.
  • The correlation between daily VXX return and next-day VXX return since the introduction of VXX.

Data for VIX and VXX are from Yahoo!Finance.

Both before and since introduction of VXX, VIX exhibits some degree of daily reversion (negative correlation between daily and next-day returns). However, VXX does not exhibit such reversion (correlation 0.00). In other words, VIX futures behave differently from VIX. Investors seeking to trade volatility should analyze the behaviors of associated derivatives directly and not depend on VIX behaviors for signals.
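The reversion test described above is a correlation between each daily change and the next-day change. A minimal sketch follows; the function name is hypothetical and the sample series is constructed to reverse every day, so it is not market data.

```python
def lag1_autocorrelation(changes):
    """Pearson correlation between each change and the following change.
    Raises ZeroDivisionError if the series is constant."""
    x, y = changes[:-1], changes[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# A series that reverses every day has correlation of -1 (up to rounding).
print(lag1_autocorrelation([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]))
```

Running the same function on changes in a spot series and on returns of the tradable instrument shows whether a pattern found in the former carries over to the latter.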

Figure 2-5: VIX vs. VXX reversion


2.6 Accounting for Cash in the Portfolio

Many strategies make allocations to cash part or all of the time. The conventional approach for long-term analyses is to use the “risk-free rate” as proxied by 1-month or 3-month U.S. Treasury bill yields as the return on cash. Figure 2-6 shows the 3-month U.S. Treasury bill yield since 1934. Data are from the Federal Reserve Bank of St. Louis’ FRED. There are times when this yield is substantial compared to the returns on other assets. During the last few years, the yield has been near zero.

Accounting for return on cash, especially for backtests including the 1970s and 1980s, may be decisive in determining whether a market timing strategy is attractive or not.
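A sketch of how a backtest might credit cash follows. It treats the quoted annual yield as an effective annual rate and converts it to a monthly return by compounding, which is an approximation because Treasury bill yields are quoted under their own conventions. The function names are hypothetical.

```python
def monthly_cash_return(annual_yield):
    """Approximate one-month return on cash from an annualized yield,
    assuming the yield behaves like an effective annual rate."""
    return (1.0 + annual_yield) ** (1.0 / 12.0) - 1.0

def timing_return(in_market, asset_return, annual_tbill_yield):
    """One-month strategy return: the asset return when invested,
    otherwise the approximate return on cash."""
    return asset_return if in_market else monthly_cash_return(annual_tbill_yield)

# Hypothetical comparison of a month spent in cash at a 10% yield and at a 0% yield.
print(timing_return(False, 0.03, 0.10), timing_return(False, 0.03, 0.00))
```

For a strategy that spends many months in cash, the difference between these two cases compounds over the sample, which is why the treatment of cash can decide whether the strategy looks attractive.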

Use of longer-term government notes or bonds as the risk-free rate can mislead, because liquidation values of these instruments vary as interest rates change, depending on their duration. In other words, these instruments are not risk-free.

Figure 2-6: 3-month U.S. Treasury bill yield


2.7 Shorting Costs/Feasibility

Many strategies include short positions. For example, a common approach to measuring effectiveness of a stock screening strategy is to establish long positions in “good” stocks and short positions in “bad” stocks. However, shorting generally involves stock borrowing costs, is constrained (involves margin requirements and potential margin calls) and is sometimes not feasible (no shares to borrow).

Investors backtesting strategies that involve shorting should account for:

  • Broker fees/asset borrowing charges and ongoing interest/dividends on borrowed securities.
  • Any associated account constraints, including reserves for margin calls.
  • Whether the types of assets being tested are typically available for shorting (borrowing).

Impacts of these concerns vary over time. Restricting the universe to highly liquid assets mitigates the third concern. Substituting “short” ETFs or ETNs may sometimes be a realistic alternative.
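The first concern can be folded into a backtest with simple arithmetic. The sketch below is a rough illustration under several assumptions: the borrowing fee accrues on the initial position value with a 360-day count, dividends paid on the borrowed shares are charged to the short seller, and interest on short sale proceeds, margin requirements and trading frictions are all ignored. The function name and parameter names are hypothetical.

```python
def net_short_return(entry_price, exit_price, dividends_paid=0.0,
                     annual_borrow_fee=0.0, days_held=0, day_count=360):
    """Approximate return on a short position per dollar of initial position value."""
    gross = (entry_price - exit_price) / entry_price        # gain when price falls
    dividend_cost = dividends_paid / entry_price            # dividends owed to the lender
    borrow_cost = annual_borrow_fee * days_held / day_count # fee accrued while short
    return gross - dividend_cost - borrow_cost

# Hypothetical trade: short at 100, cover at 90 after 120 days,
# with 1.00 of dividends and a 3% annual borrowing fee.
print(net_short_return(100.0, 90.0, dividends_paid=1.0,
                       annual_borrow_fee=0.03, days_held=120))
```

In this example the gross gain of 10% shrinks to about 8% after dividends and borrowing fees. Borrowing fees for hard-to-borrow assets can be much higher than the rate assumed here.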

2.8 Confounding Long-Term Trends

Some studies do not account for long-term trends in potentially interacting variables that may make even long samples unrepresentative of future data. For example:

Disinflation since the 1980s has elevated bond returns and is arguably not applicable to the next 30 years. Any analysis which seeks to predict bond returns based on such historical data may mislead.

Substitution of stock buybacks for dividends starting in the 1980s makes current dividend levels relatively low compared to many pre-1980 levels. This trend may disrupt analyses that use stock dividend yields as predictors.

Differences in computational power and trading frictions from one era to another may affect the operation of, and opportunities within, markets. This effect may be pervasively disruptive for long-term investment analysis.

Falling trading frictions over the past two decades make previously unexploitable anomalies exploitable, but also may change market behaviors. It seems reasonable to assume that falling frictions speed up market adaptation to anomalies.

Proliferation of equity ETFs may be promoting stock co-movement as investors increasingly buy in bulk rather than stock-by-stock. This trend may disrupt portfolio allocation analyses that depend on correlations of returns across stocks.

Financialization of previously “alternative” assets (commodities, real estate) via futures and exchange-traded products may affect trading, thereby (for example) elevating correlations across asset classes and (more generally) invalidating inferences from old data. Such a trend could depress the value of fundamental analysis and elevate the importance of macro factors. Financialization of volatility via exchange-traded products may affect market volatility. These effects are especially disruptive for studies of asset class diversification and volatility hedging.

Such potentially confounding effects argue for leaning toward good enough, rather than statistically rigorous (and time-consuming), strategy testing.

Neglect of trading frictions is widespread in investment research, but frictions can kill strategies that are attractive on a gross level. This issue is so important that it merits an entire chapter (Chapter 4).

2.9 Summary

Key messages from this chapter are:

  • Sound modeling and testing of investment strategies requires an understanding of which inputs and outputs are required and how the inputs may be biased or confounded.
  • Sound modeling and testing of investment strategies requires an understanding of how the inputs and outputs fit together in logical sequence.
  • Charlatans who want to sell some “more-art-than-science” advice should find an indicator with a strong coincident relationship with asset price (or index level), plot the two series on a graph to show how well they track over time and then market to investors who did not read this chapter.

Next, Chapter 3 addresses snooping bias, the distortion of findings from a sometimes subtle interplay of logic and statistics.
