What is the best approach for measuring extreme loss risk? In their April 2015 paper entitled “Why Risk Is So Hard to Measure”, Jon Danielsson and Chen Zhou analyze the robustness of standard extreme loss risk analysis methods. They focus on:

- The difference in the reliabilities of forecasts based on Value-at-Risk (VaR) and expected shortfall (ES).
- The reliabilities of these forecasts as sample size decreases.
- The difference in reliabilities of forecasts that use time scaling of high-frequency (say, daily) data versus those that use overlapping of high-frequency data, when forecasting risk over a many-day holding period.

In a nutshell, VaR assesses the probability that a portfolio loses at least a specified amount over a specified holding period, and ES is the expected portfolio return for a specified percentage of the worst losses during a specified holding period. The theoretically soundest sampling approach is to use non-overlapping past holding-period returns, but this approach usually means very small samples. Time scaling uses past high-frequency data once and scales findings to the longer holding period by multiplying by the square root of the number of high-frequency intervals in the holding period. Overlapping data re-uses past high-frequency data many times, thereby creating observations that are clearly not independent. Based on theoretical analysis and intensive Monte Carlo simulation derived from daily returns for a broad sample of liquid U.S. stocks during 1926 through 2014, *they conclude that:*
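To make the two metrics concrete, here is a minimal sketch (not from the paper) of the standard historical-simulation estimators of VaR and ES, applied to simulated rather than real daily returns; the function name, the tail probability, and the simulated sample are illustrative assumptions:

```python
import random

def historical_var_es(returns, alpha=0.01):
    """Estimate one-period VaR and ES at tail probability alpha from a
    sample of historical returns (losses are negative numbers)."""
    r = sorted(returns)                # worst returns first
    k = max(int(alpha * len(r)), 1)    # size of the loss tail
    var = -r[k - 1]                    # loss exceeded with probability ~ alpha
    es = -sum(r[:k]) / k               # mean of the worst alpha-fraction of returns
    return var, es

# Illustration with simulated (not real) daily returns
random.seed(0)
daily = [random.gauss(0.0005, 0.01) for _ in range(2500)]  # ~10 years of days
var, es = historical_var_es(daily, alpha=0.01)
```

Note that ES averages the entire loss tail beyond VaR, which is why it is theoretically more informative but, per the paper, noisier to estimate: at alpha = 1% a 2,500-observation sample leaves only about 25 tail observations to average.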

- While ES is theoretically superior to VaR, it has a higher estimation error. For investors using samples up to a few thousand observations, VaR offers more accurate forecasts than ES.
- ES and VaR forecast reliabilities are very sensitive to sample size. For samples of 500 or fewer observations, the forecasts offer very little information.
- For long holding periods, risk forecasts based on time scaling are more reliable than those based on overlapping data.

In summary, *analysis and evidence suggest using VaR rather than ES, and time scaling rather than overlapping of historical data (when the holding period is long compared to the measurement frequency).*

Many individual investors use maximum drawdown rather than VaR or ES to assess extreme loss risk. This metric is somewhat like VaR and ES but allows a variable holding period and looks only at the worst loss rather than a set of extreme losses below some threshold. It shares the small-sample weaknesses of VaR and ES. For example, an investor using maximum drawdown to assess extreme loss risk of different strategies in the U.S. stock market would get very different results from 2005-2009 and 2010-2014 five-year samples. Moreover, even with combined samples, the strategy with the most benign maximum drawdown would be the one best tuned to avoid losses during the 2008-2009 market crash. This "best" strategy may not be well-tuned for any other market crash.
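For readers who want the definition precisely, maximum drawdown is the largest peak-to-trough decline of cumulative wealth over the sample. A minimal sketch (illustrative, with a made-up return sequence):

```python
def max_drawdown(returns):
    """Largest peak-to-trough fractional decline of cumulative wealth.
    Unlike VaR/ES, the 'holding period' is variable: it is whatever span
    separates the running peak from the subsequent trough."""
    wealth = 1.0   # cumulative wealth, starting at 1
    peak = 1.0     # running maximum of wealth
    worst = 0.0    # largest decline from a peak seen so far
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, (peak - wealth) / peak)
    return worst

# Hypothetical return path: two consecutive losses after an early peak
dd = max_drawdown([0.05, -0.10, -0.08, 0.04, 0.07])
```

Because the statistic is a single extreme drawn from one realized path, it inherits the small-sample fragility described above: two non-overlapping five-year windows can rank the same strategies very differently.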

Note that simulation results can mislead when the data sampling assumptions fail to incorporate real-world data features (such as path dependence due to autocorrelation of returns).