
Lookahead Bias in Large Language Model Training Data

Steve LeCompte | Posted in: Investing Expertise

Can large language models (LLMs) inject lookahead bias into backtests when LLM training samples are generated without rigor? In their preliminary and incomplete March 2024 paper entitled "Lookahead Bias in Pretrained Language Models", Suproteem Sarkar and Keyon Vafa examine the potential for lookahead bias in backtests that use the Llama-2 LLM to identify future firm risks based on the content of earnings calls. They consider cases for which: (1) the backtest period falls within the LLM training sample, but the researcher instructs the LLM to consider only information from before the test period; and (2) the researcher specifies a training sample that ends before the backtest but generates it long after the end of the training sample. Using Llama-2 to interpret transcripts of selected firm earnings calls from 2018, they find that:
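The two backtest configurations described above can be sketched in Python. This is a minimal illustration, not the paper's code; the function names and prompt wording are hypothetical. Case (1) relies only on a prompt instruction to respect a cutoff date, while case (2) is structurally clean only if the model's training data actually ends before the backtest begins.

```python
from datetime import date

def prompted_cutoff_query(transcript: str, cutoff: date) -> str:
    """Case 1: the model's training data may extend past the backtest,
    so the prompt merely *asks* the model to ignore later information.
    Any lookahead protection depends on instruction-following."""
    return (
        f"Using only information available before {cutoff.isoformat()}, "
        f"identify the main risks facing this firm.\n\n{transcript}"
    )

def training_cutoff_is_clean(model_data_cutoff: date, backtest_start: date) -> bool:
    """Case 2: lookahead bias is ruled out structurally only when the
    model's training data ends strictly before the backtest begins."""
    return model_data_cutoff < backtest_start
```

For example, a 2018 backtest run with a model whose training data extends through 2023 falls under case (1) no matter how the prompt is worded, whereas `training_cutoff_is_clean(date(2017, 12, 31), date(2018, 1, 1))` describes a case (2) setup.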

Subscribe to Keep Reading
