Lookahead Bias in Large Language Model Training Data

April 26, 2024 • Posted in Investing Expertise

Can large language models (LLMs) inject lookahead bias into backtests when researchers are not rigorous about how LLM training samples are generated? In their preliminary and incomplete March 2024 paper entitled “Lookahead Bias in Pretrained Language Models”, Suproteem Sarkar and Keyon Vafa examine the potential for lookahead bias in backtests that use the Llama-2 LLM to identify future firm risks from the content of earnings calls. They consider two cases: (1) the backtest period falls within the LLM training sample, but the researcher instructs the LLM to consider only information available before the test period; and (2) the researcher specifies a training sample that ends before the backtest, but that sample is generated long after the end of the training period. Using Llama-2 to interpret transcripts of selected firm earnings calls from 2018, they find that:
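Case (1) above can be sketched as a simple overlap check between an LLM's training-data cutoff and the backtest window. This is a minimal illustration, not the paper's method; the `lookahead_risk` function and the cutoff date used in the example are illustrative assumptions.

```python
from datetime import date

def lookahead_risk(training_data_end: date,
                   backtest_start: date,
                   backtest_end: date) -> str:
    """Classify a backtest's exposure to LLM lookahead bias.

    Compares the end of the model's training data to the backtest window.
    Note: even a "clean" result does not rule out case (2), where the
    training sample itself is generated after the training period ends.
    """
    if training_data_end >= backtest_end:
        return "full-overlap"    # case (1): entire backtest lies inside the training sample
    if training_data_end > backtest_start:
        return "partial-overlap" # backtest straddles the training cutoff
    return "clean"               # training data ends before the backtest begins

# A 2018 backtest against a model whose training data ends in 2023
# (illustrative cutoff, not Llama-2's documented one):
print(lookahead_risk(date(2023, 7, 1), date(2018, 1, 1), date(2018, 12, 31)))
# prints "full-overlap"
```

As the comment notes, this check addresses only case (1); guarding against case (2) requires knowing when the training samples were generated, not just what period they cover.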

