
When AIs Generate Their Own Training Data

Steve LeCompte | Posted in: Big Ideas, Investing Expertise

What happens as more and more of the web-scraped training data for large language models (LLMs), such as ChatGPT, comes from the outputs of predecessor LLMs? In their May 2023 paper entitled "The Curse of Recursion: Training on Generated Data Makes Models Forget", Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot and Ross Anderson investigate how LLM outputs change as training data becomes increasingly LLM-generated. Based on simulations of this potential trend, they find that:
