What happens as more and more of the web-scraped training data for Large Language Models (LLMs), such as ChatGPT, derives from the outputs of predecessor LLMs? In their May 2023 paper entitled "The Curse of Recursion: Training on Generated Data Makes Models Forget", Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot and Ross Anderson investigate changes in LLM outputs as training data becomes increasingly LLM-generated. Based on simulations of this potential trend, they find that: