What happens as more and more of the web-scraped training data for Large Language Models (LLMs), such as ChatGPT, derives from the outputs of predecessor LLMs? In their May 2023 paper entitled "The Curse of Recursion: Training on Generated Data Makes Models Forget", Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot and Ross Anderson investigate changes in LLM outputs as training data becomes increasingly LLM-generated. Based on simulations of this potential trend, they find that: