When assigned the same empirical financial research task, do the findings of human researchers differ from those of large language models (LLMs), a form of artificial intelligence (AI)? If so, why? In their March 2026 paper entitled "AI 'Errors'", Wenqian Huang, Albert Menkveld and Shihao Yu compare outcomes for 158 AI model iterations (agents) to those from prior research for 164 independent human teams, all using the same sample of 720 million equity index futures trades to test the same six hypotheses. They construct the AI agents with the GPT-5.2 LLM, with variability in outcomes driven by its probabilistic decision-making. They further examine which types of research decisions drive any differences. Using outcomes from the AI and human researcher test runs, they find that: