Can analysts, experts and gurus really give you an investing/trading edge? Should you track the advice of as many as possible? Are there ways to tell good ones from bad ones? Recent research indicates that the average “expert” has little to offer individual investors/traders. Finding exceptional advisers is no easier than identifying outperforming stocks. Indiscriminately seeking the output of as many experts as possible is a waste of time. Learning what makes a good expert accurate is worthwhile.
Are prediction markets better at forecasting firm earnings than professional analysts? In their April 2026 paper entitled “Beating the Earnings Game: Why Do Prediction Markets Outperform Professional Analysts?”, Daniel Rabetti, Jiaqi Shao and Che Zhang investigate whether and, if so, why a blockchain-based prediction market such as Polymarket outperforms professional analysts in forecasting U.S. stock earnings. The earnings predictions of this market are public and unchangeable contracts, taking the form:
“Will [Company] beat earnings for [Quarter] [Fiscal Year]?”
relative to analyst consensus as of contract creation date. Using data for 469 Polymarket firm-quarter earnings beat contracts, corresponding analyst earnings forecast data and associated daily stock prices during September 2025 through February 2026, they find that:
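As background, one standard way to score such probabilistic forecasts against realized beat/miss outcomes is the Brier score. The Python sketch below uses hypothetical probabilities and is illustrative only; it is not necessarily the metric the authors use.

```python
def brier_score(forecast_prob: float, outcome: int) -> float:
    """Brier score for one probabilistic forecast: squared error between
    the forecast probability and the realized 0/1 outcome (lower is better)."""
    return (forecast_prob - outcome) ** 2

# Hypothetical contract: the market prices YES at 0.72, an analyst-implied
# probability is 0.55, and the firm beats consensus (outcome = 1).
market_prob, analyst_prob, beat = 0.72, 0.55, 1
print(brier_score(market_prob, beat))   # ~0.0784 -- market closer to outcome
print(brier_score(analyst_prob, beat))  # ~0.2025 -- analyst further from outcome
```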
Does the empirical accuracy of prediction markets derive from crowd wisdom or an informed few? In their April 2026 paper entitled “Prediction Market Accuracy: Crowd Wisdom or Informed Minority?”, Roberto Cram, Yunhan Guo, Theis Jensen and Howard Kung investigate the source of prediction market accuracy. Specifically, they compare the distribution of actual trade directions with a hypothetical distribution of random trades, and thereby classify traders as one of the following (a statistical-test sketch appears after the list):
Market makers, who provide liquidity by posting limit orders.
Skilled traders, winners whose gains cannot be attributed to chance.
Other winners and other losers, who respectively earn positive and negative returns but whose performance is not statistically significant.
Persistent losers, who consistently and significantly lose.
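As a minimal sketch of the kind of significance test that separates skill from luck, assuming hypothetical win/trade counts (the paper's actual procedure compares full trade-direction distributions, and market makers are identified separately by their limit-order activity):

```python
from scipy.stats import binomtest

def classify_trader(wins: int, trades: int, alpha: float = 0.05) -> str:
    """Classify a trader by testing the observed win rate against pure
    chance (p = 0.5) with a two-sided binomial test."""
    p_value = binomtest(wins, trades, p=0.5).pvalue
    win_rate = wins / trades
    if p_value < alpha:
        return "skilled trader" if win_rate > 0.5 else "persistent loser"
    return "other winner" if win_rate > 0.5 else "other loser"

print(classify_trader(70, 100))  # skilled trader (70% wins is unlikely luck)
print(classify_trader(55, 100))  # other winner (not statistically significant)
```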
Using the Polymarket universe of transactions and accounts with at least 10 trades across propositions created after the beginning of January 2023 and resolved by the end of December 2025, they find that:
Portfolio construction agents each independently employ Step 2 and 3 outputs to propose a portfolio based on an assigned method (such as equal weight, inverse volatility, mean-variance optimization or risk parity; two of these baselines are sketched after this list), including:
A researcher agent to propose novel portfolio construction methods.
An adversarial agent to uncover unconventional allocation ideas.
Multiple agents review all proposed portfolios and vote on them.
A chief investment officer agent scores, selects and combines surviving proposed portfolios using an ensemble of seven combination methods. This agent then summarizes a final recommendation/reasoning/dissenting views.
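As illustration of two of the baseline assignment methods named above (not the authors' implementation), a minimal Python sketch with simulated returns:

```python
import numpy as np

def equal_weight(n_assets: int) -> np.ndarray:
    """Equal-weight portfolio: 1/N in each asset."""
    return np.full(n_assets, 1.0 / n_assets)

def inverse_volatility(returns: np.ndarray) -> np.ndarray:
    """Weight each asset by the inverse of its return volatility,
    normalized so that weights sum to one."""
    vol = returns.std(axis=0, ddof=1)  # per-asset sample volatility
    raw = 1.0 / vol
    return raw / raw.sum()

# Hypothetical daily returns for three assets (rows = days, columns = assets).
rng = np.random.default_rng(0)
rets = rng.normal(0.0005, [0.01, 0.02, 0.03], size=(250, 3))
print(equal_weight(3))           # [0.3333 0.3333 0.3333]
print(inverse_volatility(rets))  # overweights the lowest-volatility asset
```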
They include a meta-agent that compares forecasted and realized returns and rewrites agent scripts to improve future performance. They specify each agent in this system via a description, a set of scripts, a collection of skills and a structured output. An Investment Policy Statement (specifying asset class universe, objective, tracking error) constrains the AI agents. Overall, this system compresses days or weeks of human work into minutes. Based on prior research and experience with LLM-based AI agents, they observe that:
When assigned to perform the same empirical financial research, do the findings of human researchers and large language models (LLM) as a kind of artificial intelligence (AI) differ? If so, why? In their March 2026 paper entitled “AI ‘Errors'”, Wenqian Huang, Albert Menkveld and Shihao Yu compare outcomes for 158 AI model iterations (agents) to those from prior research for 164 independent human teams employing the same sample of 720 million equity index futures trades to test the same six hypotheses. They choose the GPT-5.2 LLM to construct AI agents, with variability in outcomes driven by its probabilistic decision-making. They further examine which types of research decisions drive any differences. Using outcomes from the AI and human researcher test runs, they find that:
Is growing investor/trader use of large language models (LLM) extinguishing known stock return anomalies? In their March 2026 paper entitled “Do LLMs Make Markets More Efficient?”, Runjing Lu, Yongxin Xu and Luka Vulicevic examine how LLM usage affects reactions of individual stocks to newsworthy events by comparing periods with and without outages at three major LLM providers (ChatGPT, Claude and Gemini). Together, these three account for nearly 80% of LLM usage. They classify outages as (1) any, (2) single-provider severe or (3) multi-provider, as documented by each provider. They focus on outages that coincide with news releases and persist beyond the NYSE close. They use RavenPack Event Sentiment Scores for Dow Jones Newswire articles with ticker-specific relevance scores above 75. They control for time-varying stock/firm characteristics, past returns, news type and calendar effects. They measure daily abnormal stock returns relative to those of a characteristic-matched benchmark portfolio. Using daily outage, stock/firm and news/sentiment data during March 2023 through November 2025, they find that:
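As a minimal sketch of that abnormal return measurement, with hypothetical data (constructing the characteristic-matched benchmark itself is more involved):

```python
import pandas as pd

def abnormal_returns(stock_ret: pd.Series, bench_ret: pd.Series) -> pd.Series:
    """Daily abnormal return = stock return minus the return of its
    characteristic-matched benchmark portfolio on the same date."""
    return (stock_ret - bench_ret).dropna()

# Hypothetical daily returns around a news release.
dates = pd.to_datetime(["2025-06-02", "2025-06-03", "2025-06-04"])
stock = pd.Series([0.012, -0.004, 0.021], index=dates)
bench = pd.Series([0.005, -0.001, 0.003], index=dates)
print(abnormal_returns(stock, bench))
```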
Are large language models (LLM) robust financial advisors for individuals? In their March 2026 paper entitled “AI Financial Advice: Supply, Demand, and Life Cycle Implications”, Taha Choukhmane, Tim de Silva, Weidong Lin and Matthew Akuzawa examine personal financial advice from LLMs. They mainly use GPT-5.2 but repeat analyses with Gemini 3 Flash as a robustness check. Specifically, they:
Construct a life cycle model of income/spending/saving/investment, with labor market shocks and asset returns calibrated to U.S. data.
Collect questions (prompts) from a demographically representative sample of about 1,000 U.S. adults about spending and investing, including summaries of respective financial situations.
Simulate, for each year of life from ages 22 to 90, the paths of individuals who follow two-pass advice from LLM responses to prompts from survey participants matched by age, income and employment status. The first pass solicits textual advice, and the second translates that text into quantified saving, spending and asset allocation recommendations (sketched after this list).
They consider two benchmarks: (1) optimal behaviors from the life cycle model simulations; and (2) substitution of survey respondent prompts with expert (academic) prompts that ask the LLM to give professional life cycle advice under modern portfolio theory, including explicit personal situations/economic assumptions. Using the specified life cycle model and LLM prompts, they find that:
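As a minimal sketch of the two-pass elicitation, assuming a generic chat(prompt) -> str client (hypothetical; the authors' actual prompts are far more detailed):

```python
import json

def two_pass_advice(chat, situation: str, question: str) -> dict:
    """Pass 1 solicits free-text advice; Pass 2 translates that text into
    quantified saving, spending and asset allocation recommendations."""
    advice_text = chat(
        f"My situation: {situation}\nQuestion: {question}\n"
        "Please give me personal financial advice."
    )
    quantified = chat(
        "Translate the following advice into JSON with keys 'saving_rate', "
        "'annual_spending' and 'stock_allocation':\n" + advice_text
    )
    return json.loads(quantified)  # assumes the model returns valid JSON
```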
The link between prediction accuracy and profitability.
Characteristics of profitable and unprofitable trading.
Using the complete Polymarket transaction history (about 70 million trades by 1.4 million users) during November 2022 through October 2025, they find that:
Can autonomous artificial intelligence (Agentic AI), which interprets market dynamics with continuous improvement and specifies resulting trades with minimal human intervention, run an attractive portfolio? In their March 2026 paper entitled “Beyond Prompting: An Autonomous Framework for Systematic Factor Investing via Agentic AI”, Allen Huang and Zheqi Fan employ Agentic AI as a self-directed quantitative researcher that translates a high-level objective, such as maximizing risk-adjusted returns while controlling for turnover/transaction costs, into buy/sell/hold decisions. Specifically, their Agentic AI model each day (a schematic loop follows the list):
Backtests the candidate factors and assesses their economic rationale.
Derives stock buy/sell/hold decisions from surviving candidates.
Updates its memory based on empirical feedback.
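A schematic of that daily loop in Python, with all names hypothetical stand-ins rather than the authors' code:

```python
def daily_cycle(agent, candidate_factors, market_data):
    """One day of the agent loop: backtest candidates, keep factors with
    statistical and economic support, trade and learn from feedback."""
    surviving = []
    for factor in candidate_factors:
        stats = agent.backtest(factor, market_data)  # backtest each candidate
        if stats.significant and agent.has_economic_rationale(factor):
            surviving.append(factor)                 # keep justified factors
    decisions = agent.decide_trades(surviving, market_data)  # buy/sell/hold
    agent.update_memory(decisions, market_data)      # learn from feedback
    return decisions
```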
The Agentic AI model mitigates data snooping bias by requiring economic rationale, adjusting for multiple hypothesis testing and evaluating out-of-sample signal decay. They use historical data through December 2020 to train the model and 2021-2024 data for out-of-sample testing. They assume 0.03% trading frictions (commission plus bid-ask spread) to assess net performance. Using daily data for a broad sample of U.S. common stocks priced at least $5 and excluding extreme outliers during January 2004 through December 2024, they find that:
Does Michael Farr, CEO and founder of Farr, Miller & Washington, offer good stock picks via his annual CNBC articles identifying the best 10 stocks for the next year? To investigate, we take his picks for 2022, 2023, 2024 and 2025, calculate the associated annual total returns for each stock and compute the equal-weighted average return for the 10 stocks for each year. We use SPDR S&P 500 ETF Trust (SPY) as a benchmark for these averages. Using year-end dividend-adjusted stock prices for the specified stock-years, we find that:
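The arithmetic is simple, as in this Python sketch with hypothetical dividend-adjusted prices:

```python
import numpy as np

# Hypothetical year-start and year-end dividend-adjusted prices for ten picks.
picks_start = np.array([50.0, 120.0, 33.0, 80.0, 15.0, 210.0, 44.0, 95.0, 60.0, 27.0])
picks_end   = np.array([55.0, 110.0, 40.0, 85.0, 14.0, 230.0, 48.0, 90.0, 66.0, 30.0])

pick_returns = picks_end / picks_start - 1.0  # annual total return per stock
print(f"Equal-weighted average: {pick_returns.mean():.2%}")

# SPY benchmark from hypothetical dividend-adjusted year-end prices.
spy_return = 436.0 / 400.0 - 1.0
print(f"SPY benchmark: {spy_return:.2%}")
```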
Should users of artificial intelligence (AI), as implemented via Large Language Models (LLM) with latitude to operate independently, expect good treatment? In their February 2026 paper entitled “Agents of Chaos”, a large research team reports results from two weeks of intensive, realistic interactions between 20 researchers and largely autonomous LLMs. Autonomy means that the LLM has system administrator rights to its own server/storage and access to dedicated Discord and email accounts for interactions with its owner (a human) and non-owners (human and LLM). The principal goal of the 20 interacting researchers was to break (induce problematic behaviors from) the autonomous LLMs. Much of the paper is in case study format. Based on outputs of the two weeks of interactions, they conclude that: