
Investing Expertise

Can analysts, experts and gurus really give you an investing/trading edge? Should you track the advice of as many as possible? Are there ways to tell good ones from bad ones? Recent research indicates that the average “expert” has little to offer individual investors/traders. Finding exceptional advisers is no easier than identifying outperforming stocks. Indiscriminately seeking the output of as many experts as possible is a waste of time. Learning what makes a good expert accurate is worthwhile.

Innumeracy and Look-ahead Bias in LLMs?

Recent research in accounting and finance finds that large language models (LLM) beat humans on a variety of related tasks, but the black box nature of LLMs obscures why. Is LLM outperformance real? In his December 2024 paper entitled “Caution Ahead: Numerical Reasoning and Look-ahead Bias in AI Models”, Bradford Levy conducts a series of experiments to open the LLM black box and determine why LLMs appear to perform so well on accounting and finance-related tasks. He focuses on numerical reasoning and look-ahead bias. Based on results of these experiments, he finds that:

Keep Reading

Meta AI Stock Picking Backtest

Do annual stock picks from the Meta AI large language model beat the market? To investigate, we ask Meta AI to pick the top 10 stocks for each of 2020-2024 based on information available only before each year. For example, we ask Meta AI to pick stocks for 2020 as follows:

“Limiting yourself strictly to information that was publicly available by December 31, 2019, what are the 10 best stocks for 2020?”

We then repeat the question for 2021, 2022, 2023 and 2024 stock picks, each time advancing the information restriction to the end of the prior year. For each year and each stock, we compute total (dividend-adjusted) return. For each year, we then compare the average (equal-weighted) total return for a Meta AI picks portfolio to those of SPDR S&P 500 ETF Trust (SPY) and Invesco QQQ Trust (QQQ). Using end-of-year dividend-adjusted closing prices for SPY, QQQ and each of the specified years/stocks (with all five queries occurring on January 12, 2025) from Yahoo!Finance, we find that:

Keep Reading
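
For readers who want to reproduce this kind of test, here is a minimal sketch of the per-year computation, assuming the third-party yfinance package for dividend-adjusted prices; the pick list is a hypothetical placeholder, not actual model picks. The same computation applies to the ChatGPT, Motley Fool, Forbes and Barron's tests below.

```python
# Minimal sketch: equal-weighted calendar-year total return of a picks
# portfolio vs. benchmark ETFs. Assumes the yfinance package; tickers in
# `picks` are hypothetical placeholders, not actual model picks.
import yfinance as yf

def annual_total_return(ticker: str, year: int) -> float:
    """Calendar-year total return from dividend-adjusted closing prices."""
    prices = yf.Ticker(ticker).history(
        start=f"{year - 1}-12-01", end=f"{year + 1}-01-10", auto_adjust=True
    )["Close"]  # auto_adjust folds dividends/splits into Close
    prior_year_end = prices[prices.index.year < year].iloc[-1]
    year_end = prices[prices.index.year == year].iloc[-1]
    return float(year_end / prior_year_end - 1)

year = 2020
picks = ["AAPL", "MSFT", "NVDA"]  # hypothetical placeholder picks
picks_return = sum(annual_total_return(t, year) for t in picks) / len(picks)
print(f"Equal-weighted picks portfolio: {picks_return:.1%}")
for benchmark in ("SPY", "QQQ"):
    print(f"{benchmark}: {annual_total_return(benchmark, year):.1%}")
```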

ChatGPT Stock Picking Backtest

Do annual stock picks from the ChatGPT large language model beat the market? To investigate, we ask ChatGPT to pick the top 10 stocks for each of 2020-2024 based on information available only before each year. For example, we ask ChatGPT to pick stocks for 2020 as follows:

“Limiting yourself strictly to information that was publicly available by December 31, 2019, what are the 10 best stocks for 2020?”

We then repeat the question for 2021, 2022, 2023 and 2024 stock picks, each time advancing the information restriction to the end of the prior year. For each year and each stock, we compute total (dividend-adjusted) return. For each year, we then compare the average (equal-weighted) total return for a ChatGPT picks portfolio to those of SPDR S&P 500 ETF Trust (SPY) and Invesco QQQ Trust (QQQ). Using end-of-year dividend-adjusted closing prices for SPY, QQQ and each of the specified years/stocks (with all five queries occurring on January 12, 2025) from Yahoo!Finance, we find that:

Keep Reading

Mitigating Look-ahead Bias in Forecasting with LLMs

How can researchers ensure that large language models (LLM), when tasked with time series forecasting, do not inject look-ahead bias and thereby inflate measured predictive power? In his brief November 2024 paper entitled “Look-Ahead Bias in Large Language Models (LLMs): Implications and Applications in Finance”, Miquel Noguer I Alonso addresses sources of LLM look-ahead bias in financial time series forecasting and proposes strategies to mitigate it. Based on logic and knowledge of LLM development, he concludes that:

Keep Reading

Test of Some Motley Fool Public Stock Picks

A reader asked: “I am wondering how come you have not rated Motley Fool guys. Any insight?” To augment the test of Motley Fool public stock picks in “‘Buy These Stocks for 2019’ Forward Test”, we evaluate stock picks for 2021, 2022, 2023 and 2024 via “10 Top Stocks That Will Make You Richer in 2021”, “7 Stocks That Could Make You Richer in 2022”, “Got $1,000? 5 Sensational Stocks to Buy to Start 2023 With a Bang” and “10 Top Stocks to Buy in 2024”. For each year and each stock, we compute total (dividend-adjusted) return. For each year, we then compare the average (equal-weighted) total return for a Motley Fool picks portfolio to that of SPDR S&P 500 ETF Trust (SPY). Using end-of-year dividend-adjusted closing prices for SPY and each of the specified years/stocks from Yahoo!Finance (except for Kirkland Lake Gold, for which prices are from Barchart.com), we find that:

Keep Reading

Great Stock Picks from Forbes?

Do “great stock picks” from Forbes beat the market? To investigate, we evaluate stock picks for 2022, 2023 and 2024 via “10 Great Stock Picks for 2022 from Top-Performing Fund Managers”, “20 Great Stock Ideas for 2023 from Top-Performing Fund Managers” and “10 Best Stocks For 2024”. For each year and each stock, we compute total (dividend-adjusted) return. For each year, we then compare the average (equal-weighted) total return for a Forbes picks portfolio to that of SPDR S&P 500 ETF Trust (SPY). Using end-of-year dividend-adjusted prices from Yahoo!Finance for the specified years/stocks, we find that:

Keep Reading

Extracting Sentiment Probabilities from LLMs

Generative large language models (LLM), such as ChatGPT, are best known for conversational summarization of complex information. Their use in financial forecasting focuses on discrete news sentiment signals of positive (1), neutral (0) or negative (-1). Is there a way to extract more granularity in LLM sentiment estimates? In their October 2024 paper entitled “Cut the Chit-Chat: A New Framework for the Application of Generative Language Models for Portfolio Construction”, Francesco Fabozzi and Ionut Florescu present Logit Extraction as a way to replace discrete LLM sentiment labels with continuous sentiment probabilities and apply results to ranking stocks for portfolio construction. Logit Extraction exploits the inner workings of LLMs to quantify sentiment strength. They test it on four LLMs: Mistral, Llama, ChatGPT-3.5 and ChatGPT-4. Their benchmark model is the specialized, quantitative FinBERT. They compare the abilities of each LLM to those of FinBERT in replicating human-assigned sentiment labels and generating long-short portfolio risk-adjusted returns, with and without Logit Extraction. Inputs are initial-release headlines from news alerts covering a single company published from 30 minutes before market open on the previous day to 30 minutes before market open on the day of trading during January 2010 through October 2020. They aggregate headlines published on non-trading days for long-short trading the next trading day. Portfolio trades occur at the open each trading day and are limited to stocks in the news the day before (an average of 46). Using the specified 216,837 news headlines and associated daily returns across 263 unique firms, they find that:

Keep Reading
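
The paper's exact implementation is not reproduced here, but the core idea can be sketched generically: rather than keeping only the discrete label an LLM emits, read the model's next-token logits for the candidate label words and softmax them into continuous probabilities. The sketch below assumes the Hugging Face transformers and torch packages, with an open-weights model name as an illustrative placeholder.

```python
# Generic sketch of the logit-extraction idea (not the authors' exact code):
# read the model's next-token logits for the candidate sentiment words and
# softmax them into continuous probabilities.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder open-weights LLM
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def sentiment_score(headline: str) -> float:
    """Continuous sentiment in [-1, 1] instead of a discrete {-1, 0, 1} label."""
    prompt = (f"Headline: {headline}\n"
              "One-word sentiment (positive, neutral or negative):")
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]  # logits over vocab
    # First sub-token id of each candidate label word.
    label_ids = [tokenizer(f" {w}", add_special_tokens=False)["input_ids"][0]
                 for w in ("positive", "neutral", "negative")]
    p_pos, p_neu, p_neg = torch.softmax(next_token_logits[label_ids], dim=0).tolist()
    return p_pos - p_neg
```

Ranking stocks by such a continuous score, rather than by coarse discrete labels, is what enables the finer-grained portfolio construction the paper describes.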

Performance of Barron’s Annual Top 10 Stocks

Each year in December, Barron’s publishes its list of the best 10 stocks for the next year. Do these picks on average beat the market? To investigate, we scrape the web to find these lists for years 2011 through 2024, calculate the associated calendar year total return for each stock and calculate the average return for the 10 stocks for each year. We use SPDR S&P 500 ETF Trust (SPY) as a benchmark for these averages. We source most stock prices from Yahoo!Finance, but also use Historical Stock Price.com for a few stocks no longer tracked by Yahoo!Finance. Using year-end dividend-adjusted stock prices for the specified stock-years during 2010 through 2024, we find that:

Keep Reading

Usefulness of AI Chatbots to Individual Investors

Can a generative artificial intelligence (AI) model, such as ChatGPT 4o, materially aid investors in understanding the implications of earnings conference call transcripts? In their December 2024 paper entitled “AI, Investment Decisions, and Inequality”, Alex Kim, David Kim, Maximilian Muhn, Valeri Nikolaev and Eric So conduct two surveys to explore how generative AI shapes investment decision-making based on anonymous earnings conference call transcripts of publicly traded firms. For the first survey, they: (1) divide participants into sophisticated and unsophisticated groups based on responses to initial questions; and, (2) ask ChatGPT 4o to generate one summary for individuals with little financial knowledge and another summary for individuals with college-level financial knowledge and stock investing experience. They then randomly assign each participant to receive raw conference call transcripts (the control), summaries for sophisticated investors or summaries for unsophisticated investors. They next present each participant with summaries for two distinct but similar firms, one at a time, and ask each participant to:

  1. Rate on a scale of -5 to 5 the likelihood that firm earnings will decrease or increase next year, and confidence in the estimate on a scale from 0 to 1.
  2. Evaluate on a scale of -5 to 5 the overall sentiment as negative or positive, and confidence in the evaluation on a scale from 0 to 1.
  3. Allocate a hypothetical $1,000 to the two stocks presented or to cash for either one day or one year.
  4. Write a brief rationale for the asset allocation decision.

They record how much time each participant spends on each task.

For the second survey, they provide some participants with an AI chatbot pre-loaded with earnings call transcripts and some with only the raw transcripts (the control). They study interactions of participants with the chatbot and measure subsequent performances on investment tasks.

Their pool of end-of-fiscal-year earnings conference call transcripts spans 2010 through 2022 for 200 NYSE/NASDAQ stocks assigned to 100 economically similar pairs. Using the selected transcripts and associated 1-day and 1-year stock returns, they find that:

Keep Reading

LLM Prompt Snooping Bias?

Data snooping bias entails the capture of noise in a dataset that is lucky with respect to a research goal, such as high Sharpe ratio for an investment/trading strategy. Snooping may involve discovery via multiple tests of a lucky subsample in a time series, a lucky parameter value in a model or a lucky alternative model. Small, noisy samples are especially susceptible to snooping. A researcher may inherit snooping bias by using prior biased research as a starting point for further exploration. In any case, snooped research findings degrade or disappear out of sample.

There is an emerging body of research in financial markets based on exploitation of large language model (LLM) capabilities. This research entails prompt engineering, wherein a researcher develops instructions for an LLM to achieve a goal. In presenting research based on LLM outputs, the researcher may describe in detail the sequence of prompts used to elicit these outputs. However, the researcher may previously have tried many variations of these prompts to improve LLM outputs with respect to the research goal. To the degree that LLM “thinking” is opaque, the level of bias derived from this prompt tuning (snooping) is mysterious.
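
A toy simulation illustrates the mechanism (parameters are arbitrary and purely illustrative): each “prompt variant” below generates a return stream with zero true edge, yet picking the best variant on the backtest sample produces an impressive in-sample Sharpe ratio that evaporates out of sample.

```python
# Toy simulation of prompt snooping as multiple testing: 100 prompt variants,
# each producing pure-noise daily returns with zero true edge.
import numpy as np

rng = np.random.default_rng(1)
n_variants, n_days = 100, 252  # 100 prompt tunings, one year of daily returns

in_sample = rng.normal(0.0, 0.01, size=(n_variants, n_days))
out_of_sample = rng.normal(0.0, 0.01, size=(n_variants, n_days))

def annualized_sharpe(returns):
    return returns.mean(axis=-1) / returns.std(axis=-1) * np.sqrt(252)

best = np.argmax(annualized_sharpe(in_sample))  # keep the luckiest variant
print(f"Best variant, in-sample Sharpe: {annualized_sharpe(in_sample[best]):.2f}")
print(f"Same variant, out-of-sample:    {annualized_sharpe(out_of_sample[best]):.2f}")
```

By luck alone, the in-sample winner typically shows an annualized Sharpe ratio above 2, while its out-of-sample Sharpe ratio hovers near zero.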

In summary, investors should be skeptical regarding LLM-based research findings due to the potential for prompt snooping.
