News to Numbers: How ChatGPT Helps Sharpen Momentum Investing Without Breaking the Bank

Momentum investing targets stocks with recent price strength and trims exposure to laggards. This post explains how a ChatGPT-like large language model can read firm-level news in real time, generate signals that support both stock selection and portfolio weighting, and potentially lift risk-adjusted returns. The approach blends a traditional momentum signal with LLM-guided interpretation of news, shows robustness to transaction costs and portfolio constraints, and delivers its strongest gains in concentrated portfolios.

Introduction: A fresh lens on momentum investing

Momentum investing is one of those ideas that feels simple in theory and tricky in practice. Buy stocks that have risen in the past, sell or avoid those that haven’t, and you’re riding a familiar price trend. But markets don’t just run on yesterday’s numbers—they’re crowded with news, conversations, and headlines that slowly seep into prices. The big question researchers have chased for years: can we do a better job of interpreting all that news, in real time, to improve risk-adjusted returns?

This paper tackles that question head-on by bringing a high-tech helper into the mix: a large language model (LLM) — think something like ChatGPT — that reads firm-specific news and helps decide not just what to buy, but how much to tilt your bets. The idea is elegant in its simplicity: pair a traditional momentum signal with an LLM-driven interpretation of news to refine both stock selection and portfolio weights. The result? A momentum strategy that, in tests, shone brighter on the risk-adjusted metrics that matter to investors.

If you’re curious about the practical side of AI in investing, this study offers a clear, grounded look at how a pre-trained model (no heavy fine-tuning required) can act as a real-time news interpreter and generate actionable signals for a traditional factor strategy.

The core idea in plain terms

  • Start with a classic momentum approach: look back at how stocks have performed over the past year, pick the leaders, and form a long-only portfolio (in this study, the top two deciles of performers). This step is standard and acts as the baseline.
  • Bring in real-time news: for each candidate stock, collect firm-specific news articles with precise timestamps. The model reads these items and answers a simple question: does the recent news support a continuation of the past momentum, or not?
  • Use a score to guide decisions: the LLM outputs a score between -1 and 1 indicating the strength of the signal. This score then influences both which stocks are chosen and how much weight each stock gets in the portfolio.
  • Compare two flavors of prompts: a Basic prompt (lean, simple) and an Advanced prompt (more structured, with extra guidance). The prompts are crafted so the model knows the stock is about to enter a momentum-based portfolio, nudging it to assess the news in light of future returns.

In short: the paper tests whether a chat-based AI can read the news as it happens and, when plugged into a momentum framework, add extra oomph to risk-adjusted returns.
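The pipeline above can be sketched in a few lines. Everything here is illustrative: the price data is randomly generated, and `get_llm_score` is a placeholder standing in for the real chat-model call, showing only the expected interface (a score in [-1, 1] per stock).

```python
# Sketch of the baseline momentum screen plus a hypothetical LLM scoring step.
import numpy as np
import pandas as pd

def momentum_12_1(monthly_prices: pd.DataFrame) -> pd.Series:
    """12-month look-back return, skipping the most recent month."""
    return monthly_prices.iloc[-2] / monthly_prices.iloc[-13] - 1.0

def top_two_deciles(signal: pd.Series) -> pd.Index:
    """Keep stocks at or above the 80th percentile of the momentum signal."""
    return signal[signal >= signal.quantile(0.8)].index

def get_llm_score(ticker: str, news_items: list[str]) -> float:
    """Placeholder for the LLM call: in the paper's setup the model reads the
    last day's firm-specific news and returns a score in [-1, 1] indicating
    whether the news supports momentum continuation."""
    # A real implementation would prompt the model and parse its reply.
    return 0.0

# Toy example: 13 months of prices for 10 hypothetical tickers.
rng = np.random.default_rng(0)
tickers = [f"STK{i}" for i in range(10)]
monthly_prices = pd.DataFrame(
    100 * np.cumprod(1 + rng.normal(0.01, 0.05, size=(13, 10)), axis=0),
    columns=tickers,
)
signal = momentum_12_1(monthly_prices)
candidates = top_two_deciles(signal)
scores = {t: get_llm_score(t, news_items=[]) for t in candidates}
```

With 10 stocks, the top-two-decile filter keeps the two strongest performers; in the study's S&P 500 universe the same cut yields a pool of roughly 100 names before the LLM signal is applied.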

Data and setup in everyday terms

  • Universe and baseline: They focus on stocks in the S&P 500 to keep liquidity realistic. The baseline momentum signal uses a 12-month look-back (excluding the most recent month) and buys the top two deciles.
  • The LLM signal: For each stock in the momentum candidate set, they pull news from the Stock News API with minute-level timestamps. The model is prompted to judge whether recent news makes it more likely the stock will continue its recent climb.
  • News window and horizon: The look-back window for news is a key hyperparameter (k). The study finds the sweet spot at k = 1 day for the news window, with a forecast horizon of about 21 trading days (consistent with a monthly rebalancing cadence).
  • Portfolio construction: All selected stocks are given baseline weights (either equal-weighted or value-weighted). The LLM score then tilts the weights, controlled by a tilt parameter eta. After tilting, weights are re-normalized to sum to one.
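A minimal sketch of the tilting step, under one assumed functional form: each baseline weight is scaled by `1 + eta * score`, then re-normalized, with a per-stock cap reflecting typical portfolio constraints. The paper specifies the tilt parameter eta and the re-normalization; the exact scaling rule and the iterative cap enforcement here are illustrative choices.

```python
# Hypothetical implementation of the LLM-based weight tilt.
import numpy as np

def tilt_weights(base_weights: np.ndarray,
                 llm_scores: np.ndarray,
                 eta: float = 5.0,
                 max_weight: float = 0.15) -> np.ndarray:
    """Tilt baseline weights by the LLM score (in [-1, 1]) and re-normalize.

    eta controls how aggressively the tilt is applied; a per-stock cap
    (e.g. 15%) mirrors common diversification constraints.
    """
    # Scale each weight; clip at zero so a very negative score zeroes a name.
    tilted = base_weights * np.clip(1.0 + eta * llm_scores, 0.0, None)
    tilted = tilted / tilted.sum()  # re-normalize to sum to one
    # Iteratively enforce the single-stock cap, redistributing the excess
    # pro rata to uncapped names so the weights still sum to one.
    for _ in range(100):
        over = tilted > max_weight
        if not over.any():
            break
        excess = (tilted[over] - max_weight).sum()
        tilted[over] = max_weight
        free = ~over
        tilted[free] += excess * tilted[free] / tilted[free].sum()
    return tilted

# Equal-weighted baseline over 10 stocks, one strong positive signal.
base = np.full(10, 0.1)
scores = np.zeros(10)
scores[0] = 0.8
weights = tilt_weights(base, scores, eta=5.0)
```

In this toy case the favored stock would be tilted to roughly 36% of the portfolio, so the 15% cap binds and the excess flows back to the other nine names.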

Three data sources sit at the heart of the setup:
- A Finreon-like internal daily-return dataset (think a CRSP-like backbone for U.S. equities).
- High-frequency, firm-specific news from the Stock News API, covering outlets such as CNBC, Bloomberg, The Street, and more.
- A risk-free rate from FRED to calculate excess returns and risk metrics.

Crucially, the authors split the data into a Validation Set (October 2019–December 2023) and a Test Set (January 2024–March 2025). The validation period is used to pick the best hyperparameters, and the test period asks: would this have worked in a more forward-looking, out-of-sample world? The test period is particularly important because it begins after the model’s pre-training cut-off (GPT-4o mini’s training data ends in October 2023). That means this is not a case of memorized knowledge leaking into the results—the model is truly applying its language understanding to new information.

What the results actually say

Here are the headline takeaways, framed for practitioners who want to understand what changed and why it matters:

  • Consistent outperformance on risk-adjusted returns

    • Full sample: LLM-enhanced momentum shows higher Sharpe and Sortino ratios than the baseline momentum. Sharpe moves from 0.57 to 0.69; Sortino from 0.54 to 0.69. Annualized returns rise from about 15% to 18%.
    • Out-of-sample (post-January 2024): Sharpe improves from 0.79 (baseline) to 1.06 (LLM-enhanced); Sortino from 0.93 to 1.28. Annualized return climbs from roughly 24% to 30%.
    • In both cases, volatility stays in the same ballpark or moves down slightly, and maximum drawdown gets a bit smaller (e.g., −31% vs −33% in the full sample).
  • Costs and turnover

    • The improvements come with higher turnover, reflecting more trading activity driven by the model’s signals. Everything is evaluated net of a conservative 2 basis points per trade, and the gains persist. So, the extra turnover isn’t eating away at the edge.
  • The landing zone for the best performance

    • Monthly rebalancing beats weekly rebalancing. Monthly cadence balances staying responsive to news with keeping turnover manageable.
    • The optimal look-back window for news is short: k = 1 day. Expanding to 5 days offers little extra punch.
    • Simpler prompts tend to perform as well or a little better than more advanced prompts. The basic prompt edges out the advanced one in out-of-sample tests, though the difference isn’t statistically dramatic.
    • The tilt factor eta matters: giving more weight to the LLM signal improves Sharpe. However, practical limits come from typical portfolio constraints (e.g., maximum 15% per stock).
    • Concentration helps: the strategy shines when focused on a smaller set of stocks. The best observed Sharpe occurs with about 25 stocks, although there’s still a meaningful uplift when using 50 or more. The strongest gains appear when you don’t over-filter initially and let the LLM signal drive selection from a broader pool.
  • Baseline weight construction matters

    • Value-weighted baselines (i.e., heavier bets on larger firms) tend to yield bigger improvements when the LLM tilt is applied. Larger, more news-active firms help the model capture actionable signals more reliably.
  • The nature of the predictive edge

    • Importantly, the gains in the post-cutoff period reinforce the idea that the LLM’s value isn’t about memorized historical content. The model’s ability to interpret current news remains the engine of the improvement.
  • Robustness checks and limitations

    • The authors conduct a thorough parameter sweep (over 500 configurations on the validation set) and find the optimal settings cluster around a few practical choices: monthly rebalancing, a news look-back k around 1, a stock count m around 50, the Basic prompt, and a value-weighted initial allocation with eta about 5.
    • The out-of-sample period is relatively short, so statistical power is limited. They also note that the alpha from time-series regressions is positive but only weakly significant in some cases.
    • The work focuses on a single pre-trained model and a large-cap U.S. equity universe. Generalizability to other markets, asset classes, or illiquid securities remains an open question.
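For readers who want to reproduce metrics like the ones above, here is how Sharpe, Sortino, maximum drawdown, and net-of-cost returns are typically computed from a monthly return series. The 2 bps per-trade figure matches the paper's cost assumption; the sqrt(12) annualization and the exact turnover convention are standard choices assumed here, not taken from the paper.

```python
# Standard risk-adjusted metrics on a monthly excess-return series.
import numpy as np

BPS = 1e-4  # one basis point

def net_returns(gross: np.ndarray, turnover: np.ndarray,
                cost_bps: float = 2.0) -> np.ndarray:
    """Subtract transaction costs; turnover is the fraction of the book traded."""
    return gross - turnover * cost_bps * BPS

def sharpe(excess: np.ndarray, periods: int = 12) -> float:
    """Annualized Sharpe ratio from periodic excess returns."""
    return excess.mean() / excess.std(ddof=1) * np.sqrt(periods)

def sortino(excess: np.ndarray, periods: int = 12) -> float:
    """Annualized Sortino ratio: penalize only downside deviation."""
    downside = np.minimum(excess, 0.0)
    return excess.mean() / np.sqrt((downside ** 2).mean()) * np.sqrt(periods)

def max_drawdown(returns: np.ndarray) -> float:
    """Worst peak-to-trough loss of the cumulative wealth curve."""
    wealth = np.cumprod(1.0 + returns)
    peak = np.maximum.accumulate(wealth)
    return (wealth / peak - 1.0).min()

# Toy check: +10%, -50%, +20% has a peak-to-trough drawdown of -50%.
mdd = max_drawdown(np.array([0.10, -0.50, 0.20]))
```

The key practical point mirrors the paper's cost treatment: compute turnover first, net out costs per trade, and only then evaluate the ratios, so the comparison between baseline and LLM-tilted portfolios stays fair.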

A practical read: what this means for real-world portfolios

  • A real-time reader of news can meaningfully augment factor signals
    Think of the LLM as a fast, in-house researcher who reads hundreds of small, firm-level news items each day and tells you which stories could plausibly push prices further in the near term. When you merge that with a momentum framework, the two sources of information reinforce each other rather than compete.

  • The value is in disciplined, not chaotic, implementation
    The strongest results come from a disciplined setup: a concentrated portfolio (around 25–50 stocks), monthly rebalancing, and a prudent tilt toward the model’s signals. This isn’t a free-for-all; it’s a targeted way to augment a benchmark with a credible, interpretable edge.

  • Simplicity can work
    The basic prompt—describing the momentum logic, listing the news items, and asking for a straightforward continuation signal—performed at least as well as the more intricate prompt variant. That’s encouraging for practitioners who want to keep things manageable and auditable.

  • The importance of costs and risk controls
    Even with higher turnover, net-of-costs performance improved, suggesting the LLM signals add value beyond friction. Diversification constraints (e.g., limiting single-stock weights) help with risk management and can still preserve much of the upside.

  • When to expect the most value
    The edge appears strongest in high-news environments and with larger, more liquid stocks where the model has richer textual data to parse. It also helps when you’re aiming for a relatively concentrated, conviction-driven portfolio rather than a broad market-covering approach.

  • Caution about generalizability
    Remember this is a specific setup: S&P 500 stocks, a particular news feed, a single pre-trained model, and a defined out-of-sample window. Real-world deployment would require careful customization, ongoing monitoring, and likely some fine-tuning or domain adaptation to other markets.

What’s new and why it matters

  • A real-time interpreter for financial news
    The study shows LLMs can translate textual information into quantitative signals that feed directly into asset pricing decisions. It’s not just about sentiment scores or keyword counts; it’s about interpreting a stream of firm-specific events and their likely price consequences.

  • Integration with traditional factor investing
    This isn’t replacing momentum or other factors; it’s augmenting them. The LLM adds incremental information about news flow that classic momentum signals alone might miss, especially in fast-moving, news-driven moments.

  • Practical prompts, practical outcomes
    The research emphasizes the value of concrete prompt design and sensible portfolio construction. Simpler prompts, a measured tilt, and a focused number of bets can deliver meaningful improvements with manageable risk.

  • A forward-looking test of model leakage
    By starting the out-of-sample period after the model’s pre-training cut-off, the authors address a common concern: is the model just parroting things it already “knows”? Their results suggest the edge lies in real-time interpretation rather than memorized content.

Limitations and avenues for future work

  • Short, but informative out-of-sample window
    The test period, while valuable for leakage concerns, is relatively brief. Longer out-of-sample testing would help confirm the durability of the signal.

  • Narrow scope
    The analysis centers on a single, pre-trained model and a U.S. large-cap universe. It would be interesting to see how different LLMs, fine-tuning strategies, or other markets perform under similar setups.

  • Potential for further enhancements
    Future work could explore domain adaptation (specialized financial training), combining LLM signals with other data sources (alternative data, sentiment indices), or testing in multi-factor portfolios to see how LLM-driven signals interact with value, quality, or low-volatility factors.

  • Operational considerations
    Real-world deployment would require considerations around latency, data costs, model governance, and risk oversight. The study’s approach uses prompt engineering and a fixed model; scaling would demand a robust, auditable workflow.

Real-world implications: turning theory into practice

If you’re an investment practitioner or an AI enthusiast with an interest in markets, this study offers a few actionable takeaways:

  • Don’t overlook the power of news in momentum
    Traditional momentum is rooted in price history, but news events are a plausible accelerant of those price moves. An LLM acts as a scalable, real-time reader of that news.

  • Start simple, but with discipline
    A lightweight prompt can deliver meaningful signals, especially when paired with a thoughtful portfolio construction framework (e.g., a value-weighted baseline with a measured tilt to LLM signals).

  • Focus on a concentrated, high-conviction set
    The results suggest larger benefits when you zoom in on a smaller number of stocks with stronger, more news-rich signals. This aligns with the intuition that big ideas often come from a few compelling stories, not a broad scattershot approach.

  • Balance innovation with risk controls
    The improvement in Sharpe and Sortino ratios comes with higher turnover, so you’ll want to weigh the costs and add risk controls (like position limits) to keep the strategy within acceptable risk boundaries.

  • Expect a practical path, not a sci-fi vision
    This study demonstrates a plausible, scalable way to integrate LLMs into a real-world investment process without requiring expensive model retraining. It’s an encouraging sign that AI can augment established investment disciplines rather than replacing them.

Key takeaways

  • LLMs can act as real-time interpreters of firm-specific news, augmenting traditional momentum signals to improve risk-adjusted returns.
  • The study’s best-performing setup uses monthly rebalancing, a one-day news look-back, a 21-day forecast horizon, about 25–50 stocks, a simple Basic prompt, and a value-weighted baseline with a notable tilt toward the LLM signal (eta around 5).
  • In both in-sample and out-of-sample tests, the LLM-enhanced momentum portfolio delivered higher Sharpe and Sortino ratios, lower or similar volatility, and smaller drawdowns, even after accounting for conservative transaction costs (2 bps per trade).
  • The gains are strongest when the portfolio is more concentrated and when the signal tilt is used judiciously; broadening the portfolio weakens the edge somewhat.
  • The improvements persist in a truly out-of-sample period that begins after the model’s pre-training cut-off, suggesting the edge comes from real-time interpretation rather than memorized knowledge.
  • There are caveats: the out-of-sample window is relatively short, the analysis uses a single model and a single market, and deploying such a system in practice requires careful attention to data costs, latency, and risk controls.
  • Overall, the work points to a practical pathway for incorporating LLMs into systematic investing, opening doors for further research into fine-tuning, multi-factor integration, and cross-market applications.

If you’re experimenting with prompting strategies in your own research or portfolio design, a few closing prompts to consider: start with a simple prompt and a modest tilt, test a monthly rebalancing cadence, and track whether the signals remain informative when you broaden or narrow the stock universe. As AI tools become more accessible and data streams more abundant, the line between “research edge” and “practical edge” may increasingly lie in how cleanly we can integrate language-model insights into disciplined investment processes.
