Artificial intelligence is rapidly transforming the investment landscape in ways that extend far beyond algorithmic trading and robo-advisors. One of AI’s most promising applications lies in its ability to process and extract meaning from vast amounts of unstructured text—something that even the most diligent human investors struggle to do at scale. While a skilled analyst might carefully read through a handful of company filings in a day, AI can analyze thousands of documents simultaneously, identifying patterns and connections that would be virtually impossible for humans to spot. This capability is particularly valuable because much of the information that moves stock prices is buried in narrative disclosures—the sea of text that companies release through regulatory filings.

With the average 10-K report containing over 60,000 words, the challenge is identifying which sentences matter: what is new and important enough to move stock prices? Finding this information can be like looking for a needle in a haystack. Anna Costello, Bradford Levy, and Valeri Nikolaev, authors of the November 2025 study “Representations of Investor Beliefs,” tackled this question using artificial intelligence.

What the Researchers Examined

Costello, Levy, and Nikolaev developed a novel approach to identify “surprise” information in corporate filings. Their solution combined information theory with large language models (LLMs)—the same technology behind ChatGPT. They trained AI models specifically on financial disclosures to understand what information investors already know about a company, then used these models to identify truly new information in subsequent filings. Their study required:

Pretraining an LLM from scratch on a cross-section of firms’ narrative disclosures.

Further pretraining the LLM on each firm’s time series of disclosures, yielding a firm-specific model for every firm in the sample.

Iteratively applying and further pretraining the firm-specific model.

Running an out-of-sample test to measure the information in new narrative disclosures.

Their study analyzed all disclosures filed on SEC EDGAR by 500 companies from 1996 through 2023, covering nearly 278,000 filings with approximately 1.7 billion words. By pretraining from scratch with a fixed knowledge cutoff of 2007 and iteratively updating each firm-specific LLM, they addressed concerns regarding look-ahead bias.
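To make the score-then-update idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: a smoothed word-count model stands in for the firm-specific LLM, and the names ToyFirmModel and walk_forward_surprisal, the vocabulary size, and the whitespace tokenizer are all assumptions made for illustration. What it shows is the ordering that guards against look-ahead bias: each filing is scored using only what the model learned from earlier filings, and only afterward is the model updated with it.

```python
import math
from collections import Counter


class ToyFirmModel:
    """Add-one-smoothed word-count model; a stand-in for a firm-specific LLM."""

    def __init__(self, vocab_size=50_000):
        self.counts = Counter()
        self.total = 0
        self.vocab_size = vocab_size  # assumed vocabulary size used for smoothing

    def prob(self, token):
        # Smoothed probability, so unseen words get a small nonzero probability.
        return (self.counts[token] + 1) / (self.total + self.vocab_size)

    def surprisal_bits(self, tokens):
        """Average surprisal per token in bits, i.e. the mean of -log2 p(token)."""
        if not tokens:
            return 0.0
        return sum(-math.log2(self.prob(t)) for t in tokens) / len(tokens)

    def update(self, tokens):
        """Fold a filing into the model (the analogue of further pretraining)."""
        self.counts.update(tokens)
        self.total += len(tokens)


def walk_forward_surprisal(filings):
    """Score each filing before the model has seen it, then update the model.

    `filings` is a list of (iso_date, text) pairs for a single firm.
    Returns a list of (iso_date, average_surprisal_in_bits) in date order.
    """
    model = ToyFirmModel()
    scores = []
    for date, text in sorted(filings):
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        scores.append((date, model.surprisal_bits(tokens)))  # out-of-sample score
        model.update(tokens)  # the model learns this filing only after scoring it
    return scores


if __name__ == "__main__":
    example = [
        ("2021-02-01", "revenue grew revenue grew"),
        ("2020-02-01", "revenue grew revenue grew"),
        ("2022-02-01", "litigation impairment restatement"),
    ]
    for date, bits in walk_forward_surprisal(example):
        print(date, round(bits, 2))
```

In this toy example, the second filing repeats language the model has already seen and so scores lower than the first, while the third introduces unfamiliar words and scores higher again. A real LLM conditions on context rather than counting words, but the principle of measuring new information as low predicted probability is the same.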

Key Findings

