Google Research: how thinking unlocks parametric knowledge in LLMs
Google Research reveals two mechanisms by which a reasoning trace improves retrieval of facts stored in model weights — computational buffer and factual priming — tested on Gemini 2.5 and Qwen3-32B.
This article was generated using artificial intelligence from primary sources.
Why do models forget what they know?
Large language models store enormous amounts of knowledge in their weights — so-called parametric knowledge (facts encoded directly into model parameters, without access to external databases). Yet users regularly observe that models hallucinate even about data they were trained on. Google Research now explains why — and how a reasoning trace changes the equation.
Two mechanisms that change knowledge retrieval
Google Research identified two separate mechanisms by which reasoning steps (a reasoning trace — the sequence of intermediate steps the model writes before the final answer) improve retrieval of parametric knowledge.
Computational buffer operates at the level of computational capacity: every token the model generates before its answer adds another forward pass through the network, which gives it more room to search its knowledge. The key demonstration is that even meaningless “filler” text such as “Let me think…” improves accuracy. Because the filler carries no semantic content, the gain is attributed to the extra processing rather than to anything the text says.
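To make the filler comparison concrete, here is a minimal sketch of how the three conditions could be set up as prompts. It is not the study's actual harness: the function name `build_conditions`, the repeat count, and the use of `<think>` tags to delimit a prefilled reasoning segment are all assumptions made for illustration, and prefilling the reasoning segment is generally only possible with open-weights models.

```python
def build_conditions(question: str,
                     filler: str = "Let me think...",
                     repeats: int = 8) -> dict:
    """Build three prompt variants for one factual question.

    Hypothetical helper: the <think> delimiters and the repeat count
    are illustrative assumptions, not details taken from the study.
    """
    base = f"Question: {question}\n"
    # Filler adds tokens (and therefore forward passes) but no facts.
    padded = " ".join([filler] * repeats)
    return {
        # No reasoning at all: the model must answer immediately.
        "direct": base + "Answer:",
        # Reasoning segment prefilled with content-free filler.
        "filler": base + f"<think>\n{padded}\n</think>\nAnswer:",
        # Reasoning segment left open for the model to fill in itself.
        "reasoning": base + "<think>\n",
    }


conditions = build_conditions("Who wrote the novel Solaris?")
for name, prompt in conditions.items():
    print(f"--- {name} ---\n{prompt}\n")
```

If accuracy in the filler condition exceeds the direct condition, the difference cannot come from the content of the trace, which is the logic behind the computational buffer claim.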
Factual priming operates through content: partway through its reasoning, the model surfaces related intermediate facts that — through spreading activation — trigger the correct final answer. The mechanism is analogous to how a person remembers a name through an associative chain.
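Factual priming can be imitated explicitly with a two-step prompt: first ask the model to recall related facts, then ask the question with those facts in context. The sketch below is only an approximation of the mechanism, since in the study the model surfaces the facts inside its own reasoning trace. The function `answer_with_priming` and the `generate` callable (any function that maps a prompt string to a completion string) are hypothetical names, not part of any real API.

```python
from typing import Callable, List


def answer_with_priming(question: str,
                        generate: Callable[[str], str],
                        n_facts: int = 3) -> str:
    """Two-step answer: recall related facts first, then answer.

    `generate` is a hypothetical stand-in for a model call.
    """
    recall_prompt = (
        f"List up to {n_facts} facts you know that are related to this "
        "question, one per line. Do not answer it yet.\n\n"
        f"Question: {question}"
    )
    # Keep non-empty lines and drop leading list markers.
    facts: List[str] = [
        line.strip("-• ").strip()
        for line in generate(recall_prompt).splitlines()
        if line.strip()
    ][:n_facts]

    fact_block = "\n".join(f"- {fact}" for fact in facts)
    primed_prompt = (
        f"Known facts:\n{fact_block}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return generate(primed_prompt)
```

Separating the recall step also makes the intermediate facts available for inspection before they influence the final answer.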
Results on Gemini 2.5 and Qwen3-32B
The study was conducted on Gemini 2.5 Flash, Gemini 2.5 Pro, and Qwen3-32B using the SimpleQA Verified and EntityQuestions benchmarks — datasets designed to measure the accuracy of factual answers from parametric knowledge.
Key finding: a single hallucinated intermediate fact in the reasoning trace significantly degrades the accuracy of the final answer, even when the rest of the reasoning is correct. This explains why models that think out loud sometimes make more errors than models that answer briefly: a bad intermediate step can steer priming in the wrong direction.
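One way to check this effect on your own evaluation logs is to split final-answer accuracy by whether the trace contained a hallucinated intermediate fact. The sketch below assumes each trace has already been labeled, by a human or a grader model, and that labeling step is not shown. `TraceRecord` and `accuracy_by_trace_quality` are hypothetical names, and no numbers from the study are reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class TraceRecord:
    """One evaluated question (hypothetical schema)."""
    has_hallucinated_fact: bool  # any intermediate fact judged false
    final_correct: bool          # final answer judged correct


def accuracy_by_trace_quality(
    records: Iterable[TraceRecord],
) -> Dict[str, Optional[float]]:
    """Final-answer accuracy for clean vs. hallucinated traces."""
    # Each bucket holds [number correct, total].
    buckets = {False: [0, 0], True: [0, 0]}
    for record in records:
        bucket = buckets[record.has_hallucinated_fact]
        bucket[0] += int(record.final_correct)
        bucket[1] += 1

    def rate(bucket) -> Optional[float]:
        correct, total = bucket
        return correct / total if total else None

    return {
        "clean": rate(buckets[False]),
        "hallucinated": rate(buckets[True]),
    }
```

A large gap between the two rates on the same question set would be consistent with the priming explanation, although it does not by itself rule out that harder questions produce both more hallucinated steps and more wrong answers.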
What this means in practice
The finding has a practical implication: for applications that depend on factual accuracy, the length and quality of reasoning traces are not ornamental but critical. Prompt and system designers need to pay attention to which intermediate facts the model surfaces, not just to the final answer.
Frequently Asked Questions
- What is parametric knowledge and why is it hard to retrieve?
  - Parametric knowledge consists of facts encoded directly into model weights during training, without access to an external database. Retrieval is unreliable because the model must activate the right neural pathways based solely on the query.
- How does the computational buffer help a model remember accurate facts?
  - Each additional forward pass through the network, even one spent on meaningless text such as “Let me think…”, gives the model more computational capacity to search its knowledge, similar to a person taking a moment to think.