arXiv:2605.18732: Scaling Law for LLM Hallucinations

Researchers tested 38 models on 8,900+ references and showed that LLM factual recall follows a sigmoid curve: the combination of parameter count and topic prevalence in training data explains 60–94% of variance. Hallucinations are not random — they are predictable and measurable.

Hallucinations Are Predictable — Mathematically

A new paper on arXiv (2605.18732) delivers an uncomfortable but useful conclusion: confabulations (the term the authors prefer over “hallucinations”) are not random errors. They are predictable phenomena that follow a scaling law — just as linguistic fluency or context understanding do.

A team of researchers from the Université du Luxembourg tested 38 models on more than 8,900 scientific references. They found that the quality of factual recall follows a sigmoid function of a log-linear combination of two factors: the model’s parameter count and the topic’s prevalence in the training data.
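Written out, one plausible reading of that relationship is the logistic model below. It is a sketch under assumptions, because the article does not give the exact formula. The symbols are our illustrative notation and may not match the paper's: N is the parameter count, F is the topic's prevalence in the training data, and β0, β1, β2 are fitted coefficients.

```latex
P(\text{correct} \mid N, F) = \sigma\!\left(\beta_0 + \beta_1 \log N + \beta_2 \log F\right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

Because the argument of σ is linear in log N and log F, multiplying either factor by a constant k shifts the input by a fixed amount (β1 log k or β2 log k). This is what "log-linear" means here.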

Why a Sigmoid — and What Does That Mean in Practice?

The sigmoid function describes a transition from “almost never correct” to “almost always correct” across a relatively narrow range of input values. Consider an analogy: a person's memory for a sentence they just read does not improve linearly as they become smarter. There is a threshold below which nothing sticks and another above which almost everything does.
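A minimal Python sketch of that threshold behavior follows. It uses the standard logistic function as the sigmoid, which is an assumption because the article does not specify the paper's exact parameterization. The input `z` is a hypothetical stand-in for the combined model-size and topic-prevalence signal:

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function: squashes any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Sweep z across a window around the midpoint z = 0, where P(correct) = 0.5.
for z in (-6, -3, -1, 0, 1, 3, 6):
    print(f"z = {z:+d}  ->  P(correct) = {sigmoid(z):.3f}")
```

Moving z from -3 to +3 takes the output from below 5% to above 95%. Most of the transition happens inside that narrow band, so from far enough away recall looks like an on/off switch.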

For LLMs, this means that if a topic is rarely represented in the training data (for example, an obscure scientific paper), even a large model will confabulate, inventing authors, publication years, and conclusions. Conversely, a well-represented topic combined with enough parameters lands in the “safe zone” of the sigmoid curve. The authors frame this as a signal-to-noise ratio. The signal is how often the concept appears in the data, and the noise is the model’s capacity “floor”, below which recall does not function.
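The sketch below makes the "even a large model will confabulate" point concrete. It assumes a sigmoid-of-log-linear model with made-up coefficients; those values are hypothetical and were not fitted in the paper. Prevalence is expressed as a hypothetical count, such as the number of training documents mentioning the topic:

```python
import math

# Hypothetical coefficients chosen only to illustrate the shape of the curve.
# They are NOT the values fitted in arXiv:2605.18732.
BETA_0 = -10.0  # intercept
BETA_1 = 0.4    # weight on log10(parameter count)
BETA_2 = 1.5    # weight on log10(topic prevalence, e.g. number of training documents)


def p_correct(params: float, prevalence: float) -> float:
    """Predicted probability of correct recall under an assumed
    sigmoid-of-log-linear model."""
    z = BETA_0 + BETA_1 * math.log10(params) + BETA_2 * math.log10(prevalence)
    return 1.0 / (1.0 + math.exp(-z))


# A large model on a rare topic vs. small and large models on a common one.
for label, params, prevalence in [
    ("70B model, rare topic", 7e10, 10),
    ("70B model, common topic", 7e10, 1e6),
    ("1B model, common topic", 1e9, 1e6),
]:
    print(f"{label:<25} P(correct) = {p_correct(params, prevalence):.3f}")
```

With these illustrative numbers, the 70B model scores about 0.015 on the rare topic. Even the 1B model scores about 0.93 on the common one. Topic prevalence, not size alone, moves the prediction into the safe zone.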

Is Confabulation the Same Thing as Hallucination?

Not quite. Hallucination is a broader, semi-formal term covering any situation in which a model generates content that is not grounded in its input or in reality. Confabulation, a term borrowed from neuropsychology, more precisely describes confidently filling in gaps: the model does not know that it does not know, so it synthesizes a convincing but incorrect answer. The paper uses this term because it emphasizes that the error is predictable and structured rather than random.

Practical consequence: 60–94% of variance in factual accuracy is explained by two measurable factors. This means it is possible to estimate hallucination risk for a given topic in advance, without testing the model on every query individually.
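The article does not say which statistic the "60–94% of variance" figure refers to. Assuming it is the usual coefficient of determination (R²), the sketch below shows what that number measures. The accuracy values are made up purely for illustration:

```python
def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: the share of the variance in `observed`
    that is accounted for by `predicted` (1.0 = perfect fit)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_obs = sum(observed) / len(observed)
    # Total variance around the mean vs. residual variance left by the predictions.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Toy, made-up accuracies for four (model, topic) cells and a model's predictions.
observed = [0.10, 0.35, 0.60, 0.90]
predicted = [0.15, 0.30, 0.65, 0.85]
print(f"R^2 = {r_squared(observed, predicted):.3f}")
```

Read this way, a value of 0.60 means that 40% of the variation in accuracy comes from factors other than model size and topic prevalence. At 0.94, almost all of it is captured by those two inputs.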

Frequently Asked Questions

What are confabulations in the LLM context?

Confabulations are fabricated or unreliably recalled facts (authors, years, conclusions) that LLMs produce when a topic is underrepresented in their training data. The authors prefer this term over “hallucinations” because the failure mode resembles confident gap-filling, as in the neuropsychological sense of confabulation, more than it resembles a perceptual error.

Why a sigmoid curve rather than a linear decline in errors?

The sigmoid describes a threshold transition. Below a certain topic prevalence in the training data, even large models retain almost nothing reliably. Above that threshold, recall quickly reaches “almost always correct”. Model size alone is not enough; the outcome depends on the combination of signal and noise.

What is the practical implication?

Hallucination risk can be predicted before inference from the model's size and an estimate of the topic's prevalence in the training data. This opens the door to “confidence routing”: systems that send queries on poorly represented topics to tools backed by external sources (RAG, search) instead of relying solely on the model's recall.
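Below is a minimal sketch of such a router in Python. It illustrates the routing idea under a sigmoid-of-log-linear assumption and is not the paper's method. Everything in it is hypothetical:

- the coefficient values;
- the function names (`predicted_recall`, `route_query`);
- the 0.9 threshold;
- the `prevalence_estimate` input, which in a real system might come from something like document counts in a reference corpus.

```python
import math
from dataclasses import dataclass

# Hypothetical coefficients for an assumed sigmoid-of-log-linear recall model.
# They are illustrative only and NOT taken from arXiv:2605.18732.
BETA_0, BETA_1, BETA_2 = -10.0, 0.4, 1.5


@dataclass
class Route:
    target: str       # "llm" (answer from model recall) or "retrieval" (RAG/search)
    p_correct: float  # predicted recall probability behind the decision


def predicted_recall(params: float, prevalence: float) -> float:
    """P(correct) = sigmoid(b0 + b1*log10(params) + b2*log10(prevalence))."""
    z = BETA_0 + BETA_1 * math.log10(params) + BETA_2 * math.log10(prevalence)
    return 1.0 / (1.0 + math.exp(-z))


def route_query(params: float, prevalence_estimate: float, threshold: float = 0.9) -> Route:
    """Answer from model recall only if predicted accuracy clears `threshold`;
    otherwise delegate to a retrieval-backed tool."""
    if prevalence_estimate <= 0:
        # No evidence the topic appears in training data: never trust bare recall.
        return Route("retrieval", 0.0)
    p = predicted_recall(params, prevalence_estimate)
    return Route("llm" if p >= threshold else "retrieval", p)


print(route_query(7e10, 10))    # obscure topic
print(route_query(7e10, 1e6))   # well-covered topic
```

The router decides based on a predicted probability rather than on the model's own stated confidence. A confabulating model does not know that it does not know, so its self-reported confidence is likely to be unreliable precisely where routing matters most.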
