A citation is a promise: this source exists, it says what I claim it says, and you can go check. Language models break that promise fluently. They produce references that are perfectly formatted, plausibly titled, and entirely invented — and they do it with the same confidence they bring to everything else.
The result is now visible in the literature itself. Retraction notices increasingly cite “non-existent references” and “unverifiable sources.” These are not exotic failures. They are the predictable output of using a generative tool for a task it was never built to perform: telling the truth about the external world.
Why it happens
A model doesn't retrieve a citation; it predicts what one should look like. Absent a real source, it generates the most statistically likely string — a citation-shaped object with no referent behind it. To the eye, it is indistinguishable from a real one. That is exactly the problem.
The fix is not cleverer prompting. It is human verification, applied without exception. Every reference in every manuscript we touch is checked against the actual source before it reaches a publisher. The work is slow, unglamorous, and the entire point.
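For readers who think in code, the "without exception" rule can be written down as a simple gate. This is a minimal sketch, not a description of any real tool: every name in it (`Reference`, `Status`, `blocking_references`, `ready_for_publisher`) is hypothetical, and the code does not verify anything itself. It only records whether a named person has checked each reference against its source, and it refuses to clear a manuscript while any reference lacks that sign-off.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    UNCHECKED = "unchecked"
    VERIFIED = "verified"    # a person opened the source and confirmed the claim
    NOT_FOUND = "not_found"  # a person looked and could not locate the source


@dataclass
class Reference:
    text: str
    status: Status = Status.UNCHECKED
    checked_by: str = ""     # hypothetical field: who signed off; empty until someone does


def blocking_references(references):
    """Return every reference that a named person has not verified."""
    return [
        ref for ref in references
        if ref.status is not Status.VERIFIED or not ref.checked_by
    ]


def ready_for_publisher(references):
    """A manuscript clears only when no reference is blocking."""
    return not blocking_references(references)


# Usage with placeholder entries (not real citations).
refs = [
    Reference("Placeholder reference A", Status.VERIFIED, checked_by="editor-1"),
    Reference("Placeholder reference B"),  # never checked, so it blocks release
]
print(ready_for_publisher(refs))  # False
```

The design choice worth noticing is that the default state is `UNCHECKED`: a reference is treated as unverified until a person says otherwise, never the other way around.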
