See the article in Nature on 1st April 2026: link.
- The collaboration began when Nature journalist Elizabeth Quill reached out following an introduction from Josh Nicholson at Scite. We designed the analysis jointly with the Nature team, drawing on our experience screening manuscript submissions for scholarly publishers.
- Sampling methodology: We sampled from PMC and Crossref - capturing both open-access full text and closed-access reference metadata.
- Scope: Our analysis covered 4,000+ publications from 2025 across five major publishers: Elsevier, Sage, Springer Nature, Taylor & Francis, and Wiley, with reasonably equal representation across publishers and time.
- Detection logic: We used the Veracity API to parse unstructured references, resolve them via web search, compare metadata, and flag citation concerns.
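The per-reference check described above can be sketched as follows. This is a minimal illustration, not the actual Veracity implementation: the function names, the dict-based record shape, and the 0.85 similarity threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import Optional

def title_similarity(parsed_title: str, resolved_title: str) -> float:
    """Normalized string similarity between the citation's title and the resolved record's."""
    a, b = parsed_title.lower().strip(), resolved_title.lower().strip()
    return SequenceMatcher(None, a, b).ratio()

def flag_citation(parsed: dict, resolved: Optional[dict], threshold: float = 0.85) -> str:
    """Classify one parsed reference after an attempted resolution (hypothetical logic)."""
    if resolved is None:
        return "unresolved"          # no matching record could be found
    if title_similarity(parsed["title"], resolved["title"]) < threshold:
        return "metadata_mismatch"   # points at a real record, but the fields disagree
    return "ok"

# Usage: an exact title match resolves cleanly.
print(flag_citation({"title": "Deep Learning"}, {"title": "Deep Learning"}))  # → ok
```

A real pipeline would compare authors, venue, and year as well as titles, but the shape of the decision (unresolved vs. mismatched vs. clean) is the same.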
- To distinguish AI-generated errors from ordinary human mistakes, we generated 20,000 synthetic papers using two frontier LLMs to build a corpus of AI citation errors. This let us identify the characteristic fingerprints of hallucinated errors and use them to weight a risk score for each publication in our sample.
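A fingerprint-weighted score of the kind described above might look like the sketch below. The error categories and weights are illustrative assumptions, not the values learned from the synthetic corpus.

```python
# Hypothetical weights: how strongly each error type signals AI generation,
# i.e. how over-represented it was in the LLM-generated corpus relative to a
# pre-LLM human baseline. These numbers are assumed for illustration only.
FINGERPRINT_WEIGHTS = {
    "fabricated_reference": 5.0,   # paper/journal does not exist at all
    "random_mismatch": 3.0,        # real title paired with unrelated authors/venue
    "author_typo": 0.5,            # common human error, weak AI signal
    "title_typo": 0.5,
}

def risk_score(error_counts: dict) -> float:
    """Sum flagged errors, weighting each by how characteristic it is of AI output."""
    return sum(FINGERPRINT_WEIGHTS.get(kind, 1.0) * n
               for kind, n in error_counts.items())

# A paper with two fabricated references outscores one with two author typos.
assert risk_score({"fabricated_reference": 2}) > risk_score({"author_typo": 2})
```

The point of the weighting is that a fabricated reference is far more diagnostic of AI involvement than a typo, so it should dominate the ranking used to select publications for manual review.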
- Validation: Manual review by the Nature team confirmed that 65 of the 100 highest-risk publications contained at least one invalid reference, i.e. a citation pointing to a source that does not appear to exist.
- Estimated prevalence: Extrapolating, this implies that at least 1.625% (65/4,000) of 2025 publications, i.e. more than 110,000 of the ~7 million scholarly publications from that year, contain at least one AI-hallucinated citation. The true figure is likely higher, as our risk score was somewhat conservative and manual validation covered only the highest-risk publications.
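The extrapolation above is a single proportion scaled to the full 2025 output:

```python
validated_invalid = 65        # of the 100 highest-risk papers, confirmed to have an invalid reference
sample_size = 4_000           # publications sampled from 2025
total_2025 = 7_000_000        # approximate scholarly output for 2025

prevalence = validated_invalid / sample_size   # 65 / 4,000 = 1.625%
estimated_papers = prevalence * total_2025     # ≈ 113,750, i.e. "more than 110,000"
```

This is a lower bound by construction: only the top 100 risk-ranked papers were manually checked, so any invalid references in the remaining 3,900 are not counted in the numerator.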
- Publisher performance: We help publishers catch many of these issues by screening submissions early in the publication lifecycle. The rate we observed in published papers was roughly 5-10% of the issue frequency we see when screening submissions. Publishers are therefore catching the majority of these issues before print, but many are still slipping through.
- Types of errors: The citation problems flagged in 2025 publications appear qualitatively different from what we observed in the pre-LLM era. We see a much higher proportion of completely fabricated citations (e.g. to non-existent papers or journals) and random mismatches compared to more traditional errors like typos in author names or titles. This indicates that careless or egregious AI use is the primary driver.
Specific findings are available upon request. Get in touch.