Generative AI makes us more productive by generating large volumes of output quickly. The problem is that this output is not necessarily grounded in fact.
The result is a flood of content of questionable factual accuracy. This is especially problematic in fields like science and law, where accuracy is crucial.
As a result, these fields are under threat. Scientific publishers are grappling with a proliferation of inaccurate research, while lawyers have been caught citing fabricated case law in court.
To ensure factual and reliable content, we need better tools to verify information. These tools can assist reviewers by automating initial checks and surfacing relevant information, helping them judge the quality of work faster.
As the rate of information production increases, so does the need to verify it quickly.
According to its annual report, Springer Nature received more than 1.5 million publication submissions in 2022, involving over 750,000 peer reviewers. Compare this to Elsevier, which reports roughly twice those figures: 3 million article submissions and 1.5 million reviewers. Submission rates are increasing year on year with no signs of slowing down.
At this volume, flawed work inevitably gets through. A notable example is the recent retraction of a landmark Alzheimer’s study due to doctored images. The study had been cited nearly 2,500 times, putting it in the top echelon of cited papers.
This shows how far potentially unreliable claims can propagate. The study was published in Nature, demonstrating that even the best journals can be fooled and highlighting the need for better fact-checking tools and integrity guardrails.
What’s the scale of this issue? One report estimates that 1.5–2% of all scientific papers published in 2022 could have come from paper mills: organisations that produce fake research for profit.
These statistics emphasise the sheer volume of information that needs rigorous fact-checking and quality control, reinforcing the need for effective verification tools.
Not all inaccurate information is fraudulent, however. We all make mistakes, whether it’s citing an incorrect reference or missing a key paper in our analysis. Better tools can help us all produce better work faster. Our machines can help reduce the burden of fact-checking and quality control, allowing these processes to run at speed and scale.
There are many ways content can be erroneous, misleading or fraudulent. RetractionWatch publishes a list of reasons for paper retraction, which is a good resource for understanding the types of errors that occur.
Here’s a list of some common types that are particularly interesting to me:
The challenge is to develop tools that can identify signals related to these errors and help reviewers make informed decisions about the quality and veracity of the information.
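To make this concrete, here is a minimal sketch of one such signal: checking that a cited DOI actually resolves and that the cited title roughly matches the registered metadata. It queries the public Crossref REST API; the `Reference` shape and `check_reference` helper are hypothetical names for illustration, not part of any existing tool.

```python
# A sketch of a reference-integrity check using the public Crossref REST API
# (GET https://api.crossref.org/works/{doi}). Reference and check_reference
# are illustrative assumptions, not an existing tool's interface.
from dataclasses import dataclass

import requests

CROSSREF_API = "https://api.crossref.org/works/"

@dataclass
class Reference:
    doi: str
    cited_title: str  # the title as it appears in the manuscript

def check_reference(ref: Reference) -> dict:
    """Return simple quality signals for one cited reference."""
    resp = requests.get(CROSSREF_API + ref.doi, timeout=10)
    if resp.status_code != 200:
        # A DOI that does not resolve is a strong signal of a false reference.
        return {"doi": ref.doi, "resolves": False, "title_matches": False}
    titles = resp.json()["message"].get("title", [])
    registered = titles[0].lower() if titles else ""
    cited = ref.cited_title.lower()
    # Loose containment check: a mismatch flags the citation for human
    # review; it does not prove an error on its own.
    matches = bool(registered) and (cited in registered or registered in cited)
    return {"doi": ref.doi, "resolves": True, "title_matches": matches}

# Usage: run this over every entry in a manuscript's bibliography and
# surface any reference that fails to resolve or whose title mismatches.
# check_reference(Reference(doi="10.xxxx/xxxx", cited_title="..."))
```

A check like this doesn’t decide quality on its own; it surfaces signals for a reviewer, which is the point made below about keeping humans in control.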
The consequences of misinformation are significant.
In science, huge investments may be made in research based on false premises. The pharmaceutical industry, in particular, is constantly battling against the high attrition rate in drug discovery (see StudyRecon - drug development attrition rates). High quality information in the scientific record is a key part of optimising decision-making early on in development to reduce the risk of costly failures later.
Publishers themselves are also at risk. Wiley recently disclosed expected revenue losses of $35–40 million due to issues with Hindawi, which it acquired in 2021. Hindawi’s journals have been overrun by paper mills, leading to thousands of retractions, journal closures, and delisting from major indexes.
In the legal field, the consequences of misinformation extend beyond financial loss to disbarment and even criminal charges. The integrity of the legal system, and of society itself, is at stake when false information is presented as fact.
A risk that affects all entities, from individuals to corporations, is reputational damage. Misinformation can lead to a loss of trust and credibility that is difficult to recover from. This is particularly true in fields like science and law, where accuracy and integrity are paramount.
Both Springer Nature and Elsevier have stated their continued ambition to support the peer review process by leveraging tools that uphold information quality and integrity.
From Springer Nature’s 2022 Annual Progress Report:
From Elsevier’s Publishing Ethics page:
We have an opportunity to support this commitment by developing tools that can assist in the verification of information and plug into the existing workflows of these publishers.
Here are some examples of key players already demonstrating leadership in this space:
As we adopt increasingly sophisticated tools to assist in fact-checking and quality control, we must be mindful of the limitations and potential pitfalls of these systems.
Human reviewers must remain in control, using AI tools to supplement their research and decision-making. We shouldn’t rely solely on machines to determine truth, though they can handle much of the legwork under our instruction, helping us bring new evidence to light, bolster our arguments, and challenge our beliefs.
It’s crucial for these tools to assume an objective, transparent, and explainable role, showing how they arrived at their conclusions and allowing human reviewers to verify and challenge their findings.
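As a rough illustration of what that could look like, here is a sketch of a claim-level verdict that carries its evidence and provenance, so a reviewer can trace and challenge the conclusion. The schema is an assumption for illustration, not an existing standard.

```python
# An illustrative (assumed) schema for an explainable verification result:
# every verdict carries the evidence and sources a reviewer needs to check it.
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source: str     # e.g. a DOI, case citation, or URL
    excerpt: str    # the passage the system relied on
    supports: bool  # True if it supports the claim, False if it contradicts

@dataclass
class ClaimVerdict:
    claim: str
    label: str         # "supported" | "contradicted" | "unverified"
    confidence: float  # score in [0, 1]
    evidence: list[Evidence] = field(default_factory=list)

    def explain(self) -> str:
        """Render the reasoning trail for a human reviewer."""
        lines = [f"Claim: {self.claim}",
                 f"Verdict: {self.label} (confidence {self.confidence:.2f})"]
        for ev in self.evidence:
            relation = "supports" if ev.supports else "contradicts"
            lines.append(f'- {ev.source} {relation}: "{ev.excerpt}"')
        return "\n".join(lines)
```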
As verification systems become increasingly advanced, they will be integrated with generative AI systems themselves, creating a feedback loop that is currently missing. This is a route to improving the accuracy of these systems over time.
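Sketching that loop, assuming a `generate` callable wrapping a generative model and a `verify` callable returning a list of issues (both hypothetical):

```python
# A sketch of the generate-verify-revise feedback loop described above.
# `generate` and `verify` are assumed callables, not a specific library API.
def generate_with_verification(prompt, generate, verify, max_rounds=3):
    """Draft, check, and revise until the verifier passes or rounds run out."""
    draft = generate(prompt)
    issues = verify(draft)  # e.g. unresolved DOIs, unsupported claims
    for _ in range(max_rounds):
        if not issues:
            break
        # Feed the verifier's findings back as revision instructions.
        feedback = "Revise the draft to address:\n" + "\n".join(issues)
        draft = generate(prompt + "\n\n" + feedback)
        issues = verify(draft)
    # Non-empty issues here means the draft should escalate to a human reviewer.
    return draft, issues
```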
In the age of generative AI, ensuring the veracity of information is more critical than ever. By leveraging advanced AI tools for fact-checking and quality control, we can safeguard the integrity of scientific research and legal proceedings, supporting a reliable and trustworthy information landscape.
Thanks to Matthew Salter for insightful discussions on this topic.