AI Hallucination

5/24/23

AI Hallucination

How to prevent catastrophic LLM hallucinations

Presented at GlueCon 2023
Joe Shockman & Allen Romano

As we (hopefully?) all know LLMs are simply a statistical model for generating reasonable sounding text. This, of course, presents some problems.

The chief among them is Hallucination.

Just like a person who knows nothing, an LLM with insufficient context will generate reasonable sounding bullshit … and for much the same reason (It wants to be helpful, but sometimes doesn’t have the context or facts it needs to give you the right answer). Also, just like a human making errors due to lack of knowledge it is often found to be “confidently wrong”

The good news is that unlike a person the LLM probably does contain the information you’re looking for. It just needs to be set up for success in order to get it for you.

Keep in mind that it’s not lying to you. It literally doesn’t know what any of this means.

From the point of view of the LLM, what we call “the failure mode” is identical to success.

It is a generative model. It generates. Therefore the success and failure check must happen outside the LLM.

Multiple Paths to hallucination

[And maybe it is a bad metaphor because it’s not exactly that…]

Not enough data
Made up words
Citations
Bios (try it on yourself, the LLM will happily grant you titles and accolades in excess of those you’ve earned.)

Is there nothing to be done?

Sundar Pichai: “No one in the field has yet solved the hallucination problems”

To the contrary, it is not a problem that will be soon “solved” because it is baked into the technology. It is a failure risk at the application level that must mitigated.

Ye, Ou, et al (May 18, 2023): minor changes in prompt can shift results up to 27% of the time; consistency can fluctuate, on average more than 3 times per hundred for identical prompts.

Manipulating the prompt is not enough. Simple hallucination vs. complex hallucination. We need to account for both. Prompt engineering best practices helps mitigate some simple hallucination and some complex hallucination. (best practices include clarity, CoT, context quotation, NA state specification.) But it is not dependable. Small changes to the prompt or components of the context can create unforeseen changes in behavior. There are also numerous gotchas. For example: token stuffing and neologisms.

Google chat bot demo example:

Factual error regarding James Webb telescope wiped $100B off their stock price.

Yes, yes stock goes up, stock goes down.

This particular error spooked wallstreet because it definitively answered the question on everyone’s minds. “Is Google ready to compete in the age of AI?” This error answered that question with a resounding “NO”

Business opportunity alert. If you use this talk to launch a hallucination prevention business you can build a business that would have saved Google $100B. If you do that I ask that you come find me and buy me a drink.

If you’re still not convinced

Ask ChatGPT about yourself. I’ll get some things right but it’ll probably make up some honors and awards. It really wants to be helpful and make you sound impressive and it has no concept that making things up isn’t allowed.

Some ways forward

Raising awareness

We need to shout from the rooftops about the possibility of hallucination. That’s what we’re doing here today. My goal from this talk is to be clear and compelling enough that each of you can take up the torch and continue talking about it.

The biggest danger is insufficient data, context and guidance.

Identify bootstrap modes where the LLM has insufficient knowledge or context (fail with an error before asking)

E.G. At Logoi and Grounded we ingest knowledge data. We know that the LLM will hallucinate if queried before sufficient information is present (or if you ask a question out of bounds of provided data). Firstly set the service into lame duck mode till we have sufficient context to generate useful answers.
Beware high-stakes uses. When the stakes are high you cannot rely solely on AI. Nor can a human rely on information generated solely by AI (GPT4 technical report Open AI 2023)
Split complex tasks into a cascade of steps.

E.G. Parse the prompt and attempt to identify: context, intent, setting and optimal prompt characteristics. If you fail to identify intent that’s a good opportunity to exit with failure!

This model also offers the added bonus of protecting against prompt injection attacks and other unexpected input.

This approach is also likely to be the structure for getting better performance on complex tasks (e.g. SmartGPT)
Use the parsed context and intent to context tune the prompt *This improves quality by allowing automatic context aware prompt tuning and also allows for validation at each step

LLMs are better than people in that they probably DO know everything so your task is to wring the correct data out of it. By splitting the task in step 2 we are able to benefit from richer context

Prompt engineering recipes: Ask model to “quote” from context for specific information; chain of thought (CoT) prompting;

Set the LLM temperature low so it is less creative “just the facts ma’am”
- When you interact with ChatGPT over the web you can’t set this but you can via the API. Again lower temperatures are less creative.
Expert mode. Once you identify the proper context, bake a request for an expert response into the prompt.
Add error handling on the response to catch hallucinations before they reach the end user.
Create and call a Fact Check Module™ (< not actually trademarked so feel free). Run responses through a more fixed verification system. In the case of our chat knowledge engine we found that the LLM could hallucinate username. That’s reasonably easy to guard against on the response level before the hallucinations can reach the user.
When running your own model you have an opportunity for model tuning (deliberately overfit the model to the current use case so it blows up noticeably if it goes out of bounds)

Again, pre-parsing prompts also prevents prompt injection attacks

Note for abuse departments out there. In cases where malicious behavior or attack is suspected or confirmed insert randomness into responses to obscure the nature of the detection state machine. A little randomness wreaks havoc on black box testing.

—
Mitigations we’ve implemented now:

Identify modes that are apt to generate hallucination
Go limp when operating in lame duck mode
Query parsing: intent detection which shapes targeted prompts and context building; includes denial of certain kinds of questions and simple injection attacks
Query modification: queries are wrapped in custom prompts based on question shape and intent
LLM output parsing: LLM output is parsed as JSON
Validation: output is checked for presence of required fields; output is checked for types of information
Response validation based on a formal check against source material.
Identification of out of bounds questions

Future Mitigations planned:

Query separation and granular checks of different parts of complex queries
LLM processing of parts
Cascade of parsers for more granular checks

How to prevent catastrophic LLM hallucinations

Multiple Paths to hallucination

Some ways forward

Sign up to get the latest news from Grounded AI