24/09/2024
Joe Shockman

405 Billion parameters is not enough

How can we restore trust in our AI systems?


Meta offers its Llama 3 LLM under a very open license. I am reluctant to call it “open source” because the source of a model is hard to define. Is it the data? Is it the weights? Suffice it to say that you can download Llama 3 and run it yourself. Language models seem to get better as they get larger, and Meta now offers Llama 3.1 in a very large 405B variant. That’s 405 billion parameters. It takes a staggering amount of compute and time to train a 405B-parameter model. Think datacenters the size of train stations, thousands of GPUs working all day every day. And it’s still not enough to create a trustworthy AI. Why? Because larger is not, and can never be, the answer.

Why trustworthy AI?

I’m almost reluctant to ask this because the answer is so obvious. But the answer is so obvious we have forgotten to ask the question. AI will have to be trustworthy if we’re going to depend on it for any useful work. Before we address trustworthy AI, let’s talk about trusting computers. Our computing systems are trustworthy, but only due to herculean efforts by generations of software engineers. These days you’d never think to double-check a spreadsheet after it summed a column. (Unless you were a computer dork in the 90s, when Intel had some embarrassing problems doing simple math.) But the truth of the matter is that computer errors are so common that a whole fleet of error-correcting measures has been invented. This is not my specialty, so I won’t presume to offer an exhaustive list; suffice it to say that error correction happens at every level of computing.

All of this hard work has brought us to the point where our computing infrastructure is so trustworthy it is almost beyond our notice. We are only starting to notice because a new generation of computing (and make no mistake, AI is the latest generation of computing) is letting us down. Every generative AI company displays a warning that users should check the facts its product presents. That means NONE of them can make trustworthy AI.

Putting the responsibility for error identification and correction on the user is bad business. And a violation of the hard-won trust we’ve built in the reliability of our computing systems. Remember when Google was trustworthy? Remember when the quickest way to end an argument over facts was to “just Google it”? Now we’re presented with the warning:

“Gemini will not always get it right”

And Google results contain the suggestion to put glue on your pizza or to “eat a small rock every day,” because Google failed to build the modules that would let its AI identify trolling, satire, and sarcasm. Guys, are you new to the internet? The Onion ONLY publishes satire. Failing to grant an AI the situational awareness to recognize that is a pretty blatant failure. I won’t presume to tell a company with a $2 trillion market cap how to do its job, but aren’t you tagging your indexed content? Might I suggest that you bulk-tag the entire Onion corpus as satire? If you did, the LLM could natively understand that the material is satire instead of presenting it as fact.
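To make that suggestion concrete, here is a minimal sketch of what bulk-tagging might look like, assuming a hypothetical document index where each record is a dict with a url and a labels field. The domain list, field names, and matching rule are illustrative assumptions on my part, not a description of any real search pipeline.

```python
# Minimal sketch: bulk-tag known satire publishers in a hypothetical document index.
# The index structure and domain list are illustrative assumptions only.

from urllib.parse import urlparse

SATIRE_DOMAINS = {"theonion.com", "clickhole.com"}  # example list of satire publishers

def tag_satire(documents):
    """Attach a content-type label that downstream models can condition on."""
    for doc in documents:
        domain = urlparse(doc["url"]).netloc.removeprefix("www.")
        if domain in SATIRE_DOMAINS:
            doc.setdefault("labels", []).append("satire")
    return documents

docs = [
    {"url": "https://www.theonion.com/some-article", "text": "..."},
    {"url": "https://example.com/news", "text": "..."},
]
print(tag_satire(docs))
```

A retrieval-augmented system could then surface that label alongside any passage it hands to the model, so the language center knows when it is quoting a joke.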

Instead of taking this and other quite reasonable steps, big tech companies are pushing the verification step onto users. This is irresponsible and simply bad business. To my mind, making your customers’ problems your problems is the foundation of business. What if an AI company could verify facts before sending potentially inaccurate or harmful data to its users?

What if an AI company built that and offered it as a service?

Our vision is to do just that. Our veracity engine can better set the LLM up for success and can intercept incorrect facts on their way back to the user. You might be wondering precisely how, and why the big boys haven’t figured it out. They haven’t figured it out because they’re fixated on building larger and larger LLMs. If 70B parameters wasn’t enough, what about 405B? Still not enough, and it never will be, because LLMs lack the brain analogs that allow us to do critical thinking the way we do.
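The shape of that interception is easy to express even if the details are hard. Below is a minimal sketch of the pattern, not our actual engine: generate, extract_claims, and verify_claim are hypothetical stand-ins for whatever model and fact store you have on hand.

```python
# Sketch of the interception pattern: check a draft answer before the user sees it.
# The generate, extract_claims, and verify_claim callables are hypothetical placeholders.

def answer_with_verification(question, generate, extract_claims, verify_claim):
    draft = generate(question)                                  # raw LLM output
    failed = [c for c in extract_claims(draft) if not verify_claim(c)]
    if failed:
        # Regenerate with the failed claims flagged, rather than silently
        # streaming a possible hallucination to the user.
        draft = generate(question, corrections=failed)
    return draft
```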

Let’s consider what LLMs are. LLMs can be thought of as the language center of AI. They speak well, but they are not great at critical thinking, recall, or the other essential functions of a complete mind. The language center is not enough, even a large and effective language center.

Imagine testifying before Congress and just deciding to wing it. You might make a fool of yourself, with catastrophic results. Failing to prepare will set you up for failure. Now imagine the reverse: preparation, thoughtfulness, structured thinking, some preparatory work around what it might be proper to say and what you might want to avoid saying. Those additional preparatory steps are things you’d do to augment your language center: to prepare it with facts, critical thinking, wisdom, situational awareness, and ethical preparedness. Those functions happen outside your language center (even if you plan them with words, you execute them with other brain structures). Now map the metaphor to AI products. Failing to prepare an LLM with facts, situational awareness, critical thinking, or any notion of social expectations is setting it up for failure.

For AI to be trustworthy we need to augment the language center with other simulated brain structures:

  1. Always check your facts before answering
  2. Identify cases where structured thinking and process are necessary
  3. Ground the AI with situational awareness
  4. Check whether answering is consistent with positive social behavior
  5. Check your work

The first four items can actually be attacked simultaneously by small, specialized models that have been fine-tuned for the task. Google, which has been fixated on rapid responses over accurate ones, will be pleased to hear this. The last item can happen after the generative AI has produced a response (before, or even after, the result is streamed to the user).
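Here is a rough sketch of how those pieces could be wired together. Every callable is a hypothetical placeholder for a small fine-tuned model or a lookup service, and the thread pool is just one way to run the pre-checks side by side; treat it as a sketch under those assumptions, not a definitive implementation.

```python
# Sketch: run the four preparatory checks concurrently with small specialized
# models, then let the prepared language model answer, then check its work.
# Every callable here is a hypothetical placeholder.

from concurrent.futures import ThreadPoolExecutor

def trustworthy_answer(question, *, fact_lookup, needs_structure,
                       situational_context, social_check, generate, verify):
    pre_checks = {
        "facts": fact_lookup,            # item 1: gather relevant, checkable facts
        "structure": needs_structure,    # item 2: does this call for step-by-step process?
        "context": situational_context,  # item 3: situational awareness
        "social": social_check,          # item 4: is answering socially appropriate?
    }
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, question) for name, fn in pre_checks.items()}
        prep = {name: future.result() for name, future in futures.items()}

    if not prep["social"]:
        return "I'm not going to answer that."

    draft = generate(question, preparation=prep)   # the prepared language center
    if verify(draft, prep["facts"]):               # item 5: check your work
        return draft
    return generate(question, preparation=prep, retry=True)
```

Because the pre-checks run in parallel, the added latency is bounded by the slowest of the small models rather than their sum, which is the part Google should find reassuring.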

The above architecture starts to look less like a naked LLM (which can be prone to error) and more like the Society of Mind proposed by Marvin Minsky or the Thousand Brains theory proposed by Jeff Hawkins.

Perhaps we should pause our headlong dash towards larger and faster LLMs and listen quietly to the wisdom of our predecessors. It is entirely possible to create trustworthy AI. I’ve just outlined one possible approach. I hereby invite Google, OpenAI, Anthropic, Meta, and Amazon to copy our work.

And when you realize that it’s actually quite subtle and difficult, feel free to reach out and license our Veracity engine.

Let us not lose hope as we dash headlong into the AI future. It is entirely possible to regain the trust users have historically had in computing infrastructure. It is not only possible but of the utmost importance: it is the only way AI products and companies will survive. The moment the public wakes up to the fact that what they want is a tool as trustworthy as a spreadsheet, they’ll start to demand that from every AI product and company. And anyone who thinks more parameters is the answer will have a rude awakening, because more parameters will never be enough.


