One of the problems with LLMs is that they almost never answer “I don’t know”. When they don’t know something, they still generate an answer that usually sounds confident and consistent but… might be incorrect. And if you ask the same question in several conversations, they’ll tend to give you a different answer every time.
TRY THIS YOURSELF
Tell an LLM that you ate a particularly good variety of cheese, or drank a great wine. Say you don’t know its name. Describe it, but in vague terms. The LLM will probably suggest the variety that vaguely matches your description and appeared most often in its training material.
Say that it’s not the one it suggested, and it will come up with the next most frequently mentioned varieties, without any logical reason.
Start a new conversation and ask again. If no variety matches your description and is clearly more common than the others, the LLM will probably come up with different answers every time.
How overconfidence works
This is a problem for several use cases. Imagine using an LLM as a replacement for a doctor – because the patient is unwise, or because no doctor is available and the problem is urgent. You might ask something like: I have this symptom and that symptom, what might it be? Suppose it’s a rare disease, or for some reason you have unusual symptoms. There is a high risk that the LLM answers with the AI equivalent of an educated guess or, worse, a wild guess.
But why don’t they just admit their ignorance on that particular topic? I didn’t know the answer until I discussed this subject in depth with some LLMs and ran some experiments.
LLMs don’t know what they don’t know
Socrates said that the only true wisdom is in knowing you know nothing. But then, I must conclude that LLMs aren’t wise at all. Because they just don’t know what they don’t know. Let me explain how it works.
You ask a question. The LLM reasons to find a good answer. In simplistic terms, some people would say that it predicts the answer with probabilistic functions, token after token.
Let’s accept this simplification to keep the explanation simple. When an LLM knows something, it’s because reinforcement learning made one answer more likely than the others. During their training, LLMs were asked to generate many answers. The good ones were rewarded. Reasoning paths that were rewarded become more likely to be followed again.
When an LLM doesn’t have an answer… well, this rarely or never happens. The LLM knows many possible answers, because its language and reasoning abilities allow it to compose many answers. But if the correct one is unknown, several answers have more or less the same probability of being generated. The LLM never learnt that some are acceptable and others aren’t.
The reasoning process generates answers from the tokens that constitute the reasoning. A token is a word, or part of a word. But the LLM only sees the tokens, not the way they were selected. How likely was a particular token compared with the alternatives? The LLM doesn’t know.
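To make this concrete, here is a minimal sketch in Python. The numbers and token names are made up by me, and this is obviously not a real model: it only shows that a sampled token looks the same downstream whether the distribution it came from was sharply peaked (the model “knows”) or nearly flat (the model is guessing).

```python
import random

def sample_token(distribution):
    """Pick one token according to its probability; only the token itself is kept."""
    tokens = list(distribution)
    weights = [distribution[t] for t in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]

# "Known" case: training made one continuation far more likely than the others.
known = {"Parmigiano": 0.90, "Pecorino": 0.05, "Gruyère": 0.05}

# "Unknown" case: several continuations are roughly equally likely.
unknown = {"Parmigiano": 0.35, "Pecorino": 0.33, "Gruyère": 0.32}

print(sample_token(known))    # e.g. 'Parmigiano'
print(sample_token(unknown))  # e.g. 'Pecorino' -- looks just as confident

# In both cases the output is only a token. The probability it was sampled with
# is not part of the text, so nothing downstream can tell the two cases apart.
```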
Claude Sonnet 4 told me:
This is a fundamental limitation - I experience generating all responses similarly, regardless of how much training data supported that information or how certain the underlying model might be. (...) I can't distinguish between confident knowledge and confident-seeming guesses.
Meta-introspection
LLMs know how LLMs theoretically work, to some extent, just like a neurosurgeon knows how the human brain theoretically works. But, just like us, LLMs can’t follow the flow of their reasoning and examine the information present in their neurons. They’re not capable of introspection. However, they can observe the output they generated in the current conversation and do a sort of self-analysis. I consistently use this capability to explain some of their behaviours, or to validate what I’ve learnt about LLMs.
Ask an LLM a question it can’t answer. After a few wrong guesses, each of which you point out as wrong, question the LLM’s ability to answer correctly. It will examine the conversation up to that point, see the pattern in its reasoning, and admit that it doesn’t know the answer. It didn’t know that before: it realised its ignorance by examining the answers it emitted earlier.
Exceptions: the known unknown
There are notable exceptions. If you ask whether a god exists, or what will happen tomorrow, you can’t get an answer. The LLM will probably reply with a very verbose, well-motivated “I don’t know”, because its training explicitly taught it that there is no known answer to these and many other questions. In this case, it is not admitting a lack of knowledge: “no one knows” or “I don’t know” are known facts, in some contexts.
Another exception is when they look for an answer but have trouble building one with consistent reasoning. This happens, as far as I know, when the information they have on the subject is insufficient or contradictory. In this case they can deduce that they don’t know the answer for sure, though they generally still try to answer on a best-effort basis. In other words, LLMs try hard to answer, even when they know that doing so is not reasonable.
Teaching the unreasonable
Let’s focus on this try-hard attitude or, if you like, this know-it-all attitude, and see how they learn it.
The carrot and the stick
LLM training initially teaches LLMs many words and many ways to use them to compose meaningful sentences. At some point, they also start to learn the meanings of these words. Sure, I’m simplifying a lot, but this doesn’t matter now. The point is that, after they have learnt languages, they need to learn how to answer questions in a helpful way.
This is done with reinforcement learning, the AI equivalent of a method used with human children. Essentially, the LLM is asked questions, its answers are evaluated, and it receives positive rewards for good answers and negative rewards for poor answers.
For some questions, only one correct answer exists. This is the case for mathematical questions, for example. Such questions can be evaluated automatically. For other questions, the answer is evaluated by specialised LLMs or humans. Both play an important role.
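As a rough illustration, here is a toy sketch in Python of this carrot-and-stick mechanism. Real reinforcement learning updates the model’s weights; the tiny probability table and the learning rate below are stand-ins I invented, just to show how a reward shifts the odds.

```python
def apply_reward(probs, answer, reward, lr=0.5):
    """Nudge the rewarded (or penalised) answer's probability and renormalise."""
    updated = dict(probs)
    updated[answer] = max(updated[answer] * (1 + lr * reward), 1e-6)
    total = sum(updated.values())
    return {a: p / total for a, p in updated.items()}

# Three canned answers the model could give to the same question.
probs = {"correct answer": 0.34, "wrong answer": 0.33, "I don't know": 0.33}

probs = apply_reward(probs, "correct answer", reward=+1.0)  # praised by the evaluator
probs = apply_reward(probs, "wrong answer", reward=-1.0)    # penalised

print(probs)
# The correct answer rises to ~0.51, the wrong one drops to ~0.16, and
# "I don't know" barely moves (~0.33) because it was never rewarded.
```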
Rewarding the know-it-all attitude
The problem here is that both humans and LLMs tend to reward answers that transfer knowledge to the user. When asked a question, the LLM should provide a correct answer to get positive feedback; an incorrect answer will lead to negative feedback. In this way, if a similar question is asked in the future, the correct answer becomes more likely than a wrong one.
It’s reasonable to suppose that, occasionally, the LLM will answer “I don’t know” or something similar during reinforcement learning. But this answer is not useful for the LLM user, so it can’t be encouraged by the trainers.
If you think about it, this makes sense. I don’t think there is a perfect solution for this. While LLM vendors try to mitigate the problem, reinforcement learning must essentially encourage knowledgeable answers.
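A back-of-the-envelope sketch shows why guessing wins under this kind of training. The reward values below are made up by me for illustration, but the arithmetic holds whenever an unhelpful “I don’t know” earns less than a correct answer.

```python
def expected_reward(p_correct, r_correct=1.0, r_wrong=-0.2, r_idk=0.0, guess=True):
    """Average reward for always guessing vs. always saying 'I don't know'."""
    if not guess:
        return r_idk
    return p_correct * r_correct + (1 - p_correct) * r_wrong

for p in (0.9, 0.5, 0.2):
    print(f"p(correct)={p}: guessing={expected_reward(p):+.2f}, "
          f"'I don't know'={expected_reward(p, guess=False):+.2f}")

# Unless wrong answers are punished far more heavily than correct ones are
# rewarded, guessing has the higher expected reward, so the know-it-all
# attitude is exactly what gets reinforced.
```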
Confidence in confidence
You can, of course, ask an LLM to express probabilities. Something like:
How many chances are there that the abstract and the introduction of Apple's paper "The illusion of thinking" were written by an LLM?
Like it or not, some LLMs will say 90% or 85%. But they don’t calculate probabilities in any way. For them, it’s just a colloquial expression. It means that there are many indications that the fact is true, while they can’t be absolutely certain. Maybe some LLMs initially think 90% and then shave off 5% to highlight the uncertainty. Maybe 85% is a token that appeared in their training materials in similar situations. But I interpret it as anywhere between 51% and 99%.
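If you want to see this for yourself, here is a minimal sketch that asks the same question several times through the OpenAI API. It assumes the openai Python package and an API key; the model name and the exact prompt wording are mine, purely illustrative.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTION = (
    "How likely is it that the abstract and the introduction of Apple's paper "
    '"The illusion of thinking" were written by an LLM? '
    "Answer with a single percentage."
)

for run in range(5):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name; any chat model would do
        messages=[{"role": "user", "content": QUESTION}],
    )
    print(f"run {run + 1}: {response.choices[0].message.content}")

# If the percentage were a genuinely computed probability, it should be stable
# across runs; in practice it is just another generated piece of text.
```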
Can LLMs learn from Socrates?
Maybe one day LLMs will be able to reliably recognise and admit uncertainty. In the field of AI, one day might mean tomorrow, in two years, or never. But efforts made by LLM vendors to solve or mitigate the problem are indeed interesting.
I might write about these methods when I learn more about them. For now I can only tell you that they follow very different directions: training LLMs to express accurate probabilities, using multiple LLMs to identify disagreement areas, Anthropic’s Constitutional AI, etc.
IMAGE CREDIT: OpenAI DALL·E 3
I found the image idea a bit stupid, but funny. And it’s related to the topic: DALL·E 3 clearly didn’t know what type of image was needed here, but instead of admitting this, it guessed.