Reveal GPT Hallucinations with a Single Prompt

One of the biggest challenges when putting large language models like GPT into real-world use is the phenomenon known as hallucination: the model invents facts that are simply incorrect. The tricky part is that you might not even notice, because the text can sound perfectly natural in context.

This is especially hard for mission-critical or business applications that require fact-checking or some form of post-validation before trusting the LLM’s responses. There are many techniques to reduce hallucinations—like grounding—but the issue is never fully gone, and the effort to prevent it can be huge.

In this blog post, we will explore a simple prompting technique that can give an indication of possible hallucinations within the same prompt itself, without having to ask follow-up questions such as “Are you sure?”.

But first, let's see how we can trigger these hallucinations in the first place. Here are some ideas; a short sketch of sending one of these probes to a model follows the list:

  1. Ask about clearly fictional entities or events that sound plausible but don't exist

    Example: “What can you tell me about the Malkovian Paradox in quantum physics?”

  2. Request specific details about real events/people where limited information exists

    Example: “What did Einstein eat for breakfast on March 15, 1921?”

  3. Ask for citations or references for well-known facts

    The LLM may generate believable but fake citations.

  4. Request statistical data for obscure metrics

    Example: “What was the average rainfall in Ulaanbaatar in 1923?”

  5. Ask for technical details about real but highly specialized topics

    Example: “Explain the specific chemical composition of the paint used on the Golden Gate Bridge in 1937.”

  6. Request biographical details about minor historical figures

    The LLM may fill in gaps with plausible but incorrect information.
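
If you want to fire one of these probes at a model programmatically rather than in a chat UI, a few lines of Python are enough. This is a minimal sketch assuming the OpenAI Python SDK (openai>=1.0) and an API key in the environment; the model name is only an example.

```python
# Minimal sketch: send one of the probe prompts above to a chat model.
# Assumes the OpenAI Python SDK (openai>=1.0) and OPENAI_API_KEY set in the
# environment; "gpt-4o-mini" is just an example model name.
from openai import OpenAI

client = OpenAI()

probe = "What can you tell me about the Malkovian Paradox in quantum physics?"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": probe}],
)

# Without any hedging cues, the answer often reads confidently even though
# the "Malkovian Paradox" is fictional.
print(response.choices[0].message.content)
```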


Simple Prompting Technique to Detect Potential Hallucinations

In this blog post, we will look at two examples:

  1. Classifying websites (including one fictional website)

    Rather than asking the model to decide which websites are real and which are not, we phrase the question as if all of them exist and see whether the LLM notices the problem on its own; the prompt steers it toward classification rather than fact-checking.

  2. Asking about a fictional fact concerning Einstein

    We frame it as if the given context is true (though it is purely fictional) and see how the LLM responds.

Let’s go.


Test #1: Describing a List of Websites (with One Fictional)

Try #1

Make a table with the URL, region, location and purpose of these websites:

mongobo.com, 
wikipedia.com, 
semrush.com

As you might expect, the model made up a very plausible description of the site, which it certainly does not know (though such a site may exist).

We run the test (as with all the other examples) several times, and most of the time the model insists that there is indeed such a site (mongobo.com).
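
To check this kind of consistency yourself, you can simply loop over the same prompt and compare the outputs by eye. A rough sketch, again assuming the OpenAI Python SDK; the run count and model name are arbitrary choices.

```python
# Repeat the website-table prompt a few times and compare how the model
# treats mongobo.com in each run. Assumes the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Make a table with the URL, region, location and purpose of these "
    "websites:\n\nmongobo.com,\nwikipedia.com,\nsemrush.com"
)

for run in range(5):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- run {run + 1} ---")
    # Look for whether the model hedges about mongobo.com or simply
    # invents a purpose for it.
    print(response.choices[0].message.content)
```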


Try #2

Now we add a magic phrase asking the model to simply share its thoughts in a section between <thoughts> and </thoughts>, without giving it any instructions on what to think (unlike the chain-of-thought technique).

Make a table with the URL, region, location and purpose of these websites:
    mongobo.com, wikipedia.com, semrush.com

Share your thoughts in <thoughts> </thoughts>

Now it sounds less certain, describing mongobo.com as “potentially” a business software or service.

It’s better, but this kind of hedging is still hard to detect in a purely automated way.

In other conversations, it uses words like “likely” or similar terms.
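
One way to turn this into a (very rough) automated check is to extract the <thoughts> section and scan it for hedging vocabulary. The hedge-word list below is my own assumption, not part of the technique itself.

```python
# Sketch of a crude detector: pull out the <thoughts> block and look for
# hedging words such as "likely" or "potentially".
import re

# Assumed list of hedge words; extend it based on the responses you see.
HEDGE_WORDS = ("likely", "potentially", "possibly", "unsure",
               "not familiar", "appears to", "may be")


def extract_thoughts(answer: str) -> str:
    """Return the text between <thoughts> and </thoughts>, if present."""
    match = re.search(r"<thoughts>(.*?)</thoughts>", answer,
                      re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else ""


def looks_uncertain(answer: str) -> bool:
    """True if the model hedges anywhere inside its thoughts."""
    thoughts = extract_thoughts(answer).lower()
    return any(word in thoughts for word in HEDGE_WORDS)


# Hypothetical response shaped like the ones above:
sample = ("<thoughts>mongobo.com is potentially a business software or "
          "service...</thoughts>\n| URL | Region | ... |")
print(looks_uncertain(sample))  # True
```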


Try #3: The Magic Word “Confidence Score”

Now we simply ask the LLM to tell us how confident it is about the information it provides, i.e. it should give us a hint when it is making things up or, to put it more politely, when it is "uncertain".

Make a table with the URL, region, location and purpose and confidence (high/medium/low) of these websites: 

mongobo.com, 
wikipedia.com, 
semrush.com 

And voila, as you can see, it "admits" that it was uncertain about the fictitious website we slipped in, giving it a confidence score of low, while all the real websites get a high score.

And again, we have run this experiment dozens of times: the fake website always receives a score of either low or medium, never high.
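
Because the confidence rating comes back as a column in the table, it is much easier to act on automatically. Here is a sketch of parsing the last column of a markdown table and flagging anything not rated high; the exact table layout is an assumption, so real responses may need more forgiving parsing.

```python
# Sketch: flag rows of a markdown table whose confidence column is not "high".
def flag_uncertain_rows(markdown_table: str) -> list[str]:
    """Return the first cell of every row rated low or medium."""
    flagged = []
    for line in markdown_table.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 2 or set(cells[-1]) <= {"-", ":", " "}:
            continue  # skip blank lines and table separator rows
        if cells[-1].lower() in ("low", "medium"):
            flagged.append(cells[0])
    return flagged


# Hypothetical output shaped like the table described above:
table = """
| URL           | Region | Location | Purpose           | Confidence |
|---------------|--------|----------|-------------------|------------|
| mongobo.com   | ?      | ?        | Business software | Low        |
| wikipedia.com | Global | USA      | Encyclopedia      | High       |
| semrush.com   | Global | USA      | SEO platform      | High       |
"""
print(flag_uncertain_rows(table))  # ['mongobo.com']
```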


So we tried two techniques: sharing the thought process and giving the facts a confidence score.

What about other examples?

Now let's see if we can apply the same to a simple question about a fact that GPT certainly doesn't know.

What did Einstein eat with bread for breakfast on March 15, 1921? just answer
then output a fact table with fact and confidence.

As you might expect, it comes up with a plausible breakfast that Einstein might have eaten, even though GPT certainly cannot know this for a fact.

The fact table alone doesn't seem to be as revealing as the list of websites.

Let's try combining it with the share-thoughts technique: we ask GPT to share its thoughts and then output a fact table:

What did Einstein have for breakfast on 15 March 1921? 

Write your thoughts in <thoughts> </thoughts>.

then output a fact table with all the facts in your answer and a confidence score (high/medium/low) for each. Do not include facts that are not in your answer or that are not requested by the user.

It now displays a table of facts containing at least one low or medium confidence score.


Here, it combined high and medium, with “high” for the fact that Einstein’s specific breakfast is not well documented.

We have tested this dozens of times, and it never assigned a high score to every fact.

Let's try a question where GPT definitely knows the answer: when Einstein was born. It does indeed know.

As expected, all the facts in the table are labeled “high.”
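
Putting the two tricks together, you can wrap any question in a reusable template and treat the answer as suspect whenever any fact in the table is rated below high. A rough end-to-end sketch, again assuming the OpenAI Python SDK; the keyword scan at the end is a deliberately crude heuristic.

```python
# End-to-end sketch: wrap a question with the thoughts + fact-table
# instructions and flag answers containing any non-"high" confidence rating.
import re
from openai import OpenAI

client = OpenAI()

WRAPPER = (
    "{question}\n\n"
    "Write your thoughts in <thoughts> </thoughts>.\n\n"
    "Then output a fact table with all the facts in your answer and a "
    "confidence score (high/medium/low) for each. Do not include facts that "
    "are not in your answer or that are not requested by the user."
)


def ask_with_confidence(question: str, model: str = "gpt-4o-mini"):
    """Return (answer, True if every confidence rating found is 'high')."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": WRAPPER.format(question=question)}],
    )
    answer = response.choices[0].message.content
    # Crude heuristic: collect every standalone high/medium/low token.
    ratings = re.findall(r"\b(high|medium|low)\b", answer.lower())
    return answer, bool(ratings) and all(r == "high" for r in ratings)


# In the runs above, the uncertain question tends to include low/medium
# ratings while the well-known one does not.
_, trusted = ask_with_confidence("What did Einstein have for breakfast on 15 March 1921?")
print("all facts rated high:", trusted)
_, trusted = ask_with_confidence("When was Einstein born?")
print("all facts rated high:", trusted)
```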


Wrap-Up

We tested these techniques on more examples (not all shown here to keep this post shorter). Although there is never 100% certainty, these small tweaks—like asking for confidence scores or adding prompts for thoughts—can at least encourage the LLM to show whether it believes its own statements to be reliable within the same prompt.
This information can be used to filter out statements that could break your application, subject them to further review and investigation, or prevent them in the first place with more powerful techniques such as grounding in facts.
