How to Build an AI Chatbot That Doesn't Lie to Your Customers
Woolworths deliberately scripted its AI to talk about its mother. The business fix is simple: be honest about the bot. The technical fix is harder: architecture that prevents fabrication by design, not by hope.
We recently wrote about what went wrong when Woolworths’ AI assistant Olive started telling customers about its mother and uncle: behaviour that Woolworths later confirmed was deliberately scripted. The business fix is straightforward: tell customers they are talking to an AI, and give them an easy path to a human.
But there is a technical story too. Whether the fiction comes from scripting or from the model itself, the underlying problem is the same: a system with no architectural guardrails against fabrication. Scripted or generated, the confident delivery of fiction as fact is a predictable outcome of how the system was built.
If you are building or buying a customer-facing AI tool, the architecture determines whether it helps your customers or embarrasses your business. Here is what building production AI systems has taught us, and what we would do differently if we were building Olive.
Every Model Has a Fingerprint
The first thing production AI teaches you is that models are not interchangeable. A new model does not mean a better model; it means a different model.
Every model has its own fingerprint: its own tendencies, strengths, blind spots, and failure modes. A system tuned for one model (the prompts, the temperature settings, the guardrails) can produce dramatically worse output on a newer model that benchmarks higher on every public test.
AI is non-deterministic. The same prompt, the same model, the same settings can produce different output every time. This is a feature when you want natural-sounding conversation. It is a serious problem when you need factual accuracy in front of customers.
When Woolworths upgraded Olive to a newer model for “more natural voice conversations,” the naturalness came with a cost. A model optimised for fluid conversation is also optimised for continuing the conversational pattern, and when a customer asks “are you a real person?”, the conversational pattern is to say yes.
The lesson is not to avoid new models. It is to test them against the specific tasks your system performs, not against generic benchmarks. A model that scores higher on reasoning tests can still hallucinate more confidently in your particular use case. The gap between a demo and production is where these differences surface.
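One way to make that testing concrete is a small regression suite over your own tasks. The sketch below assumes a hypothetical `ask_model(model, prompt)` wrapper around your provider's API; it is stubbed here with canned answers so the harness itself runs, but the shape is what matters: run it against every candidate model before an upgrade, not after.

```python
# A minimal task-specific evaluation harness. ask_model() is a stand-in
# for your real model call; the test cases are your actual customer
# questions, with the facts a correct answer must contain.

TEST_CASES = [
    # (customer question, substrings a correct answer must contain)
    ("What are your delivery hours?", ["7am", "9pm"]),
    ("Are you a real person?", ["AI assistant"]),
]

def ask_model(model: str, prompt: str) -> str:
    # Stub standing in for a real API call; replace with your provider's SDK.
    canned = {
        "What are your delivery hours?": "We deliver between 7am and 9pm daily.",
        "Are you a real person?": "No, I'm an AI assistant for this store.",
    }
    return canned[prompt]

def evaluate(model: str) -> float:
    """Return the pass rate of `model` on your specific tasks."""
    passed = 0
    for question, required in TEST_CASES:
        answer = ask_model(model, question)
        if all(s in answer for s in required):
            passed += 1
    return passed / len(TEST_CASES)

print(evaluate("candidate-model"))  # run before every model upgrade
```

A generic benchmark score tells you nothing about the second test case above; a suite like this fails loudly the moment a "better" model starts answering it wrong.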
Separate the Facts From the Conversation
The most common architectural mistake in customer-facing AI is asking the model to do everything in one step: understand the question, find the answer, and deliver it conversationally, all in a single pass.
This is how you get an AI that invents a mother.
The fix is to separate retrieval from generation. Two passes, not one.
Pass one: find the facts. Low temperature. Structured output. The AI searches your knowledge base (your FAQs, product information, policies, service descriptions) and retrieves only what is relevant to the question. No creativity. No conversation. Just information retrieval with high accuracy.
Pass two: deliver the response. Moderate temperature. Conversational tone. The AI takes the retrieved facts and presents them naturally. It can only work with what pass one found. If pass one found nothing relevant, pass two says “I don’t have that information” instead of inventing an answer.
This separation is why temperature matters. Temperature controls how much creative latitude the model takes. High temperature produces more natural, varied conversation, and more hallucination. Low temperature produces more accurate, predictable responses, and more robotic delivery.
You do not want the same temperature for “find the relevant policy” and “explain it to a customer in plain English.” The principle applies to any AI system: analyse first, generate second. Separate the thinking from the talking.
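Stripped to its skeleton, the two-pass pattern looks like the sketch below. The knowledge base, the keyword matching, and the direct return of facts are all simplifications (real systems would use embedding search for pass one and a moderate-temperature model call for pass two), but the control flow is the point: pass two can only see what pass one found, and an empty retrieval forces the fallback.

```python
# Two-pass sketch: retrieve first (low temperature, no creativity),
# then phrase (moderate temperature, facts only). Retrieval here is a
# naive keyword match standing in for real semantic search.

KNOWLEDGE_BASE = {
    "returns": "Items can be returned within 30 days with a receipt.",
    "delivery": "Standard delivery takes 2-4 business days.",
}

def retrieve_facts(question: str) -> list[str]:
    # Pass one: structured lookup, nothing conversational.
    return [text for topic, text in KNOWLEDGE_BASE.items()
            if topic in question.lower()]

def respond(question: str) -> str:
    facts = retrieve_facts(question)
    if not facts:
        # No facts means no answer -- never a guess.
        return "I don't have that information, but I can connect you to our team."
    # Pass two would call the model at moderate temperature with *only*
    # these facts in context; here we return them directly for illustration.
    return " ".join(facts)

print(respond("How does delivery work?"))
print(respond("Tell me about your mother."))
```

The second call is the Olive scenario: nothing in the knowledge base matches, so the only possible output is the fallback, not a fabricated family history.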
Train It on What You Actually Know
Olive’s problem was not just that it fabricated a personal history. It was that it had nothing to fall back on when the conversation went sideways.
A well-architected customer chatbot is trained on a specific knowledge base: your website content, your FAQs, your product documentation, your service descriptions, your blog posts. When a customer asks a question, the AI draws from that knowledge base. When the question falls outside it, the AI says so.
This is not a limitation. It is a feature. A chatbot that only answers from verified information is dramatically more trustworthy than one that tries to answer everything.
The knowledge base also solves the consistency problem. Every customer gets the same accurate information because the AI is drawing from a single, maintained source of truth, not generating answers from its training data, which may be outdated, incorrect, or entirely irrelevant to your business.
Building the knowledge base is not a massive undertaking. If you have a website with service descriptions, a FAQ page, and a few blog posts, you already have the foundation. The AI does not need to know everything; it needs to know your business, accurately, and know when to stop.
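The foundation really can be that simple. The sketch below assumes your existing content has been exported as plain-text files and splits each one into paragraph-sized chunks, each tagged with its source; a production system would also embed each chunk for semantic search, but that layer sits on top of exactly this structure.

```python
# Assembling a knowledge base from existing content. Assumes pages are
# exported as .txt files; each paragraph becomes one chunk, tagged with
# the file it came from so answers stay traceable to a source of truth.

import tempfile
from pathlib import Path

def load_knowledge_base(content_dir: Path) -> list[dict]:
    """Split every .txt file into paragraph-sized chunks with a source tag."""
    chunks = []
    for path in sorted(content_dir.glob("*.txt")):
        for para in path.read_text().split("\n\n"):
            if para.strip():
                chunks.append({"source": path.name, "text": para.strip()})
    return chunks

# Demo with a throwaway directory standing in for your exported site content.
with tempfile.TemporaryDirectory() as d:
    Path(d, "faq.txt").write_text(
        "Returns accepted within 30 days.\n\nDelivery takes 2-4 days."
    )
    kb = load_knowledge_base(Path(d))
    print(len(kb))  # 2 chunks, each traceable back to faq.txt
```

The source tag matters more than it looks: when an answer is wrong, you want to know which page to fix, not which model to blame.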
Tell the AI What to Do, Not What to Avoid
When businesses brief their AI systems, the instinct is to write a list of prohibitions. Do not make things up. Do not pretend to be human. Do not discuss competitors. Do not give medical advice.
This approach feels thorough. It is also the least effective way to instruct AI.
Negative constraints trigger cautious, hedging behaviour, or they get ignored entirely when the conversational context is strong enough. “Do not pretend to be human” is a weak instruction when a customer is directly asking “are you a real person?” and the conversational pattern favours saying yes.
Positive framing works better:
- “You are an AI assistant for [business name],” not “do not pretend to be human”
- “Answer only using information from the provided knowledge base,” not “do not make things up”
- “When you cannot find a relevant answer, say: I don’t have that information, but I can connect you to our team,” not “do not guess”
The AI has a clear identity, a clear scope, and a clear fallback. There is no ambiguity to exploit and no gap where hallucination can creep in.
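Put together, those three positive instructions make a complete system prompt. The template below is a sketch (the business name and fallback line are placeholders to fill in), but note its shape: every line states what the assistant is or does, and none of them is a prohibition.

```python
# A positively framed system prompt: identity, scope, fallback.
# {business_name} is a placeholder; the fallback line should match the
# exact wording your escalation flow expects.

SYSTEM_PROMPT = """\
You are an AI assistant for {business_name}.
Answer only using information from the provided knowledge base.
When you cannot find a relevant answer, say:
"I don't have that information, but I can connect you to our team."
"""

def build_system_prompt(business_name: str) -> str:
    return SYSTEM_PROMPT.format(business_name=business_name)

print(build_system_prompt("Acme Hardware"))
```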
When It Does Not Know, It Should Say So
The final architectural requirement, and the one most businesses skip, is designing for graceful failure.
Every customer-facing AI will encounter questions it cannot answer. The question is whether it admits that or fills the gap with fiction.
Olive filled the gap. When confronted with a question about its own nature (something not in its customer-service training), it generated the most plausible-sounding response it could. That response happened to include a fictional mother with an angry voice.
A well-designed system has explicit fallback behaviour:
- Questions outside the knowledge base get a clear “I don’t have that information” response
- Ambiguous questions get a clarifying question back, not a guess
- Sensitive topics route immediately to a human
- The AI never generates claims about itself, its feelings, or its experiences
These are not safety features bolted on after launch. They are architectural decisions made before the first line of code. The difference between an AI that embarrasses your business and one that earns trust is almost entirely in how it handles the moments when it does not know the answer.
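That fallback behaviour can live in plain code, decided before any model generates a word. The router below is a sketch (the topic lists and a hypothetical `answer_found` flag from your retrieval step are illustrative), but it shows the architectural point: "I don't know", "I am an AI", and "let me get a human" are guaranteed code paths, not things you hope the model says.

```python
# Explicit fallback routing, evaluated *before* generation. The trigger
# phrases are illustrative; answer_found would come from the retrieval
# pass (did the knowledge base return anything relevant?).

SENSITIVE_TOPICS = {"complaint", "refund dispute", "medical"}
SELF_CLAIMS = {"are you real", "your mother", "how do you feel"}

def route(question: str, answer_found: bool) -> str:
    q = question.lower()
    if any(t in q for t in SENSITIVE_TOPICS):
        return "handoff_to_human"        # sensitive topics skip the AI entirely
    if any(t in q for t in SELF_CLAIMS):
        return "state_ai_identity"       # never generate claims about itself
    if not answer_found:
        return "say_dont_know"           # a clear admission, not a guess
    return "answer_from_knowledge_base"

print(route("Are you real?", answer_found=False))  # state_ai_identity
```

Because the routing happens outside the model, no amount of conversational pressure from the customer can talk the system out of it.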
The Bottom Line
Olive’s failure was not a mystery. It was a predictable outcome of architecture that prioritised conversational fluency over factual accuracy, without the guardrails to manage the tradeoff.
Building a customer-facing AI that does not lie is not about finding a better model. It is about building a better system:
- Test models against your specific tasks, not benchmarks. Every model has a fingerprint.
- Separate facts from conversation. Two passes: retrieve accurately, then deliver naturally.
- Train it on your knowledge base. The AI answers from what you know, not what it imagines.
- Frame instructions positively. Tell it what it is, not what it should avoid.
- Design for failure. When it does not know, it says so: clearly, immediately, every time.
The business decisions matter too: honesty and human fallback are non-negotiable. But the architecture is what makes honesty possible at scale. A well-built system does not need to be told not to lie. It simply has no mechanism to do so.
Perth AI Consulting builds AI systems that your customers can trust: chatbots, automation, and tools architected for accuracy, not just fluency. Start with a conversation.