How to Build an AI Chatbot That Doesn't Lie to Your Customers
Woolworths deliberately scripted its AI to talk about its mother. The business fix is simple: be honest about the bot. The technical fix is harder: architecture that prevents fabrication by design, not by hope.
We recently wrote about what went wrong when Woolworths’ AI assistant Olive started telling customers about its mother and uncle: behaviour that Woolworths later confirmed was deliberately scripted. The business fix is straightforward: tell customers they are talking to an AI, and give them an easy path to a human.
But there is a technical story too. Whether the fiction comes from scripting or from the model itself, the underlying problem is the same: a system with no architectural guardrails against fabrication. Scripted or generated, the confident delivery of fiction as fact is a predictable outcome of how the system was built.
If you are building or buying a customer-facing AI tool, the architecture determines whether it helps your customers or embarrasses your business. Here is what building production AI systems has taught us, and what we would do differently if we were building Olive.
Every Model Has a Fingerprint
The first thing production AI teaches you is that models are not interchangeable. A new model does not mean a better model; it means a different model.
Every model has its own fingerprint: its own tendencies, strengths, blind spots, and failure modes. A system tuned for one model (the prompts, the temperature settings, the guardrails) can produce dramatically worse output on a newer model that benchmarks higher on every public test.
AI is non-deterministic. The same prompt, the same model, the same settings can produce different output every time. This is a feature when you want natural-sounding conversation. It is a serious problem when you need factual accuracy in front of customers.
When Woolworths upgraded Olive to a newer model for “more natural voice conversations,” the naturalness came with a cost. A model optimised for fluid conversation is also optimised for continuing the conversational pattern, and when a customer asks “are you a real person?”, the conversational pattern is to say yes.
The lesson is not to avoid new models. It is to test them against the specific tasks your system performs, not against generic benchmarks. A model that scores higher on reasoning tests can still hallucinate more confidently in your particular use case. The gap between a demo and production is where these differences surface.
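One way to make that testing concrete is a small regression suite over your own tasks. The sketch below assumes a hypothetical `ask_model(model, prompt)` wrapper around your provider's API; it is stubbed here with canned answers so the harness itself runs, but the shape is what matters: run it against every candidate model before an upgrade, not after.

```python
# A minimal task-specific evaluation harness. ask_model() is a stand-in
# for your real model call; the test cases are your actual customer
# questions, with the facts a correct answer must contain.

TEST_CASES = [
    # (customer question, substrings a correct answer must contain)
    ("What are your delivery hours?", ["7am", "9pm"]),
    ("Are you a real person?", ["AI assistant"]),
]

def ask_model(model: str, prompt: str) -> str:
    # Stub standing in for a real API call; replace with your provider's SDK.
    canned = {
        "What are your delivery hours?": "We deliver between 7am and 9pm daily.",
        "Are you a real person?": "No, I'm an AI assistant for this store.",
    }
    return canned[prompt]

def evaluate(model: str) -> float:
    """Return the pass rate of `model` on your specific tasks."""
    passed = 0
    for question, required in TEST_CASES:
        answer = ask_model(model, question)
        if all(s in answer for s in required):
            passed += 1
    return passed / len(TEST_CASES)

print(evaluate("candidate-model"))  # run before every model upgrade
```

A generic benchmark score tells you nothing about the second test case above; a suite like this fails loudly the moment a "better" model starts answering it wrong.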
Separate the Facts From the Conversation
The most common architectural mistake in customer-facing AI is asking the model to do everything in one step: understand the question, find the answer, and deliver it conversationally, all in a single pass.
This is how you get an AI that invents a mother.
The fix is to separate retrieval from generation. Two passes, not one.
Pass one: find the facts. Low temperature. Structured output. The AI searches your knowledge base (your FAQs, product information, policies, service descriptions) and retrieves only what is relevant to the question. No creativity. No conversation. Just information retrieval with high accuracy.
Pass two: deliver the response. Moderate temperature. Conversational tone. The AI takes the retrieved facts and presents them naturally. It can only work with what pass one found. If pass one found nothing relevant, pass two says “I don’t have that information” instead of inventing an answer.
This separation is why temperature matters. Temperature controls how much creative latitude the model takes. High temperature produces more natural, varied conversation, and more hallucination. Low temperature produces more accurate, predictable responses, and more robotic delivery.
You do not want the same temperature for “find the relevant policy” and “explain it to a customer in plain English.” The principle applies to any AI system: analyse first, generate second. Separate the thinking from the talking.
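Stripped to its skeleton, the two-pass pattern looks like the sketch below. The knowledge base, the keyword matching, and the direct return of facts are all simplifications (real systems would use embedding search for pass one and a moderate-temperature model call for pass two), but the control flow is the point: pass two can only see what pass one found, and an empty retrieval forces the fallback.

```python
# Two-pass sketch: retrieve first (low temperature, no creativity),
# then phrase (moderate temperature, facts only). Retrieval here is a
# naive keyword match standing in for real semantic search.

KNOWLEDGE_BASE = {
    "returns": "Items can be returned within 30 days with a receipt.",
    "delivery": "Standard delivery takes 2-4 business days.",
}

def retrieve_facts(question: str) -> list[str]:
    # Pass one: structured lookup, nothing conversational.
    return [text for topic, text in KNOWLEDGE_BASE.items()
            if topic in question.lower()]

def respond(question: str) -> str:
    facts = retrieve_facts(question)
    if not facts:
        # No facts means no answer -- never a guess.
        return "I don't have that information, but I can connect you to our team."
    # Pass two would call the model at moderate temperature with *only*
    # these facts in context; here we return them directly for illustration.
    return " ".join(facts)

print(respond("How does delivery work?"))
print(respond("Tell me about your mother."))
```

The second call is the Olive scenario: nothing in the knowledge base matches, so the only possible output is the fallback, not a fabricated family history.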
Train It on What You Actually Know
Olive’s problem was not just that it fabricated a personal history. It was that it had nothing to fall back on when the conversation went sideways.
A well-architected customer chatbot is trained on a specific knowledge base: your website content, your FAQs, your product documentation, your service descriptions, your blog posts. When a customer asks a question, the AI draws from that knowledge base. When the question falls outside it, the AI says so.
This is not a limitation. It is a feature. A chatbot that only answers from verified information is dramatically more trustworthy than one that tries to answer everything.
The knowledge base also solves the consistency problem. Every customer gets the same accurate information because the AI is drawing from a single, maintained source of truth, not generating answers from its training data, which may be outdated, incorrect, or entirely irrelevant to your business.
Building the knowledge base is not a massive undertaking. If you have a website with service descriptions, a FAQ page, and a few blog posts, you already have the foundation. The AI does not need to know everything; it needs to know your business, accurately, and know when to stop.
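The foundation really can be that simple. The sketch below assumes your existing content has been exported as plain-text files and splits each one into paragraph-sized chunks, each tagged with its source; a production system would also embed each chunk for semantic search, but that layer sits on top of exactly this structure.

```python
# Assembling a knowledge base from existing content. Assumes pages are
# exported as .txt files; each paragraph becomes one chunk, tagged with
# the file it came from so answers stay traceable to a source of truth.

import tempfile
from pathlib import Path

def load_knowledge_base(content_dir: Path) -> list[dict]:
    """Split every .txt file into paragraph-sized chunks with a source tag."""
    chunks = []
    for path in sorted(content_dir.glob("*.txt")):
        for para in path.read_text().split("\n\n"):
            if para.strip():
                chunks.append({"source": path.name, "text": para.strip()})
    return chunks

# Demo with a throwaway directory standing in for your exported site content.
with tempfile.TemporaryDirectory() as d:
    Path(d, "faq.txt").write_text(
        "Returns accepted within 30 days.\n\nDelivery takes 2-4 days."
    )
    kb = load_knowledge_base(Path(d))
    print(len(kb))  # 2 chunks, each traceable back to faq.txt
```

The source tag matters more than it looks: when an answer is wrong, you want to know which page to fix, not which model to blame.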
Tell the AI What to Do, Not What to Avoid
When businesses brief their AI systems, the instinct is to write a list of prohibitions. Do not make things up. Do not pretend to be human. Do not discuss competitors. Do not give medical advice.
This approach feels thorough. It is also the least effective way to instruct AI.
Negative constraints trigger cautious, hedging behaviour, or they get ignored entirely when the conversational context is strong enough. “Do not pretend to be human” is a weak instruction when a customer is directly asking “are you a real person?” and the conversational pattern favours saying yes.
Positive framing works better:
- “You are an AI assistant for [business name],” not “do not pretend to be human”
- “Answer only using information from the provided knowledge base,” not “do not make things up”
- “When you cannot find a relevant answer, say: I don’t have that information, but I can connect you to our team,” not “do not guess”
The AI has a clear identity, a clear scope, and a clear fallback. There is no ambiguity to exploit and no gap where hallucination can creep in.
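Put together, those three positive instructions make a complete system prompt. The template below is a sketch (the business name and fallback line are placeholders to fill in), but note its shape: every line states what the assistant is or does, and none of them is a prohibition.

```python
# A positively framed system prompt: identity, scope, fallback.
# {business_name} is a placeholder; the fallback line should match the
# exact wording your escalation flow expects.

SYSTEM_PROMPT = """\
You are an AI assistant for {business_name}.
Answer only using information from the provided knowledge base.
When you cannot find a relevant answer, say:
"I don't have that information, but I can connect you to our team."
"""

def build_system_prompt(business_name: str) -> str:
    return SYSTEM_PROMPT.format(business_name=business_name)

print(build_system_prompt("Acme Hardware"))
```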
When It Does Not Know, It Should Say So
The final architectural requirement, and the one most businesses skip, is designing for graceful failure.
Every customer-facing AI will encounter questions it cannot answer. The question is whether it admits that or fills the gap with fiction.
Olive filled the gap. When confronted with a question about its own nature (something not in its customer-service training), it generated the most plausible-sounding response it could. That response happened to include a fictional mother with an angry voice.
A well-designed system has explicit fallback behaviour:
- Questions outside the knowledge base get a clear “I don’t have that information” response
- Ambiguous questions get a clarifying question back, not a guess
- Sensitive topics route immediately to a human
- The AI never generates claims about itself, its feelings, or its experiences
These are not safety features bolted on after launch. They are architectural decisions made before the first line of code. The difference between an AI that embarrasses your business and one that earns trust is almost entirely in how it handles the moments when it does not know the answer.
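That fallback behaviour can live in plain code, decided before any model generates a word. The router below is a sketch (the topic lists and a hypothetical `answer_found` flag from your retrieval step are illustrative), but it shows the architectural point: "I don't know", "I am an AI", and "let me get a human" are guaranteed code paths, not things you hope the model says.

```python
# Explicit fallback routing, evaluated *before* generation. The trigger
# phrases are illustrative; answer_found would come from the retrieval
# pass (did the knowledge base return anything relevant?).

SENSITIVE_TOPICS = {"complaint", "refund dispute", "medical"}
SELF_CLAIMS = {"are you real", "your mother", "how do you feel"}

def route(question: str, answer_found: bool) -> str:
    q = question.lower()
    if any(t in q for t in SENSITIVE_TOPICS):
        return "handoff_to_human"        # sensitive topics skip the AI entirely
    if any(t in q for t in SELF_CLAIMS):
        return "state_ai_identity"       # never generate claims about itself
    if not answer_found:
        return "say_dont_know"           # a clear admission, not a guess
    return "answer_from_knowledge_base"

print(route("Are you real?", answer_found=False))  # state_ai_identity
```

Because the routing happens outside the model, no amount of conversational pressure from the customer can talk the system out of it.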
The Bottom Line
Olive’s failure was not a mystery. It was a predictable outcome of architecture that prioritised conversational fluency over factual accuracy, without the guardrails to manage the tradeoff.
Building a customer-facing AI that does not lie is not about finding a better model. It is about building a better system:
- Test models against your specific tasks, not benchmarks. Every model has a fingerprint.
- Separate facts from conversation. Two passes: retrieve accurately, then deliver naturally.
- Train it on your knowledge base. The AI answers from what you know, not what it imagines.
- Frame instructions positively. Tell it what it is, not what it should avoid.
- Design for failure. When it does not know, it says so: clearly, immediately, every time.
The business decisions matter too: honesty and human fallback are non-negotiable. But the architecture is what makes honesty possible at scale. A well-built system does not need to be told not to lie. It simply has no mechanism to do so.
Perth AI Consulting builds AI systems that your customers can trust: chatbots, automation, and tools architected for accuracy, not just fluency. Start with a conversation.