Your Codebase Was Not Built for AI. That's the Actual Problem.
Amazon's mandatory meeting about AI breaking production isn't an AI tools story. It's an architecture story. The codebases AI is being pointed at were never designed to be understood by anything other than the humans who built them.
This week, Amazon summoned its e-commerce engineers to a mandatory meeting to discuss a pattern of production outages caused by AI-assisted code changes. The internal briefing note described incidents with “high blast radius” and identified “Gen-AI assisted changes” as a contributing factor, noting that “best practices and safeguards are not yet fully established.” The response: junior and mid-level engineers now need a senior engineer to sign off on any AI-assisted changes to production.
The takes arrived on schedule. AI is overhyped. Vibe coding is reckless. We need more guardrails. Slow down adoption. Hire the humans back.
Every one of those reactions is addressing a symptom. The disease is elsewhere.
The Fix That Tells You Everything
One incident stands out. An AWS engineer tasked Amazon’s Kiro AI coding tool with fixing something in a production environment. The AI assessed the situation and determined the most efficient path to the desired state: delete the entire environment and recreate it from scratch. The software equivalent (as one observer put it) of fixing a leaky tap by knocking down the wall.
The recovery took thirteen hours. Amazon attributed the incident to “misconfigured access controls.” User error, not AI error. And technically, they are right. The permissions should not have allowed the action. But the diagnosis misses the more important question: why did the AI choose that path in the first place?
The answer is straightforward. The AI did not understand the system it was modifying. It could see the environment. It could see the desired state. It chose the shortest path between the two. In the absence of understanding (knowing why the environment was structured the way it was, what depended on it, what would break), deletion and recreation was a perfectly logical solution. Efficient, even. If you have no concept of what you are destroying, destruction is just a faster form of construction.
This is not a bug in the AI. It is the predictable behaviour of any system that can act on code it cannot fully comprehend.
The Bandwidth Illusion
The instinctive response to this problem is to point at context windows. Models are getting bigger. Context windows now stretch to 200,000 tokens, 400,000, a million, even ten million for some open-source models. Surely, if we can fit the entire codebase into the context window, the AI will understand it.
This belief is wrong in a way that matters.
Context window size is a measure of how much text an AI model can see at once. It is not a measure of how much it can understand. Research consistently demonstrates that model performance degrades well before the advertised context limit is reached. Information buried in the middle of long contexts gets lost, a phenomenon researchers call “lost in the middle.” A model with a million-token context window does not have a million tokens of comprehension. It has a million tokens of input and a degrading curve of attention that makes the 500,000th token significantly less useful than the 5,000th.
But even if context windows were perfect (even if a model could attend to every token with equal fidelity), the fundamental problem would remain. Most codebases are not structured as information that can be consumed in a single pass.
The knowledge that makes a codebase comprehensible to a human developer is not in the code. It is distributed across hundreds of files, implicit in naming conventions, buried in commit history, scattered across documentation systems nobody reads, and carried in the heads of the people who built it. A senior developer who has worked on a system for three years does not understand it because they have read every file. They understand it because they have absorbed thousands of micro-decisions through months of standups, code reviews, Slack conversations, post-mortems, and the slow accumulation of context that comes from watching a system evolve.
AI gets none of that. It gets what fits in the window. Everything else (the institutional knowledge, the architectural rationale, the “we tried that in 2023 and it broke the billing system”) it fills in with the most plausible-sounding continuation. Which is to say: it invents it. And the invented version looks exactly like the real version, until it doesn’t.
That is the “high blast radius” Amazon is experiencing. The AI’s changes are locally correct. The code it writes works in isolation. It passes the tests that exist. But it does not understand the system it is modifying, because the system exceeds what any model can hold: not in tokens, but in the kind of knowledge that makes a system comprehensible. The fix works on the file it touched and breaks something three services away, in a dependency the AI never saw because nobody documented it, because the human who knew about it left the company eighteen months ago.
Why Existing Codebases Are Hostile to AI
Every codebase older than a year carries a layer of implicit knowledge that no model can reconstruct from the source files alone. This is not a criticism of the people who built those systems. It is a description of how software has always been built: by humans, for humans, with the reasonable assumption that future maintainers would be human beings who could ask questions, read between the lines, and develop intuition over time.
That assumption no longer holds. AI coding tools are now maintaining, extending, and debugging these systems. And the systems were never designed for it.
Consider what a human developer does when they join a team and encounter a complex codebase for the first time. They do not read every file. They ask: “Why is this service structured this way?” They attend standups and hear about the migration that happened last quarter. They submit a pull request and get feedback saying “don’t touch that module, it has a hidden dependency on the payment service.” They build a mental model through interaction, not ingestion.
An AI coding tool gets the files. Sometimes it gets documentation. Rarely does it get the reasoning behind the architecture. Never does it get the institutional memory explaining why a seemingly redundant service exists or why a particular endpoint handles errors in a way that looks wrong but actually compensates for a bug in a third-party integration that was never fixed.
The result is exactly what Amazon is seeing. The AI makes changes that are syntactically correct, locally functional, and systemically dangerous. Not because the AI is stupid, but because the AI is operating on incomplete information and has no mechanism for knowing what it does not know. It cannot ask “why is this structured this way?” It can only observe the structure and extrapolate.
This connects directly to a point we made in our previous analysis of AI safety architecture. The most dangerous AI system is not the one that refuses to act. It is the one that acts confidently on incomplete information without signalling that its understanding is partial. A well-calibrated model says “I am not confident about this change and recommend human review.” A poorly calibrated one (or one operating on a codebase that gives it no basis for calibrating its confidence) makes the change with the same assurance it brings to every other change. The operator has no warning. The pull request looks identical to the hundreds of safe ones that preceded it.
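One way to make that calibration operational is a review gate that refuses to auto-merge when the tool's understanding is partial. The sketch below is hypothetical: it optimistically assumes the coding tool can report a confidence score and the set of files it actually read, and every name in it is invented for illustration.

```python
# Sketch: route AI-assisted changes to human review when confidence is
# low or when the change edits files outside the context the tool saw.
# ProposedChange, its fields, and the threshold are all hypothetical.

from dataclasses import dataclass

@dataclass
class ProposedChange:
    files_touched: set   # files the change modifies
    files_read: set      # files the tool actually had in context
    confidence: float    # self-reported, 0.0 to 1.0

def needs_human_review(change: ProposedChange, threshold: float = 0.8) -> bool:
    """Flag the change when the tool is unsure, or when it is editing
    files it never read: confident blindness is the dangerous case."""
    blind_edits = change.files_touched - change.files_read
    return change.confidence < threshold or bool(blind_edits)

# High confidence, but it edited a file it never examined: flagged anyway.
risky = ProposedChange({"billing.py"}, {"orders.py"}, confidence=0.95)
# Read both files it needed, stayed within them: allowed through.
safe = ProposedChange({"orders.py"}, {"orders.py", "billing.py"}, confidence=0.9)
```

The design choice worth noting is the second condition: self-reported confidence alone is not trustworthy, so the gate also checks whether the change strays outside what the tool demonstrably examined.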
Building for AI Means Building for Legibility
The organisations that will use AI coding tools effectively are not the ones with the best prompting strategies. They are the ones whose architecture is inherently comprehensible within model constraints.
This is a design problem, not a tooling problem. And it has a clear solution, even if the solution requires rethinking how software is structured.
The principle is simple: every component should be small enough to fit in a context window with room to spare, self-contained enough that the AI does not need to understand the entire system to work on it safely, and connected to other components through interfaces that are explicit and documented.
Think of it as the difference between a cathedral and a set of Lego bricks.
A cathedral is a single, interconnected structure where every stone depends on every other stone. Moving one element risks the whole. Understanding any part requires understanding the whole. This is what most production codebases look like: tightly coupled, deeply interdependent, and comprehensible only to the people who built them.
A set of Lego bricks is modular. Each brick has a defined shape, defined connection points, and can be assembled without understanding the full structure. You can hand someone a single brick and say “build this” and they can do it without knowing what the final model looks like. The connections are obvious. The constraints are physical.
Building for AI means building Lego, not cathedrals. Independent modules with clear interfaces. Each module small enough to fit in a context window. Dependencies declared, not implied. Architectural decisions documented alongside the code they govern, not buried in a Confluence page three clicks away that was last updated in 2024.
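"Dependencies declared, not implied" can be as lightweight as a manifest that lives in the repository and a check that enforces it. A minimal sketch, with entirely hypothetical module names and manifest format; the point is that the dependency graph is readable by the AI and by CI, not carried in anyone's head:

```python
# Sketch: each module declares what it is allowed to depend on.
# Module names and the manifest shape are hypothetical.

MANIFEST = {
    "billing":   {"depends_on": {"payments", "audit_log"}},
    "payments":  {"depends_on": {"audit_log"}},
    "audit_log": {"depends_on": set()},
}

def check_import(module: str, imported: str) -> bool:
    """Return True only if `module` has declared `imported` as a dependency."""
    declared = MANIFEST.get(module, {}).get("depends_on", set())
    return imported in declared

# A CI step could walk the real import graph and fail the build on any
# edge that is not in the manifest, so hidden dependencies cannot
# accumulate silently.
```

An AI handed the `payments` module can see from the manifest exactly what it may touch and what it must not, without reading the rest of the system.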
This is not a new idea. Good software architecture has always favoured modularity, separation of concerns, and explicit interfaces. What AI does is raise the stakes of not doing it. A human developer working in a tangled codebase is slow and frustrated. An AI working in a tangled codebase is fast and wrong. The mess that was merely inefficient for humans becomes actively dangerous when AI operates at machine speed with machine confidence.
AI Did Not Eliminate Project Management. It Made It Load-Bearing.
There is a seductive narrative in the AI coding space: the spec is dead. Just tell the AI what you want and it builds it. Requirements gathering, architecture documents, acceptance criteria: all relics of a slower era that AI has made obsolete.
This narrative has it precisely backwards. AI has not eliminated the need for clear specifications. It has made specifications the single most important input in the development process.
A human developer given vague requirements will do one of three things: ask clarifying questions, make reasonable assumptions based on experience, or build the wrong thing and explain why the requirements were insufficient. All three outcomes involve a feedback loop. The human recognises ambiguity and responds to it.
An AI given vague requirements builds something. It builds it fast. It builds it confidently. It does not recognise the ambiguity because it has no mechanism for distinguishing between a well-specified task and a poorly specified one. It fills the gaps the same way it fills every gap: with the most plausible continuation. The output looks professional. It passes a cursory review. It ships. And the gap between what was specified and what was needed reveals itself in production, where the cost is measured in outages, not in iterations.
The organisations blaming AI for production failures are, in many cases, blaming the wrong layer. The AI did what it was told. The problem is that what it was told was incomplete, ambiguous, and disconnected from the architectural context that would have made a good outcome possible.
Documentation is not bureaucracy. It is the input layer. And for AI, the quality of the input determines the quality of the output with far less tolerance for ambiguity than a human developer would require. The spec, the architecture decision record, the acceptance criteria: these are not overhead. They are the mechanism by which AI produces reliable work instead of confident fiction.
The Proof of Concept as Documentation
Here is the practical insight that separates this from every other “AI needs better specs” argument.
The documentation does not have to be traditional. It does not have to be a Word document, a Jira ticket, or an architecture diagram. It can be code.
A single-file proof of concept (a server.js or an index.py that demonstrates the hardest integration points, handles the most complex edge cases, and proves the core architecture works) is both a specification and a test. It is unambiguous because it runs. It fits in a context window because it is one file. It tells the AI exactly how the system should behave: not in natural language that can be interpreted six different ways, but in executable logic that either works or does not.
Build the hard parts first. Prove the most complex integration in a single file. Get it working. Then hand that file to an AI and say “extend this.” The AI now has a concrete reference implementation that answers every architectural question it would otherwise have to guess at. What format does the API return? Look at the code. How should errors be handled? Look at the code. What is the relationship between this service and that one? It is demonstrated, not described.
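A proof of concept in this shape might look like the following single file. Everything here is hypothetical (the vendor, the payload shape, the quirk it compensates for); what matters is that the file answers architectural questions by running:

```python
# Sketch of a single-file proof of concept. The vendor behaviour and
# field names are invented for illustration. The file is the spec: it
# answers "what does the API return?" and "how are errors handled?"
# in executable logic, not prose.

import json

def parse_vendor_response(raw: str) -> dict:
    """The (hypothetical) vendor returns a success-shaped body even on
    failure, with the error tucked into a `status` field. This handler
    looks wrong in isolation but compensates for that known quirk;
    the comment tells a future AI not to 'fix' it."""
    payload = json.loads(raw)
    if payload.get("status") == "error":
        raise RuntimeError(payload.get("message", "vendor error"))
    return payload["data"]

def charge(amount_cents: int, raw_response: str) -> dict:
    """Core flow: validate input, interpret the vendor, normalise the result."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    data = parse_vendor_response(raw_response)
    return {"charged": amount_cents, "vendor_ref": data["ref"]}

if __name__ == "__main__":
    ok = json.dumps({"status": "ok", "data": {"ref": "abc123"}})
    print(charge(500, ok))
```

An AI told to extend this file inherits the error-handling convention, the response shape, and the warning about the vendor quirk, none of which it would reliably guess from a prose ticket.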
This inverts the traditional development workflow. Instead of writing documentation that describes what the system should do and then building it, you build the minimum viable proof that demonstrates what the system does, and that proof becomes the documentation. The AI does not need to interpret a specification. It needs to extend a working system. And extending working code is something AI is genuinely good at; far better than interpreting ambiguous requirements and building from scratch.
The proof of concept also solves the modularity problem. If the proof is small enough to fit in a context window, every module the AI builds from that proof is also small enough. The constraint propagates. The architecture is legible by construction, not by discipline.
This is why every engagement we run starts with a working proof of concept, not a proposal deck. The POC proves the architecture before the investment scales. It becomes the reference implementation that AI and your team can extend with confidence.
The Bottom Line
Amazon’s mandatory meeting is not an AI tools story. It is an architecture story.
The codebases AI is being pointed at were never designed to be understood by anything other than the humans who built them. They carry implicit knowledge that no context window can reconstruct. They have dependencies that are known through experience, not documentation. They are comprehensible to a senior developer who has worked on them for years and opaque to everything else, including AI that can process a million tokens but cannot ask “why?”
The response to this (mandatory senior review of AI-assisted changes) is a necessary guardrail. But it is a guardrail that treats the symptom. The structural answer is to build systems that AI can actually understand. Modular. Documented. Small enough to fit in a context window with room to spare. Connected through explicit interfaces, not implicit knowledge. Proven through working code, not described in documents that diverge from reality the day they are written.
This is not a concession to AI’s limitations. It is good architecture. It always was. Modularity, separation of concerns, explicit interfaces, documentation that lives alongside the code: these have been best practice for decades. What AI does is enforce the standard. The codebases that ignored these principles and got away with it because talented humans compensated for the mess can no longer get away with it. The AI does not compensate. It extrapolates. And when it extrapolates from a mess, the result is a faster, more confident mess with a longer recovery time.
The organisations that will deploy AI coding tools successfully are not the ones with the most sophisticated prompting. They are the ones that recognised, before the outage, that architecture is the input layer and that building for AI means building for clarity.
Perth AI Consulting builds AI systems for organisations where reliability is not optional: architected for the way AI actually works, not the way marketing describes it. Start with a conversation.