Code Review in the Age of AI: What Humans Still Have to Catch
By Niall · 6 min read
AI clears the small stuff fast, but intent, security context and architecture fit are still yours to catch.
AI has become genuinely useful at reviewing code. It spots typos, flags obvious bugs, suggests cleaner phrasings, and catches the small stuff faster than any human reading a long diff at the end of the day. It is a real upgrade to the first pass of a review. The risk is mistaking that first pass for the whole job.
The more code AI writes, the more important review becomes, and the parts of review that matter most are exactly the parts AI is weakest at. Knowing what a machine can and cannot catch is now a core engineering skill, not a nice-to-have.
What AI review is genuinely good at
Let us give AI its due. It is fast, tireless and consistent on the mechanical layer of review: style, formatting, simple bugs, unused variables, obvious null-handling gaps, and the common patterns it has seen countless times. It never gets bored on the five-hundredth line of a diff, and it surfaces the small issues that humans skim past when they are tired. As a first pass, it is excellent.
There is real value in handing this layer to a machine. It frees human reviewers from the tedious work of policing formatting and spotting trivial slips, which is exactly the work people are worst at doing reliably late in the day. Let the tool clear the noise, and the human review that follows can be sharper and more focused.
Intent: what was this even meant to do?
Here is the first thing AI struggles with. A review is not only about whether the code is correct in isolation; it is about whether it does the right thing. AI does not know what the change was supposed to achieve, what the ticket really meant, or what the customer actually needs. Code can be flawless and still solve the wrong problem, and only someone who understands the intent will notice.
This is why a clear description of what a change is meant to do is worth so much at review time. Without it, a reviewer, human or otherwise, is left guessing whether the code matches the goal. AI can tell you the code is internally consistent; it cannot tell you the goal was right, because it was never told what the goal truly was.
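A small sketch makes the point concrete. Assume a hypothetical ticket that asks for "most recently active customers first". The `Customer` type and both function names are invented for illustration. Both functions are internally consistent and would pass a mechanical review, but only the second matches the assumed intent.

```python
from datetime import date
from typing import NamedTuple


class Customer(NamedTuple):
    # Hypothetical record, used only for this illustration.
    name: str
    signed_up: date
    last_active: date


def newest_first_by_signup(customers):
    """Bug-free and consistent, but it answers a different question."""
    return sorted(customers, key=lambda c: c.signed_up, reverse=True)


def newest_first_by_activity(customers):
    """Matches the assumed intent: most recently active customers first."""
    return sorted(customers, key=lambda c: c.last_active, reverse=True)
```

Nothing in the first function is wrong as code. The mismatch only shows up when a reviewer compares it with what the change was meant to achieve.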
Security in context
AI can catch textbook security mistakes, but security is usually about context, and context is where it falls short. Whether a given endpoint needs an authorisation check depends on who is allowed to call it and what it exposes, which lives in your system's design, not in the diff. The dangerous vulnerabilities are rarely the obvious ones; they are the ones that only make sense when you understand how the whole system fits together.
Real security review also weighs trust boundaries, data sensitivity and how pieces combine, none of which is visible in a single diff. A change that looks harmless on its own can open a hole when it meets the rest of the system. Judging that takes a mental model of how data flows through your application, which is exactly what a model reviewing a single diff does not have.
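As a minimal sketch of that gap, assume a hypothetical invoice store and two handler variants. Every name here (`Invoice`, `INVOICES`, `get_invoice_unchecked`, `get_invoice_checked`) is illustrative, and the ownership rule is an assumption about the system's design. The first handler reads cleanly in a diff. Only knowing that invoices belong to individual customers reveals the missing check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner_id: int
    total_cents: int


# Hypothetical in-memory store standing in for a real database.
INVOICES = {
    1: Invoice(invoice_id=1, owner_id=100, total_cents=2500),
    2: Invoice(invoice_id=2, owner_id=200, total_cents=9900),
}


def get_invoice_unchecked(invoice_id: int) -> Invoice:
    """Looks fine in isolation: fetches by id and returns the record."""
    return INVOICES[invoice_id]


def get_invoice_checked(invoice_id: int, caller_id: int) -> Invoice:
    """Same lookup, plus the ownership rule that lives in the system design."""
    invoice = INVOICES[invoice_id]
    if invoice.owner_id != caller_id:
        # Assumption: callers may only read their own invoices.
        raise PermissionError("caller does not own this invoice")
    return invoice
```

With the unchecked variant, any caller can read invoice 2 by guessing its id. Nothing in the function body hints that this is a problem, which is why the judgement has to come from someone who knows who is allowed to call it.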
Architecture fit
A change can be perfectly correct on its own and still be wrong for your codebase: duplicating something that already exists, introducing a pattern that conflicts with how everything else works, or adding coupling that will hurt you later. AI reviews the diff in front of it; it does not carry a mental model of your whole architecture and where this change should sit within it. Keeping the system coherent is still a human responsibility.
Left unchecked, this is how codebases quietly rot. Each individual change passes review, yet the whole drifts into something inconsistent and hard to work in. Someone has to be accountable for coherence across changes, asking not just whether a change is correct, but whether it belongs here and makes the system better rather than merely bigger.
The edge cases that actually matter
AI is good at flagging generic edge cases. It is far weaker on the ones specific to your domain: the customer type that breaks an assumption, the regulatory rule, the legacy quirk that everyone on the team simply knows. Those are often the cases that cause real incidents, and catching them takes someone who knows the business, not just the language the code is written in.
These cases are also the ones least likely to appear in any training data, because they are specific to you. The customer who behaves unusually, the integration that misbehaves once a quarter, the rule that only applies in one region: these are learned from living with a system, not from reading code in general. That hard-won context is exactly what human review brings to the table.
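To illustrate, here is a sketch built on an invented rule: a single region that is exempt from a standard tax. The rate, the region name `REGION_X` and both function names are assumptions made up for this example, not real tax rules. The generic version handles the edge cases a tool would typically flag, such as negative amounts. The exemption is the kind of rule only the team would know to ask about.

```python
STANDARD_TAX_BPS = 2000  # 20.00% in basis points; an illustrative rate only

# Hypothetical domain rule the team "simply knows": one region is exempt.
EXEMPT_REGIONS = {"REGION_X"}


def total_generic(net_cents: int) -> int:
    """Covers the generic edge case: rejects negative amounts."""
    if net_cents < 0:
        raise ValueError("net amount must not be negative")
    return net_cents + (net_cents * STANDARD_TAX_BPS) // 10_000


def total_with_domain_rule(net_cents: int, region: str) -> int:
    """Adds the region-specific exemption that only domain knowledge supplies."""
    if net_cents < 0:
        raise ValueError("net amount must not be negative")
    if region in EXEMPT_REGIONS:
        return net_cents
    return total_generic(net_cents)
```

A review that only sees `total_generic` has no reason to object to it. The incident comes later, when the first customer from the exempt region is overcharged.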
Speed makes review matter more, not less
It is tempting to think that faster code generation means you can relax review. The opposite is true. When a model can produce a large change in seconds, the volume of code flowing toward production rises, and the share of it that no human has truly thought about rises with it. Review is the valve that keeps that flow safe, and it earns its place more the faster you generate.
Use both, and keep your standards
The answer is not to choose between AI and human review. It is to layer them: let AI handle the fast mechanical pass so humans can spend their attention on intent, security context, architecture and the edge cases that matter. And keep your standards explicit (written down, agreed and enforced) so that both the humans and the AI are reviewing against the same bar.
- Let AI take the first pass on style, simple bugs and obvious gaps.
- Reserve human review for intent, security context, architecture and domain edge cases.
- Keep coding standards explicit so reviews are consistent, not personal.
- Treat AI suggestions as input to judge, never as approval to ship.
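One way to encode the points above is a merge gate in which an AI verdict is recorded but can never stand in for a human approval. This is a sketch under assumptions, not a description of any particular tool: `ReviewState`, `can_merge` and their fields are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewState:
    # Unresolved items from the automated first pass.
    ai_findings_open: int = 0
    # Names of human reviewers who have approved the change.
    human_approvals: set[str] = field(default_factory=set)
    # Recorded for visibility, but deliberately ignored by the gate below.
    ai_approved: bool = False


def can_merge(state: ReviewState, required_human_approvals: int = 1) -> bool:
    """Allow a merge only with human approval and no open AI findings."""
    if len(state.human_approvals) < required_human_approvals:
        return False
    return state.ai_findings_open == 0
```

For example, `can_merge(ReviewState(ai_approved=True))` is `False`, because no human has signed off. Keeping `ai_approved` out of the decision is the design choice that matters here: the AI pass feeds the review, and a person still makes the call.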
Used together, AI and human review make each other better: the machine clears the noise, and people focus on what only people can catch. Helping teams build that kind of disciplined, AI-aware review process is part of how we keep fast-moving codebases healthy.