AI Security

Prompt Injection Is the New SQL Injection: Defending Your AI Apps

By Niall · 7 min read

An abstract coastal lock and key resting on weathered dock pilings, representing defending AI apps from prompt injection

The old web learned never to trust user input; AI apps are relearning that lesson as prompt injection goes mainstream.

Anyone who built web apps in the 2000s remembers SQL injection: untrusted input slipping past a naive query and taking control of the database. We spent a decade learning to never trust user input. AI applications are now relearning that lesson in a new form, and the name that has stuck is prompt injection.

In 2026 prompt injection has moved from a research curiosity to a mainstream attack technique. If you are building anything where a model reads content from the outside world and can then take actions, this is a threat you need to design against from the start, not bolt on later.

What prompt injection actually is

Prompt injection is when malicious instructions are hidden inside content that a model reads, and the model follows them as if they came from you. The poisoned content can live in a web page, a PDF, an email, a support ticket, a product review, anywhere your system ingests text it did not write. The model cannot reliably tell the difference between your instructions and instructions buried in the data it is processing.

On its own, that might just produce a wrong answer. The danger sharpens when the model is an agent with tools or permissions: the ability to send email, call APIs, move money, change records, read private data. A successful injection can turn those capabilities against you, using your own agent as the weapon.

Why it is the new SQL injection

The parallel is exact. In both cases the root cause is the same mistake: mixing trusted instructions and untrusted data in the same channel, then letting the system act on the combination. We solved SQL injection with discipline, parameterised queries, validation and least privilege, not with a single magic fix. Prompt injection needs the same layered discipline, because there is no one setting that makes it disappear.

There is one important difference, and it is an uncomfortable one. With SQL injection, a correctly parameterised query is a near-complete fix. With prompt injection there is no equivalent guaranteed defence yet, because the model is designed to follow instructions written in natural language and cannot perfectly tell yours apart from an attacker's. That is exactly why the answer has to be layered: you are managing a risk, not closing a hole once and for all.

Defence in depth

There is no silver bullet, so you stack defences that each reduce the blast radius. The goal is that even if one layer is bypassed, the others contain the damage.

Treat all model output as untrusted, the same way you treat user input, and validate it before acting on it.
Give tools least privilege, so an agent can only do the narrow things its job actually requires.
Use allowlists for the actions, domains and recipients an agent is permitted to touch.
Require human approval for high-stakes or irreversible actions.
Validate inputs and outputs, and separate trusted instructions from untrusted data as cleanly as you can.
Monitor and log everything, so an attempted injection is visible rather than silent.

No single item on that list is sufficient, and that is the point. You are building overlapping layers so that the failure of any one of them is survivable. A team that has done four of these well is in far better shape than one still chasing a single perfect defence that does not exist.

Separating instructions from data

The single most important mindset shift is to stop treating everything the model sees as equally trustworthy. Your system prompt and your application's rules are trusted. A web page the agent just fetched is not. Keeping that boundary clear, structuring how untrusted content is presented to the model, and never letting fetched content silently become a new instruction, removes a whole class of attacks before they start.

A useful habit is to label provenance explicitly inside your system: this came from the user, this came from a trusted internal source, this was scraped from the open web. The model still has to be told to treat those differently, and you still validate what it does with them, but making the boundary visible is far better than pretending all text is equal and hoping for the best.

Least privilege is your strongest lever

If an agent cannot perform a dangerous action, an injection cannot make it perform that action. Scoping tools tightly is the most reliable defence you have, because it limits damage regardless of how clever the attack is. An agent that can read a calendar but not delete it, draft an email but not send it without approval, query a database but only through a constrained interface, is simply far less useful to an attacker.

This is also the defence that ages best. Models change, attacks get more creative, and clever prompt-level tricks come and go. A tightly scoped permission, by contrast, holds regardless of how the attack evolves. When you are deciding where to spend limited security effort, narrowing what the agent is able to do returns more safety per hour than almost anything else you could try.

Monitoring and assuming breach

Mature security assumes that some attempts will get through, and plans accordingly. Log every tool call and decision, alert on anomalies, and make it easy to see what an agent did and why. When something does slip past, the difference between a contained incident and a quiet disaster is whether you can see it happening in time to react.

Treat the first confirmed injection attempt as useful intelligence rather than a one-off nuisance. It tells you which inputs an attacker is probing, which guardrail caught it, and which one might not have. Feeding that back into your tests and allowlists is how a defence improves over time, instead of staying frozen at whatever you happened to imagine on launch day.

Assume any text your model reads might be trying to hijack it, and assume that one day something will succeed. Design so that when an injection lands, the worst it can do is small.

Defending against prompt injection is a software engineering problem as much as an AI one: validation, least privilege, monitoring and careful architecture. It is exactly the kind of hardening we build into the AI systems we ship, so that capable agents stay safe out in the real world.

Relevant services