Engineering

A CTO's Guide to AI Security: Prompt Injection, Data Leaks and Guardrails

By Niall · 8 min read

Engineering

LLM apps fail in security ways traditional apps do not. A CTO's practical guide to prompt injection, data leaks, and the guardrails that hold.

If you are responsible for shipping software, AI applications hand you a category of security problems your existing instincts do not fully cover. The familiar risks, injection, access control, data handling, are all still there, but a language model introduces new ways for each to go wrong, and a few that are genuinely novel. The good news is that the defenses are understandable. The bad news is that few teams put them in place before something forces the issue.

This is a practical tour of the risks that matter most in LLM applications and the guardrails that actually help. It is written for the person who has to sign off that the thing is safe to ship.

Prompt injection: the risk you cannot fully patch

Prompt injection is the signature vulnerability of LLM apps. Because the model treats instructions and data as the same stream of text, content it reads, a web page, an email, a document, a user message, can carry hidden instructions that hijack its behaviour. An attacker does not need to breach your servers; they just need their text to reach your model. It is closer to social engineering than to a classic exploit, and there is no single patch that makes it disappear. You manage it; you do not solve it once and forget it. We go deeper on the specifics in our piece on defending against prompt injection.

Data leaks: where your information escapes

The second big risk is information going where it should not. A model with access to sensitive data can be coaxed into revealing it, repeat it back to the wrong user, or fold it into an answer that gets logged somewhere insecure. The exposure grows the moment you connect a model to your real systems, which is exactly when it becomes useful. The uncomfortable truth is that the more capable and connected you make an AI feature, the larger its potential blast radius if you have not contained it.

The defenses that actually help

Treat all model input as untrusted, including content it fetches or reads, and never let raw model output trigger a sensitive action unchecked.
Give agents least privilege: scoped, minimal permissions, so a hijacked agent can do very little damage.
Validate and constrain outputs before they act, rather than trusting the model to behave.
Keep a human in the loop for high-stakes actions like payments, deletions or anything irreversible.
Log and monitor what the model and its tools actually did, so you can see and stop abuse quickly.

The single most useful mental shift is this: treat the model as a useful but untrusted component, never as a trusted part of your system. Architect as if it can be manipulated, because it can.

Defense in depth, not a magic filter

There is no single setting that makes an AI application secure, and any vendor implying otherwise should worry you. Security here is layered: limit what the model can reach, constrain what it can do, validate what it produces, and watch what actually happens at runtime. Each layer is imperfect on its own, and together they turn a wide-open risk into a managed one. This is ordinary defense-in-depth thinking, applied to a component that happens to be probabilistic and easily talked into things.

AI security is not a reason to avoid building, it is a reason to build deliberately. The teams that get burned are usually the ones who treated the model as trusted and wired it into everything before thinking about containment. Designing those guardrails before they are tested in anger is exactly the kind of senior engineering review we are brought in to do, and it is a great deal cheaper than the alternative.

Relevant services