Skip to content

Glossary

Inference

The act of running a trained model to get an answer; in other words, using the AI rather than training it.

Inference is what happens every time you actually use a model: you send an input, the model computes a response. Training is the one-time, expensive work of building the model; inference is the ongoing cost of running it.

Because you pay for inference on every request, usually by the token, it is where the real operating cost of an AI feature lives. Speed and price here shape what is affordable to ship and run at scale.

How we use it

We design systems with inference cost and latency in mind, choosing models and patterns that hit your quality bar without an unpredictable bill.

Charleston waterway at sunset with palmetto silhouettes

Get in touch

Want to put this into practice?

If this concept is relevant to something you're building, a short note is the fastest way to get practical help.