Glossary
Inference
The act of running a trained model to get an answer; in other words, using the AI rather than training it.
Inference is what happens every time you actually use a model: you send an input, and the model computes a response. Training is the expensive, largely up-front work of building the model; inference is the ongoing cost of running it.
Because you pay for inference on every request, usually by the token, it is where the real operating cost of an AI feature lives. Speed and price here shape what is affordable to ship and run at scale.
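To make the per-token arithmetic concrete, here is a minimal sketch of how a bill scales with traffic. Every number in it is made up for illustration: the prices, token counts, and request volume are assumptions, not any provider's real rates. The names `TokenPricing`, `cost_per_request`, and `monthly_cost` are hypothetical helpers, and the split between input and output prices is itself an assumption about how a given provider charges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical prices in US dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


def cost_per_request(pricing: TokenPricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single inference call: token counts times per-token prices."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000


def monthly_cost(
    pricing: TokenPricing,
    input_tokens: int,
    output_tokens: int,
    requests_per_day: int,
    days: int = 30,
) -> float:
    """Scale the per-request cost up to a month of steady traffic."""
    return cost_per_request(pricing, input_tokens, output_tokens) * requests_per_day * days


# Illustrative numbers only (assumed, not real prices or real traffic).
pricing = TokenPricing(input_per_million=2.0, output_per_million=8.0)
per_request = cost_per_request(pricing, input_tokens=1_000, output_tokens=500)
per_month = monthly_cost(pricing, 1_000, 500, requests_per_day=10_000)

print(f"per request: ${per_request:.4f}")  # per request: $0.0060
print(f"per month:   ${per_month:,.2f}")   # per month:   $1,800.00
```

A fraction of a cent per request looks negligible, but under these assumed numbers it adds up to $1,800 a month at 10,000 requests a day, which is why token counts and model choice matter at scale.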
How we use it
We design systems with inference cost and latency in mind, choosing models and patterns that hit your quality bar without an unpredictable bill.
