RAG
Vector Search 101: How AI Finds the Right Answer in Your Data
By Niall · 7 min read
Vector search is the quiet workhorse that lets AI find the passage you meant, even when you used different words.
Behind almost every AI assistant that answers from your own data sits a quiet workhorse: vector search. It is the technology that lets a system find the passage you meant even when you did not use its exact words. Understanding it in plain language helps you make better decisions about the assistants and chatbots you build, even if you never touch the maths.
Nothing here requires a background in linear algebra. The ideas are intuitive once you strip away the jargon, and they explain a lot about why some AI search feels uncannily good and some feels frustratingly off.
Embeddings: turning meaning into numbers
An embedding is a list of numbers that represents the meaning of a piece of text. Run a sentence through an embedding model and you get a vector, a point in a high-dimensional space, positioned so that things with similar meaning sit close together. 'How do I reset my password' and 'I forgot my login' land near each other, even though they share almost no words.
That is the whole trick. By turning text into points in space where nearness means similarity of meaning, you can search by what something means rather than the exact characters it contains. Keyword search asks 'does this word appear?'; vector search asks 'is this about the same thing?'.
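To make "nearness" concrete, here is a minimal sketch that measures it with cosine similarity, one common way to compare embeddings. The three-number vectors are hand-made for illustration and are an assumption, not the output of any real embedding model, which would produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy "embeddings" (assumption: a real model would generate these).
toy_embeddings = {
    "How do I reset my password": [0.9, 0.1, 0.0],
    "I forgot my login": [0.8, 0.2, 0.1],
    "What are your opening hours": [0.0, 0.2, 0.9],
}

query = toy_embeddings["How do I reset my password"]
similar = cosine_similarity(query, toy_embeddings["I forgot my login"])
unrelated = cosine_similarity(query, toy_embeddings["What are your opening hours"])

print(f"password vs login: {similar:.2f}")
print(f"password vs hours: {unrelated:.2f}")
```

The two account-related sentences score far higher against each other than either does against the opening-hours question, even though they share almost no words.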
Similarity search
Once your content is a collection of points, finding relevant material becomes a geometry problem. Embed the user's question, drop it into the same space, and look for the nearest points. Those nearest neighbours are the passages most similar in meaning, and they are what you hand to the model to answer from. This is similarity search, and it is the heart of vector retrieval.
In practice you ask for the top few matches, say the five or ten nearest, and let the model work from those. Tuning how many you retrieve is a balance: too few and you miss the answer, too many and you drown the model in noise and pay for tokens you did not need.
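As a rough sketch of that top-few lookup, the snippet below compares the question against every stored passage and keeps the best `k`. The passages, their toy vectors and the `top_k` helper are all hypothetical; a real system would get the vectors from an embedding model and would not scan every point.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, passages, k=5):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy passages with hand-made vectors (assumption, for illustration only).
passages = [
    ("Reset your password from the account page", [0.9, 0.1, 0.0]),
    ("Recovering a forgotten login", [0.8, 0.2, 0.1]),
    ("Store opening hours", [0.0, 0.2, 0.9]),
    ("Delivery times and costs", [0.1, 0.9, 0.2]),
]

query_vector = [0.85, 0.15, 0.05]  # stands in for the embedded user question
results = top_k(query_vector, passages, k=2)

for score, text in results:
    print(f"{score:.3f}  {text}")
```

Changing `k` is the tuning knob described above: raise it and more passages reach the model, lower it and you risk leaving the answer behind.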
It is worth pausing on how powerful this is. A user can describe a problem in their own words, with none of your jargon, and still land on the right document. That tolerance for how people actually phrase things is exactly what makes vector search feel intelligent.
Vector databases
Searching a handful of vectors is easy; searching millions quickly is not, which is what vector databases are built for. They store embeddings and find nearest neighbours fast, using approximate nearest-neighbour indexes (HNSW is a common one) that trade a small amount of accuracy for not having to compare against every point. You have a range of solid options depending on your stack and scale.
- pgvector with Supabase or plain Postgres: keep vectors in the database you already run, simple and pragmatic.
- Pinecone: a managed service that handles scale and operations for you.
- Qdrant: an open-source engine you can self-host or use as a managed service.
- Weaviate: an open-source database with built-in vector and hybrid search.
For many teams the pragmatic choice is to start with pgvector in the database they already run. It keeps your vectors next to the rest of your data, avoids a new piece of infrastructure to operate, and scales further than people expect. Reach for a dedicated vector database when scale or features genuinely demand it.
Hybrid search: the best of both worlds
Vector search is brilliant at meaning and surprisingly weak at exact strings such as product codes, names and error numbers, where the precise characters are the point. Keyword search is the opposite. Hybrid search runs both and combines the results, so you get semantic understanding and exact-match precision together. For most real-world data, hybrid beats either approach alone.
Combining the two is not just averaging scores. A good hybrid setup decides how much weight to give each signal for your data, and that balance is worth tuning. The payoff is an assistant that understands a vague question and still nails the one where a customer pasted an exact order number.
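One simple way to picture that weighting is a blend of normalised scores, sketched below. This is only one of several fusion approaches, and the function names, the document ids, the scores and the default `vector_weight` are all assumptions made up for illustration.

```python
def min_max_normalise(scores):
    """Rescale a {doc_id: score} mapping to the 0..1 range."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}

def hybrid_rank(vector_scores, keyword_scores, vector_weight=0.5):
    """Blend both signals; vector_weight is the balance you tune for your data."""
    v = min_max_normalise(vector_scores)
    kw = min_max_normalise(keyword_scores)
    blended = {
        doc_id: vector_weight * v.get(doc_id, 0.0)
        + (1 - vector_weight) * kw.get(doc_id, 0.0)
        for doc_id in set(v) | set(kw)
    }
    return sorted(blended, key=blended.get, reverse=True)

# Made-up scores for a query in which a customer pasted an exact order number.
vector_scores = {"returns-policy": 0.82, "order-A1234": 0.79, "shipping-faq": 0.60}
keyword_scores = {"order-A1234": 12.0, "shipping-faq": 1.5}

ranking = hybrid_rank(vector_scores, keyword_scores, vector_weight=0.5)
vector_only = hybrid_rank(vector_scores, keyword_scores, vector_weight=1.0)

print(ranking)      # blended ranking
print(vector_only)  # what vector search alone would have returned
```

With an even weighting the exact-match document rises to the top; with the keyword signal switched off, the semantically similar but wrong document wins. Normalising first matters because the two scoring systems use very different scales.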
Reranking: a second, sharper look
Fast similarity search casts a wide net, and the top results are good but not perfectly ordered. A reranker takes that shortlist and re-scores it with a more careful, more expensive model that looks at the question and each candidate together. You search broadly and cheaply, then rerank the finalists precisely, which lifts the truly relevant passage to the top where the model will actually use it.
Reranking adds cost and a little latency, so it is not always worth it. But when the right answer keeps landing just outside the passages you send to the model, a reranker is often the single most effective fix, far more so than swapping models or rewriting prompts.
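The two-stage shape can be sketched in a few lines. The `overlap_score` function here is a deliberately crude stand-in for a real reranking model (it just counts shared words), and the shortlist is invented; what the sketch shows is the structure: take the shortlist, score each question and passage pair together, and keep the best few.

```python
def overlap_score(question, passage):
    """Toy stand-in for a reranking model: share of question words found in the passage."""
    question_words = set(question.lower().split())
    passage_words = set(passage.lower().split())
    if not question_words:
        return 0.0
    return len(question_words & passage_words) / len(question_words)

def rerank(question, shortlist, score_pair, keep=3):
    """Re-score every (question, passage) pair and keep the best few."""
    rescored = sorted(
        shortlist,
        key=lambda passage: score_pair(question, passage),
        reverse=True,
    )
    return rescored[:keep]

question = "how do i cancel my subscription"

# Pretend this shortlist came back from a fast similarity search, in that order.
shortlist = [
    "Subscription plans and pricing overview",
    "Updating your billing details",
    "How do I cancel? Open account settings and choose Cancel subscription",
    "How do refunds work",
]

best = rerank(question, shortlist, overlap_score, keep=2)
for passage in best:
    print(passage)
```

The passage that actually answers the question started third in the shortlist and ends up first. Swapping `overlap_score` for a call to a real reranking model would keep the rest of the code unchanged.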
Chunking: the strategy that quietly decides quality
How you split documents before embedding them shapes everything downstream. Chunks that are too big mix several topics and muddy relevance; chunks that are too small lose the context an answer needs. Splitting along natural boundaries such as sections and paragraphs, with a little overlap, tends to work best. It is unglamorous, and it often matters more to search quality than which database you picked.
If your AI search is disappointing, chunking is one of the first places to look. Many retrieval problems that look like model failures are really chunking failures in disguise.
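A minimal sketch of paragraph-based chunking with overlap is shown below. The `chunk_paragraphs` function, the blank-line paragraph separator and the default sizes are assumptions for illustration; sensible values depend on your documents and your embedding model.

```python
def chunk_paragraphs(text, max_chars=500, overlap_paragraphs=1):
    """Group paragraphs into chunks of roughly max_chars.

    The last few paragraphs of each chunk are repeated at the start of the
    next one, so a chunk can slightly exceed max_chars.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for paragraph in paragraphs:
        candidate = current + [paragraph]
        if current and len("\n\n".join(candidate)) > max_chars:
            chunks.append("\n\n".join(current))
            # Carry the tail of the finished chunk forward as overlap.
            tail = current[-overlap_paragraphs:] if overlap_paragraphs else []
            current = tail + [paragraph]
        else:
            current = candidate
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Three short paragraphs standing in for a real document.
document = "\n\n".join([
    "Refunds are issued within five working days.",
    "To request one, open your order and choose Refund.",
    "Gift cards cannot be refunded once redeemed.",
])

chunks = chunk_paragraphs(document, max_chars=100, overlap_paragraphs=1)
for i, chunk in enumerate(chunks, start=1):
    print(f"--- chunk {i} ---\n{chunk}")
```

Here the middle paragraph appears in both chunks, so a question that depends on it and either neighbour still finds the context it needs.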
Vector search is the engine under grounded chatbots, document assistants and internal knowledge tools. Its pieces (embeddings, a vector database, hybrid search, reranking and good chunking) are understandable without a maths degree. Choosing and tuning them well is what makes AI feel like it genuinely understands your data. That tuning is part of every AI chatbot and retrieval system we build.