Embeddings & Vector Search

Embeddings are the quiet foundation under semantic search, recommendations and retrieval-augmented generation (RAG). The idea is one of the most beautiful in modern AI: represent meaning as a position in space.

01 Meaning as coordinates

Imagine placing every word in a vast multi-dimensional space so that related words sit near each other — "Paris" close to "France", "happy" close to "joyful". Distance now means something: nearby points are similar, distant points are not. That arrangement of coordinates is an embedding.
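As a rough illustration of "nearby points are similar", the sketch below places four words on a flat 2-D map and measures straight-line distance between them. The coordinates are hypothetical and hand-picked for this example; real embeddings are learned by a model and have far more dimensions.

```python
import math

# Hypothetical 2-D coordinates, hand-picked for illustration only.
# Real embeddings are learned by a model and have many more dimensions.
points = {
    "Paris": (0.9, 0.8),
    "France": (0.8, 0.9),
    "happy": (-0.7, 0.6),
    "joyful": (-0.8, 0.5),
}


def distance(a, b):
    """Euclidean (straight-line) distance between two named points."""
    return math.dist(points[a], points[b])


near = distance("Paris", "France")
far = distance("Paris", "happy")
print(f"Paris-France: {near:.2f}, Paris-happy: {far:.2f}")
```

With these made-up coordinates the related pair comes out much closer than the unrelated pair, which is all the word "embedding" promises at this stage.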

02 What an embedding is

Concretely, an embedding is a list of numbers — a vector — often a few hundred to a few thousand long. A model learns to assign these vectors so that the geometry captures meaning. The same trick works for words, whole sentences, images, products and users.

Figure 1 — One item, many dimensions

A single embedding is a vector — here just five of its hundreds of dimensions. No single number is meaningful alone; the pattern across all of them encodes the meaning.

03 The shape of the space

What makes embeddings remarkable is that directions carry meaning too. In well-trained word embeddings the step from "man" to "woman" points in roughly the same direction as the step from "king" to "queen", so relationships become arithmetic: "king" minus "man" plus "woman" lands near "queen". The model was never taught this rule. It emerged from the data as a side effect of learning to predict words from their context.
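The arithmetic can be tried on a toy vocabulary. The sketch below uses hypothetical 3-D vectors built by hand so that the analogy works exactly; learned embeddings only satisfy it approximately, and their dimensions carry no readable labels. The `analogy` helper is an illustrative name, not part of any library.

```python
import math

# Hypothetical 3-D vectors, built by hand so the analogy holds exactly.
# The axes loosely stand for (royalty, maleness, femaleness); learned
# embeddings have no such labelled axes.
vectors = {
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "apple": [0.1, 0.1, 0.1],
}


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def analogy(a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding the inputs."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


result = analogy("man", "woman", "king")
print(result)  # prints "queen" for these hand-built vectors
```

Excluding the three input words from the candidates is a common convention in analogy tests; without it the nearest vector is often one of the inputs.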

Embeddings do not store definitions. They store relationships — and relationships, it turns out, are most of what meaning is.

04 Vector search

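Once everything is a vector, finding similar items becomes a geometry problem: which stored vectors are closest to this one? This is vector search. Specialised databases can answer it across billions of items in milliseconds, typically by using approximate nearest-neighbour indexes rather than comparing the query against every stored vector. Closeness is usually measured with cosine similarity, the cosine of the angle between two vectors: 1 when they point the same way, 0 when they are perpendicular.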

Figure 2 — From query to nearest neighbours

The same four steps power semantic search, "related products" and the retrieval stage of RAG. Change the data, not the algorithm.
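A minimal sketch of nearest-neighbour search by cosine similarity follows. It scores every stored vector in turn, which is fine for a handful of items but is not how large vector databases work. The stored texts, their 3-D vectors and the query vector are all made up for illustration; in practice an embedding model would produce them.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical stored items with made-up 3-D vectors.
index = {
    "how to reset my password": [0.9, 0.1, 0.0],
    "forgot login credentials": [0.8, 0.2, 0.1],
    "best pizza in Naples": [0.0, 0.1, 0.9],
    "pasta recipes": [0.1, 0.0, 0.8],
}


def search(query_vector, k=2):
    """Brute-force search: score every item, return the k most similar texts."""
    scored = [(cosine_similarity(query_vector, vec), text)
              for text, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]


# Stand-in for the vector an embedding model would return for the query.
query = [0.85, 0.15, 0.05]
top = search(query)
print(top)
```

Swapping the contents of `index` for product or document vectors changes what is retrieved without changing the search code.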

05 Where they show up

Embeddings are everywhere once you look. Semantic search that understands intent, "customers also bought" recommendations, duplicate-question detection, clustering survey responses by theme, and the retrieval step of most RAG systems all run on the same foundation. Learn embeddings once and a dozen products suddenly make sense.

06 Limits & bias

Because embeddings learn from human text, they absorb human bias: associations between professions and genders, or names and sentiment, can be baked into the geometry. They also have no notion of truth — only of similarity. Two false statements that read alike will sit close together. Embeddings organise meaning; they do not verify it.
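One simple way to look for such an association is to project a word's vector onto the direction running from "man" to "woman". The sketch below shows only the mechanics: the 2-D vectors are invented, the profession names are placeholders, and nothing here is a measurement from a real model. The `gender_lean` helper is a hypothetical name.

```python
# Hypothetical 2-D vectors, invented to show the mechanics only.
# They are not measurements from any real model.
vectors = {
    "man": [1.0, 0.0],
    "woman": [0.0, 1.0],
    "profession_a": [0.8, 0.3],
    "profession_b": [0.2, 0.9],
}


def gender_lean(word):
    """Dot product of a word vector with the man-to-woman direction.

    Positive values lean toward "woman", negative toward "man",
    and zero means no lean along this direction.
    """
    direction = [w - m for m, w in zip(vectors["man"], vectors["woman"])]
    return sum(x * d for x, d in zip(vectors[word], direction))


for word in ("profession_a", "profession_b"):
    print(word, round(gender_lean(word), 2))
```

A non-zero lean for a word that should be gender-neutral is the kind of baked-in association the paragraph above describes.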

What to remember

An embedding represents meaning as a position in a high-dimensional space.
Distance encodes similarity; nearby vectors mean similar things.
Directions can encode relationships — meaning becomes arithmetic.
Vector search finds nearest neighbours fast, usually via cosine similarity.
Embeddings power semantic search, recommendation, clustering and RAG.
They inherit bias from training data and capture similarity, not truth.

RRINOVA Research Team

We translate advanced technology and EU policy into practical training. This explainer is part of our open Insights series for educators, youth workers and SMEs.
