Embeddings are the quiet foundation under search engines, recommendations and retrieval-augmented generation (RAG). The idea is one of the most beautiful in modern AI: represent meaning as a position in space.
01 Meaning as coordinates
Imagine placing every word in a vast multi-dimensional space so that related words sit near each other — "Paris" close to "France", "happy" close to "joyful". Distance now means something: nearby points are similar, distant points are not. That arrangement of coordinates is an embedding.
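To make "distance means something" concrete, here is a minimal sketch. The coordinates are hand-picked for illustration and do not come from any trained model; the names `toy_space`, `distance` and `nearest` are hypothetical. It only shows how closeness between points can be measured once words have coordinates.

```python
import math

# Hand-picked toy coordinates (an assumption for illustration,
# not the output of a real embedding model).
toy_space = {
    "Paris": (1.0, 1.0),
    "France": (1.2, 0.9),
    "happy": (-1.0, 2.0),
    "joyful": (-0.9, 2.1),
}

def distance(a, b):
    """Euclidean distance between the points assigned to two words."""
    return math.dist(toy_space[a], toy_space[b])

def nearest(word):
    """Return the other word whose point lies closest to `word`."""
    others = [w for w in toy_space if w != word]
    return min(others, key=lambda w: distance(word, w))

print(nearest("Paris"))   # France
print(nearest("happy"))   # joyful
```

Real embeddings have hundreds of dimensions rather than two, but the measurement works the same way.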
02 What an embedding is
Concretely, an embedding is a list of numbers — a vector — often a few hundred to a few thousand long. A model learns to assign these vectors so that the geometry captures meaning. The same trick works for words, whole sentences, images, products and users.
Figure: a single embedding shown as a vector, with only five of its hundreds of dimensions displayed. No single number is meaningful alone; the pattern across all of them encodes the meaning.
03 The shape of the space
What makes embeddings remarkable is that directions carry meaning too. In well-trained word embeddings the step from "man" to "woman" points in roughly the same direction as the step from "king" to "queen", so relationships become approximate arithmetic: "king" minus "man" plus "woman" lands near "queen". The model was never taught this rule. It emerged from the data, apparently because encoding relationships consistently helps the model predict language.
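The arithmetic can be sketched with toy vectors. The two-dimensional values below are invented so that one axis loosely stands for "royalty" and the other for "gender"; real embeddings are learned, far higher-dimensional, and only approximately this tidy. The `analogy` helper is a hypothetical name.

```python
import math

# Invented 2-D vectors (assumption): axis 0 ~ "royalty", axis 1 ~ "gender".
vectors = {
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "apple": (-1.0, 0.0),
}

def analogy(a, b, c):
    """Solve a : b :: c : ? by computing b - a + c and
    returning the nearest word that is not one of the inputs."""
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    )
    candidates = [w for w in vectors if w not in (a, b, c)]
    return min(candidates, key=lambda w: math.dist(vectors[w], target))

print(analogy("man", "woman", "king"))  # queen
```

Excluding the input words from the candidates is a common convention in analogy tests; without it the nearest vector is often one of the inputs.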
04 Vector search
Once everything is a vector, finding similar items becomes a geometry problem: which stored vectors are closest to this one? This is vector search. Specialised databases can do it across billions of items in milliseconds, typically with approximate nearest-neighbour indexes, and they usually measure closeness with cosine similarity: the cosine of the angle between two vectors.
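Cosine similarity is the dot product of two vectors divided by the product of their lengths. It is 1 when they point the same way, 0 when they are perpendicular and -1 when they point in opposite directions. A minimal sketch in plain Python follows; the function name is an illustrative choice, and production systems would normally use an optimised library instead.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

print(cosine_similarity((1.0, 0.0), (2.0, 0.0)))   # 1.0
print(cosine_similarity((1.0, 0.0), (0.0, 3.0)))   # 0.0
```

Because the result depends only on direction, a vector and a scaled copy of it count as identical, which is often what is wanted when comparing meanings rather than magnitudes.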
The same retrieval steps power semantic search, "related products" and the retrieval stage of RAG. Change the data, not the algorithm.
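One plausible way to break the pipeline down, assumed here rather than taken from the article, is: embed the documents, store the vectors, embed the query, and rank stored vectors by similarity. The sketch below follows that outline. Its `embed` function is a stand-in that merely counts words from a tiny hypothetical vocabulary, so it matches word overlap rather than meaning; a real system would call an embedding model at that point.

```python
import math
from collections import Counter

# Hypothetical toy vocabulary; a real embedding model replaces all of this.
VOCAB = ["cat", "dog", "pet", "stock", "market", "price"]

def embed(text):
    """Stand-in embedding: how often each vocabulary word appears."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in VOCAB]

def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def build_index(docs):
    """Steps 1 and 2: embed every document and store (text, vector) pairs."""
    return [(doc, embed(doc)) for doc in docs]

def search(index, query, k=2):
    """Steps 3 and 4: embed the query, then rank stored vectors by similarity."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

docs = ["my cat is a pet", "dog and cat", "stock market price", "market price today"]
index = build_index(docs)
print(search(index, "stock price", k=1))  # ['stock market price']
```

This scans every stored vector, which is fine for a few thousand items. Vector databases avoid the full scan with approximate nearest-neighbour indexes, trading a little accuracy for speed.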
05 Where they show up
Embeddings are everywhere once you look. Semantic search that understands intent, "customers also bought" recommendations, duplicate-question detection, clustering survey responses by theme, and the retrieval step of most RAG systems all run on the same foundation. Learn embeddings once and a dozen products suddenly make sense.
06 Limits & bias
Because embeddings learn from human text, they absorb human bias: associations between professions and genders, or names and sentiment, can be baked into the geometry. They also have no notion of truth — only of similarity. Two false statements that read alike will sit close together. Embeddings organise meaning; they do not verify it.
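One simple way to look for such associations is to project word vectors onto a direction such as "he" minus "she" and compare the signs. The sketch below shows only the measurement: every number in it is made up for illustration, so its output says nothing about any real model. The names `toy` and `gender_projection` are hypothetical.

```python
import math

# Made-up 3-D vectors purely to illustrate the measurement;
# these are NOT values from a real embedding model.
toy = {
    "he": (1.0, 0.0, 0.2),
    "she": (-1.0, 0.0, 0.2),
    "engineer": (0.4, 0.9, 0.1),
    "nurse": (-0.5, 0.8, 0.1),
    "table": (0.0, 0.1, 0.9),
}

def gender_projection(word):
    """Signed projection of `word` onto the unit he-minus-she direction.
    Positive leans toward "he", negative toward "she", near zero is neutral."""
    axis = [h - s for h, s in zip(toy["he"], toy["she"])]
    length = math.sqrt(sum(x * x for x in axis))
    return sum(w * x for w, x in zip(toy[word], axis)) / length

for word in ("engineer", "nurse", "table"):
    print(word, round(gender_projection(word), 2))
```

Run against vectors from a real model, a consistently non-zero projection for words that should be neutral would be one signal of bias baked into the geometry.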
What to remember
- An embedding represents meaning as a position in a high-dimensional space.
- Distance encodes similarity; nearby vectors mean similar things.
- Directions can encode relationships — meaning becomes arithmetic.
- Vector search finds nearest neighbours fast, usually via cosine similarity.
- Embeddings power semantic search, recommendation, clustering and RAG.
- They inherit bias from training data and capture similarity, not truth.
