A large language model can draft an essay, debug code and explain quantum tunnelling — yet underneath it is doing one thing over and over: guessing the next chunk of text. Understanding that single mechanism is what turns prompt writing from guesswork into a skill.
This guide walks from the raw mechanism, next-token prediction, up through tokens, attention and training, and lands on the practical craft of prompt engineering: the patterns that tend to get better answers out of modern models.
01 A model that predicts the next token
Give a language model the text "The capital of Estonia is" and it does not "look up" Tallinn. Instead it produces a probability distribution over its entire vocabulary for what comes next, and Tallinn simply happens to score highest. Generate one token, append it, and repeat — that loop is the whole engine of text generation.
Generation is a loop. Each step produces probabilities over the vocabulary; sampling settings like temperature decide how adventurous the pick is.
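The loop can be sketched in a few lines of Python. This is a toy under loud assumptions: `toy_logits` is a hypothetical stand-in for a real model, and the four-token vocabulary and its scores are made up for illustration. Only the shape of the loop (score, convert to probabilities, sample, append, repeat) reflects the mechanism described above.

```python
import math
import random

VOCAB = ["Tallinn", "Riga", "Helsinki", "."]  # made-up four-token vocabulary


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(context):
    """Hypothetical stand-in for a model: returns invented scores, one per VOCAB entry."""
    if context and context[-1] == "is":
        return [5.0, 2.0, 1.0, 0.5]  # "Tallinn" scores highest after "... is"
    return [0.5, 0.5, 0.5, 3.0]      # otherwise favour ending the sentence


def generate(context, steps, temperature=1.0, seed=0):
    """Sample one token, append it to the context, and repeat."""
    rng = random.Random(seed)
    tokens = list(context)
    for _ in range(steps):
        probs = softmax(toy_logits(tokens), temperature)
        next_token = rng.choices(VOCAB, weights=probs, k=1)[0]
        tokens.append(next_token)
    return tokens


prompt = ["The", "capital", "of", "Estonia", "is"]
print(generate(prompt, steps=2, temperature=0.7))
```

Lowering `temperature` pushes more probability onto the top-scoring token, so the output becomes more predictable. Raising it flattens the distribution and makes unlikely picks more common.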
02 Tokens, not words
Models do not read words or letters — they read tokens, sub-word fragments produced by a tokenizer. Common words are usually a single token; rarer words split into pieces. This is why models sometimes miscount letters or stumble on unusual names: they never saw the letters, only the chunks.
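A toy tokenizer makes the idea concrete. The sketch below uses greedy longest-match splitting against a tiny, invented vocabulary. Production tokenizers learn their vocabularies from data with algorithms such as byte-pair encoding, so treat `TOY_VOCAB` and the splitting rule as illustrative assumptions only.

```python
TOY_VOCAB = {"the", "straw", "berry", "token", "izer", "s"}  # invented for illustration


def tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match split; unknown characters fall back to one-character tokens."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # no vocabulary entry matched
            i += 1
    return pieces


print(tokenize("the"))         # ['the']
print(tokenize("strawberry"))  # ['straw', 'berry']
print(tokenize("tokenizers"))  # ['token', 'izer', 's']
```

In this toy, "strawberry" reaches the model as two chunks, so a question about how many times "r" appears has to be answered without ever seeing the individual letters.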
Each token maps to a high-dimensional embedding — a vector that encodes meaning, so similar tokens sit near each other in space.
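"Near each other" is usually measured with cosine similarity. The three-dimensional vectors below are made up for illustration; real embeddings are learned during training and typically have hundreds or thousands of dimensions.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for this example.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cat_dog = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
cat_car = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"])
print(f"cat vs dog: {cat_dog:.2f}")
print(f"cat vs car: {cat_car:.2f}")
```

With these invented numbers, "cat" and "dog" point in nearly the same direction, while "cat" and "car" do not.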
03 Attention: weighing the context
The breakthrough behind modern models is the transformer, and its core trick is self-attention. For every token, attention asks: which other tokens in the context matter for understanding this one? It then blends information from those tokens, weighted by relevance. This is how a model resolves "it" to the right noun, or keeps track of who did what across a long paragraph.
Attention runs in parallel across all tokens and is repeated in dozens of layers, building rich, contextual meaning.
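The "weigh, then blend" step can be sketched as single-head scaled dot-product attention. This is a deliberately stripped-down version: the token vectors are made up, and the queries, keys and values are the raw input vectors themselves. Real transformers apply learned projection matrices, use many heads, and stack the operation across layers.

```python
import math


def softmax(scores):
    """Normalise scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(vectors):
    """For each token, score every token for relevance, then blend by those weights."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Relevance of each token (key) to the current token (query), scaled by sqrt(dim).
        scores = [dot(query, key) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        # Weighted average of the token vectors, used here as the values.
        blended = [
            sum(w * value[d] for w, value in zip(weights, vectors))
            for d in range(dim)
        ]
        outputs.append(blended)
        all_weights.append(weights)
    return outputs, all_weights


tokens = ["cat", "sat", "it"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # invented 2-d token vectors
outputs, weights = self_attention(vectors)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

In this toy setup, the row for "it" puts more weight on "cat" than on "sat" because their vectors point in similar directions. That is the mechanism by which a pronoun can pick up information from the noun it refers to.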
04 How a model is trained
Modern assistants are built in three stages. Pre-training exposes the model to vast text so it learns language, facts and reasoning patterns by predicting the next token. Fine-tuning on curated instruction–response pairs teaches it to be helpful and follow tasks. Finally, RLHF (reinforcement learning from human feedback) aligns its tone and safety with human preferences.
You interact with the model after all three stages — which is why it behaves like a helpful assistant, not an autocomplete box.
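"Learning by predicting the next token" is commonly scored with cross-entropy loss: the model is penalised by the negative log of the probability it assigned to the token that actually came next. The probabilities below are invented for illustration, and the sketch shows only the loss for a single prediction, not a training loop.

```python
import math


def next_token_loss(predicted_probs, target_token):
    """Cross-entropy for one prediction: -log(probability given to the true next token)."""
    return -math.log(predicted_probs[target_token])


# Two hypothetical predictions for the token after "The capital of Estonia is".
confident = {"Tallinn": 0.90, "Riga": 0.07, "Helsinki": 0.03}
unsure = {"Tallinn": 0.40, "Riga": 0.35, "Helsinki": 0.25}

print(f"confident model loss: {next_token_loss(confident, 'Tallinn'):.3f}")
print(f"unsure model loss:    {next_token_loss(unsure, 'Tallinn'):.3f}")
```

Training nudges the model's parameters so that this loss falls across very large amounts of text, which means the correct continuation gradually receives more probability.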
05 Prompt engineering, demystified
Because output quality depends on the context the model attends to, how you ask is half the work. A strong prompt usually has four parts: a role, a clear task, the context or data, and the output format you want.
Anatomy of a strong prompt
- Role — "You are an experienced EU project evaluator."
- Task — "Review this project summary for clarity and impact."
- Context — paste the summary, audience, and constraints.
- Format — "Reply as 3 bullet strengths and 3 fixes."
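The four parts can be assembled with a small helper. `build_prompt` and its labels are hypothetical conveniences rather than part of any library, and the context string is a placeholder for the text you would actually paste in.

```python
def build_prompt(role, task, context, output_format):
    """Join the four parts of a prompt into one labelled block of text."""
    sections = [
        ("Role", role),
        ("Task", task),
        ("Context", context),
        ("Format", output_format),
    ]
    return "\n\n".join(f"{label}: {text.strip()}" for label, text in sections)


prompt = build_prompt(
    role="You are an experienced EU project evaluator.",
    task="Review this project summary for clarity and impact.",
    context="<paste the summary, audience, and constraints here>",
    output_format="Reply as 3 bullet strengths and 3 fixes.",
)
print(prompt)
```

Keeping the parts as separate arguments makes it easy to change one of them, such as the output format, without touching the rest.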
06 Patterns that reliably work
A handful of techniques consistently lift results. Few-shot prompting gives the model two or three worked examples before the real task. Chain-of-thought prompting ("think step by step") often improves reasoning on multi-step problems. Explicit constraints ("under 100 words, no jargon") prevent rambling. And iteration, meaning refining the prompt after seeing the first answer, beats trying to write the perfect prompt in one shot.
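Few-shot prompting and explicit constraints are easy to combine in code. The helper below is a hypothetical sketch: the `Input:`/`Output:` labels are one common layout rather than a required format, and the example pairs are invented.

```python
def few_shot_prompt(instruction, examples, query, constraints=()):
    """Build a prompt with worked examples, ending where the model should continue."""
    blocks = [instruction]
    if constraints:
        blocks.append("Constraints: " + "; ".join(constraints))
    for source, target in examples:
        blocks.append(f"Input: {source}\nOutput: {target}")
    # Leave the final output empty so the model fills it in.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


examples = [
    ("The meeting was moved to Friday.", "Schedule change"),
    ("Invoice 114 is still unpaid.", "Billing"),
]
few_shot = few_shot_prompt(
    instruction="Classify each message with a short topic label.",
    examples=examples,
    query="Can we push the launch to next month?",
    constraints=("under 100 words", "no jargon"),
)
print(few_shot)
```

Iteration then amounts to editing the instruction, examples or constraints after reading the first answer and calling the helper again.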
Key takeaways
What to remember
- An LLM predicts the next token — generation is that loop, repeated.
- Models read sub-word tokens, not words or letters.
- Attention lets each token weigh the relevant context.
- Assistants are pre-trained, fine-tuned, then aligned with RLHF.
- Good prompts state role, task, context and output format.
- Few-shot, chain-of-thought, constraints and iteration reliably help.
