LLMs & the Art of Prompt Engineering

A large language model can draft an essay, debug code and explain quantum tunnelling — yet underneath it is doing one thing over and over: guessing the next chunk of text. Understanding that single mechanism is what turns prompt writing from guesswork into a skill.

This guide walks from the raw mechanism — next-token prediction — up through tokens, attention and training, and lands on the practical craft of prompt engineering: the patterns that reliably get better answers out of any modern model.

01 A model that predicts the next token

Give a language model the text "The capital of Estonia is" and it does not "look up" Tallinn. Instead it produces a probability distribution over its entire vocabulary for what comes next, and Tallinn simply happens to score highest. Generate one token, append it, and repeat — that loop is the whole engine of text generation.

Figure 1 — Next-token probabilities

Generation is a loop. Each step produces probabilities over the vocabulary; sampling settings like temperature decide how adventurous the pick is.
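The loop described above can be sketched in a few lines of Python. This is a toy illustration, not how any particular model is implemented: the scores in `TOY_MODEL` are made-up numbers, and `toy_next_logits`, `sample_next_token` and `generate` are hypothetical helper names introduced only for this example. A real model computes its scores with a neural network instead of a lookup table.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [score / temperature for score in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {token: e / total for token, e in zip(logits, exps)}

def sample_next_token(logits, temperature=1.0, rng=random):
    """Pick one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical lookup table standing in for a real model (made-up scores).
TOY_MODEL = {
    "The capital of Estonia is": {" Tallinn": 6.0, " Riga": 2.5, " a": 1.0},
    "The capital of Estonia is Tallinn": {".": 5.0},
}

def toy_next_logits(text):
    """Return scores for the next token, or an empty dict if the context is unknown."""
    return TOY_MODEL.get(text, {})

def generate(prompt, next_logits, max_tokens=5, temperature=1.0, rng=random):
    """Generate one token, append it, and repeat."""
    text = prompt
    for _ in range(max_tokens):
        logits = next_logits(text)
        if not logits:
            break
        text += sample_next_token(logits, temperature, rng)
    return text

toy_logits = TOY_MODEL["The capital of Estonia is"]
probs = softmax(toy_logits)
print(max(probs, key=probs.get))  # the highest-scoring token
print(generate("The capital of Estonia is", toy_next_logits, temperature=0.05))
```

With a very low temperature the pick is almost always the top-scoring token; raising the temperature flattens the distribution, so lower-scoring tokens such as " Riga" are chosen more often.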

02 Tokens, not words

Models do not read words or letters. They read tokens: sub-word fragments produced by a tokenizer. Common words are usually a single token; rarer words split into pieces. This is why models sometimes miscount letters or stumble on unusual names: the model never sees the individual letters, only the chunks.
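To make the splitting concrete, here is a minimal sketch of a greedy longest-match tokenizer. It is a simplification for illustration only: production tokenizers learn their vocabularies from data and use different algorithms, and `TOY_VOCAB` and `tokenize` are hypothetical names with a hand-picked vocabulary.

```python
def tokenize(text, vocab):
    """Greedy longest-match split of `text` into pieces found in `vocab`."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a known piece matches.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            # No known piece starts here: emit the single character on its own.
            tokens.append(text[i])
            i += 1
    return tokens

# Hand-picked toy vocabulary (an assumption, not a real model's vocabulary).
TOY_VOCAB = {"the", "straw", "berry", " "}

print(tokenize("the strawberry", TOY_VOCAB))  # ['the', ' ', 'straw', 'berry']
```

In this toy split, "strawberry" arrives as two chunks, "straw" and "berry", so a question about individual letters has to be answered without ever seeing those letters one by one.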

Figure 2 — How text becomes tokens

Each token maps to a high-dimensional embedding: a vector that encodes meaning, so tokens with similar meanings sit near each other in the vector space.
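One common way to measure how near two embeddings sit is cosine similarity. The sketch below uses made-up three-dimensional vectors in a hypothetical `TOY_EMBEDDINGS` table; real embeddings are learned during training and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors for illustration only.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(TOY_EMBEDDINGS["cat"], TOY_EMBEDDINGS["dog"])
cat_car = cosine_similarity(TOY_EMBEDDINGS["cat"], TOY_EMBEDDINGS["car"])
print(cat_dog > cat_car)  # True: "cat" sits closer to "dog" than to "car"
```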

03 Attention: weighing the context

The breakthrough behind modern models is the transformer, and its core trick is self-attention. For every token, attention asks: which other tokens in the context matter for understanding this one? It then blends information from those tokens, weighted by relevance. This is how a model resolves "it" to the right noun, or keeps track of who did what across a long paragraph.

Figure 3 — Attention links a word to its context

Attention runs in parallel across all tokens and is repeated in dozens of layers, building rich, contextual meaning.
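The weighing-and-blending step can be sketched as scaled dot-product attention for a single token. This is a simplified illustration under stated assumptions: the two-dimensional vectors are invented, and the same vectors are used as queries, keys and values, whereas a real transformer derives each of those through separate learned projections. The names `attend` and `TOY_VECTORS` are hypothetical.

```python
import math

def softmax(scores):
    """Normalise a list of scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Score each key against the query, then blend the values by those weights."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    blended = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, blended

# Made-up vectors: "it" is deliberately placed close to "cat".
TOY_VECTORS = {
    "cat": [2.0, 0.0],
    "sat": [0.0, 2.0],
    "it": [1.8, 0.2],
}

tokens = list(TOY_VECTORS)
vectors = [TOY_VECTORS[t] for t in tokens]
weights, blended = attend(TOY_VECTORS["it"], vectors, vectors)
for token, weight in zip(tokens, weights):
    print(token, round(weight, 2))
```

In this toy setup "it" puts its largest weight on "cat" and very little on "sat", which is the mechanism the text describes for resolving a pronoun to the right noun.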

"A prompt is not a magic spell — it is the context the model attends to. Better context, better output."

04 How a model is trained

Modern assistants are typically built in three stages. Pre-training exposes the model to vast amounts of text so it learns language, facts and reasoning patterns by predicting the next token. Fine-tuning on curated instruction–response pairs teaches it to be helpful and to follow instructions. Finally, RLHF (reinforcement learning from human feedback) aligns its tone and safety behaviour with human preferences.

Figure 4 — The training pipeline

You interact with the model after all three stages — which is why it behaves like a helpful assistant, not an autocomplete box.
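Of the three stages, pre-training is the easiest to sketch. The article does not name a loss function; the example below assumes the standard cross-entropy objective, in which the model is penalised by the negative log of the probability it gave to the token that actually came next. The probabilities are made up and `next_token_loss` is a hypothetical helper name.

```python
import math

def next_token_loss(predicted_probs, actual_token):
    """Cross-entropy for one prediction: -log of the probability given to the true token."""
    return -math.log(predicted_probs[actual_token])

# Made-up predictions for the text "The capital of Estonia is".
confident = {" Tallinn": 0.9, " Riga": 0.1}
unsure = {" Tallinn": 0.2, " Riga": 0.8}

loss_confident = next_token_loss(confident, " Tallinn")
loss_unsure = next_token_loss(unsure, " Tallinn")
print(loss_confident < loss_unsure)  # True: a better guess means a lower loss
```

Training nudges the model's parameters to lower this loss across a very large number of examples, which is how next-token prediction ends up encoding language and facts.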

05 Prompt engineering, demystified

Because output quality depends on the context the model attends to, how you ask is half the work. A strong prompt usually has four parts: a role, a clear task, the context or data, and the output format you want.

Anatomy of a strong prompt

Role — "You are an experienced EU project evaluator."
Task — "Review this project summary for clarity and impact."
Context — paste the summary, audience, and constraints.
Format — "Reply as 3 bullet strengths and 3 fixes."
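The four parts above can be assembled programmatically. The following is a minimal sketch, assuming a plain-text prompt with labelled sections; `build_prompt` and the section labels are illustrative choices, not a required format, and the context string is placeholder text.

```python
def build_prompt(role, task, context, output_format):
    """Join the four parts of a prompt into one labelled block of text."""
    parts = [
        f"Role: {role}",
        f"Task: {task}",
        f"Context:\n{context}",
        f"Format: {output_format}",
    ]
    return "\n\n".join(parts)

prompt = build_prompt(
    role="You are an experienced EU project evaluator.",
    task="Review this project summary for clarity and impact.",
    context="(paste the summary, audience, and constraints here)",
    output_format="Reply as 3 bullet strengths and 3 fixes.",
)
print(prompt)
```

Keeping the parts as separate arguments makes it easy to change one of them, for example the output format, without rewriting the rest of the prompt.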

06 Patterns that reliably work

A handful of techniques consistently lift results. Few-shot prompting gives the model two or three worked examples before the real task. Chain-of-thought prompting ("think step by step") often improves reasoning on multi-step problems. Explicit constraints ("under 100 words, no jargon") prevent rambling. And iteration, meaning refining the prompt after seeing the first answer, beats trying to write the perfect prompt in one shot.

2–3x: Few-shot examples sharply improve format adherence

Step→: Chain-of-thought boosts multi-step reasoning

100w: Explicit limits keep answers focused and usable
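Few-shot prompting and explicit constraints combine naturally in one prompt. The sketch below is one possible layout, not a standard: `build_few_shot_prompt`, the "Input:" and "Output:" labels, and the example sentences are all assumptions made for illustration.

```python
def build_few_shot_prompt(instruction, examples, query, constraints=()):
    """Lay out an instruction, optional constraints, worked examples, then the real task."""
    blocks = [instruction]
    if constraints:
        blocks.append("Constraints: " + "; ".join(constraints))
    for source, target in examples:
        blocks.append(f"Input: {source}\nOutput: {target}")
    # End on an open "Output:" so the model continues with the answer.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

few_shot = build_few_shot_prompt(
    instruction="Rewrite each sentence in plain language.",
    examples=[
        ("Utilise the aforementioned form.", "Use the form above."),
        ("Commence the procedure forthwith.", "Start the process now."),
    ],
    query="Endeavour to finalise the submission.",
    constraints=("under 100 words", "no jargon"),
)
print(few_shot)
```

Because the prompt is built by a function, iterating on it is cheap: swap an example or tighten a constraint, run it again, and compare the answers.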

Key takeaways

What to remember

An LLM predicts the next token — generation is that loop, repeated.
Models read sub-word tokens, not words or letters.
Attention lets each token weigh the relevant context.
Assistants are pre-trained, fine-tuned, then aligned with RLHF.
Good prompts state role, task, context and output format.
Few-shot, chain-of-thought, constraints and iteration reliably help.

RRINOVA Research Team

We translate advanced technology and EU policy into practical training. This explainer is part of our open Insights series for educators, youth workers and SMEs.
