AI & Machine Learning

LLMs & the Art of Prompt Engineering

A large language model can write code and explain physics, yet underneath it only ever guesses the next chunk of text. Grasp that one mechanism and prompt writing turns from guesswork into a skill.

11 min read · 5 May 2026 · RRINOVA Research Team
Under the hood, an LLM is a next-token predictor trained on vast text. Photo: CC0.

This guide walks from the raw mechanism — next-token prediction — up through tokens, attention and training, and lands on the practical craft of prompt engineering: the patterns that reliably get better answers out of any modern model.

01 A model that predicts the next token

Give a language model the text "The capital of Estonia is" and it does not "look up" Tallinn. Instead it produces a probability distribution over its entire vocabulary for what comes next, and Tallinn simply happens to score highest. Generate one token, append it, and repeat — that loop is the whole engine of text generation.

Figure 1 — Next-token probabilities
Prompt: "The capital of Estonia is ___"
  • Tallinn: 0.92
  • Tartu: 0.04
  • a: 0.02
  • located: 0.01
The model samples from this distribution, appends the token, and repeats.

Generation is a loop. Each step produces probabilities over the vocabulary; sampling settings like temperature decide how adventurous the pick is.
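
One step of that loop can be sketched in a few lines of Python. This is a minimal illustration, not how any particular model is implemented: the distribution is copied from Figure 1, and the function names `apply_temperature` and `sample_next_token` are made up for this example.

```python
import random

# Toy next-token distribution copied from Figure 1 (not real model output).
NEXT_TOKEN_PROBS = {"Tallinn": 0.92, "Tartu": 0.04, "a": 0.02, "located": 0.01}


def apply_temperature(probs, temperature):
    """Rescale a distribution as p ** (1 / T), then renormalise so it sums to 1.

    T < 1 sharpens the distribution, T > 1 flattens it, T == 1 only renormalises.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: s / total for tok, s in scaled.items()}


def sample_next_token(probs, temperature=1.0, rng=random):
    """Draw one token at random, weighted by the temperature-adjusted probabilities."""
    adjusted = apply_temperature(probs, temperature)
    tokens = list(adjusted)
    weights = [adjusted[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        share = apply_temperature(NEXT_TOKEN_PROBS, t)["Tallinn"]
        print(f"temperature={t}: P(Tallinn)={share:.3f}")
    print("sampled:", sample_next_token(NEXT_TOKEN_PROBS, temperature=1.0))
```

A low temperature pushes almost all of the probability onto "Tallinn"; a high one gives "Tartu" and the other candidates a better chance of being picked.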

02 Tokens, not words

Models do not read words or letters. They read tokens: sub-word fragments produced by a tokenizer. Common words are usually a single token; rarer words split into several pieces. This is why models sometimes miscount letters or stumble on unusual names: they never see the individual letters, only the chunks.

Figure 2 — How text becomes tokens
"Unbelievably sustainable" → Un | believ | ably | sus | tain | able
2 words → 6 tokens. The model predicts one of these units at a time.
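
To make the split in Figure 2 concrete, here is a toy tokenizer. It is a simplification under stated assumptions: the six-piece vocabulary is hypothetical and hand-picked to reproduce the figure, and greedy longest-match stands in for the learned schemes (such as byte-pair encoding) that production tokenizers use, which may split the same text differently.

```python
# Hypothetical toy vocabulary chosen to reproduce Figure 2. Real tokenizers
# learn tens of thousands of pieces from data.
TOY_VOCAB = {"un", "believ", "ably", "sus", "tain", "able"}


def tokenize_word(word, vocab):
    """Greedy longest-match split of one word.

    Characters that no vocabulary piece covers become single-character tokens.
    """
    lowered = word.lower()
    pieces = []
    i = 0
    while i < len(lowered):
        # Try the longest candidate first, shrinking until something matches.
        for j in range(len(lowered), i, -1):
            if lowered[i:j] in vocab:
                pieces.append(lowered[i:j])
                i = j
                break
        else:
            pieces.append(lowered[i])  # fallback for an unknown character
            i += 1
    return pieces


def tokenize(text, vocab=TOY_VOCAB):
    """Split on whitespace, then break each word into vocabulary pieces."""
    return [piece for word in text.split() for piece in tokenize_word(word, vocab)]


if __name__ == "__main__":
    tokens = tokenize("Unbelievably sustainable")
    print(tokens, len(tokens))
```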

Each token maps to a high-dimensional embedding — a vector that encodes meaning, so similar tokens sit near each other in space.
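
"Near each other" is usually measured with cosine similarity. The sketch below uses hand-made three-dimensional vectors purely for illustration; real embeddings are learned during training and have hundreds or thousands of dimensions, and the words and values here are assumptions, not data from any model.

```python
import math

# Hand-made vectors for illustration only (not taken from a real model).
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "tax": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    cat, dog, tax = (TOY_EMBEDDINGS[w] for w in ("cat", "dog", "tax"))
    print("cat vs dog:", round(cosine_similarity(cat, dog), 3))
    print("cat vs tax:", round(cosine_similarity(cat, tax), 3))
```

In this toy space "cat" and "dog" point in nearly the same direction, while "tax" points elsewhere, which is the geometric sense in which similar tokens sit close together.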

03 Attention: weighing the context

The breakthrough behind modern models is the transformer, and its core trick is self-attention. For every token, attention asks: which other tokens in the context matter for understanding this one? It then blends information from those tokens, weighted by relevance. This is how a model resolves "it" to the right noun, or keeps track of who did what across a long paragraph.

Figure 3 — Attention links a word to its context
"The robot picked up the ball because it was programmed"
"it" attends most strongly to "The robot" (thick line = high attention weight).

Attention runs in parallel across all tokens and is repeated in dozens of layers, building rich, contextual meaning.
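
The weighing-and-blending step can be sketched as single-head scaled dot-product attention. This is a deliberately stripped-down version: real transformers first pass each token through learned query, key and value projections and run many heads in parallel, whereas here the input vectors are used directly for all three roles. The two-dimensional vectors for "robot", "ball" and "it" are invented for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs.

    Returns (weights, outputs); weights[i][j] is how much token i attends to token j.
    """
    dim = len(vectors[0])
    weights, outputs = [], []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is a relevance-weighted blend of all token vectors.
        outputs.append(
            [sum(w * vec[d] for w, vec in zip(row, vectors)) for d in range(dim)]
        )
    return weights, outputs


if __name__ == "__main__":
    tokens = ["robot", "ball", "it"]
    vectors = [[2.0, 0.0], [0.0, 2.0], [1.0, 0.2]]  # invented toy vectors
    weights, _ = self_attention(vectors)
    for tok, w in zip(tokens, weights[2]):
        print(f'"it" -> "{tok}": {w:.2f}')
```

With these toy vectors, the row for "it" puts its largest weight on "robot", mirroring the link drawn in Figure 3.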

"A prompt is not a magic spell — it is the context the model attends to. Better context, better output."

04 How a model is trained

Modern assistants are typically built in three stages. Pre-training exposes the model to vast amounts of text so it learns language, facts and reasoning patterns by predicting the next token. Fine-tuning on curated instruction–response pairs teaches it to follow tasks and be helpful. Finally, RLHF (reinforcement learning from human feedback) aligns its tone and safety behaviour with human preferences.

Figure 4 — The training pipeline
  • Pre-training: trillions of tokens
  • Fine-tuning: instruction pairs
  • RLHF: human preferences
Raw capability → task-following → aligned, safe assistant

You interact with the model after all three stages — which is why it behaves like a helpful assistant, not an autocomplete box.
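
"Learning by predicting the next token" has a simple numerical core: at each position the model is penalised by the cross-entropy loss, the negative log of the probability it gave to the token that actually came next. The sketch below shows that single calculation; the two distributions are invented for illustration, and a real training run averages this loss over enormous numbers of positions and adjusts the model's weights to reduce it.

```python
import math


def next_token_loss(predicted_probs, actual_token):
    """Cross-entropy at one position: -log(probability of the true next token)."""
    return -math.log(predicted_probs[actual_token])


if __name__ == "__main__":
    # Invented distributions for the prompt "The capital of Estonia is".
    early_in_training = {"Tallinn": 0.05, "Tartu": 0.05, "a": 0.50, "located": 0.40}
    late_in_training = {"Tallinn": 0.92, "Tartu": 0.04, "a": 0.02, "located": 0.02}
    print("early loss:", round(next_token_loss(early_in_training, "Tallinn"), 3))
    print("late loss: ", round(next_token_loss(late_in_training, "Tallinn"), 3))
```

The loss shrinks as the model puts more probability on the right token, which is the signal pre-training follows.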

05 Prompt engineering, demystified

Because output quality depends on the context the model attends to, how you ask is half the work. A strong prompt usually has four parts: a role, a clear task, the context or data, and the output format you want.

Anatomy of a strong prompt

  • Role — "You are an experienced EU project evaluator."
  • Task — "Review this project summary for clarity and impact."
  • Context — paste the summary, audience, and constraints.
  • Format — "Reply as 3 bullet strengths and 3 fixes."
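
The four parts above can be assembled with a small helper. This is one possible layout, not a required format: the function name `build_prompt` and the section labels are choices made for this example, and the summary text is a placeholder.

```python
def build_prompt(role, task, context, output_format):
    """Join role, task, context and format into one prompt, one section per part."""
    parts = [
        role.strip(),
        f"Task: {task.strip()}",
        f"Context:\n{context.strip()}",
        f"Output format: {output_format.strip()}",
    ]
    return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = build_prompt(
        role="You are an experienced EU project evaluator.",
        task="Review this project summary for clarity and impact.",
        context="(paste the summary, audience, and constraints here)",
        output_format="Reply as 3 bullet strengths and 3 fixes.",
    )
    print(prompt)
```

Keeping the parts as separate arguments makes it easy to change one of them, for example the output format, while leaving the rest of the prompt fixed.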

06 Patterns that reliably work

A handful of techniques consistently lift results. Few-shot prompting gives the model two or three worked examples before the real task. Chain-of-thought ("think step by step") improves reasoning on multi-step problems. Explicit constraints ("under 100 words, no jargon") prevent rambling. And iteration — refining the prompt after seeing the first answer — beats trying to write the perfect prompt in one shot.

  • 2–3x: Few-shot examples sharply improve format adherence
  • Step: Chain-of-thought boosts multi-step reasoning
  • 100w: Explicit limits keep answers focused and usable
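
Few-shot prompting and explicit constraints combine naturally in one template. The helper below is a sketch with an assumed "Input:/Output:" layout; the function name, labels and example reviews are all invented for illustration, and other layouts work just as well.

```python
def build_few_shot_prompt(instruction, examples, new_input, constraints=()):
    """Build a prompt with worked examples followed by the real task.

    examples is a list of (input, output) pairs; the prompt ends with an open
    "Output:" so the model continues in the demonstrated pattern.
    """
    sections = [instruction.strip()]
    if constraints:
        sections.append("Constraints: " + "; ".join(constraints))
    for example_input, example_output in examples:
        sections.append(f"Input: {example_input}\nOutput: {example_output}")
    sections.append(f"Input: {new_input}\nOutput:")
    return "\n\n".join(sections)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        instruction="Classify the sentiment of each review as positive or negative.",
        examples=[
            ("The workshop was clear and practical.", "positive"),
            ("The slides were confusing and too long.", "negative"),
        ],
        new_input="I learned a lot and would attend again.",
        constraints=("answer with one word", "no explanation"),
    )
    print(prompt)
```

Iteration then amounts to editing the examples or constraints after reading the first answer and running the prompt again.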

Key takeaways

What to remember

  • An LLM predicts the next token — generation is that loop, repeated.
  • Models read sub-word tokens, not words or letters.
  • Attention lets each token weigh the relevant context.
  • Assistants are pre-trained, fine-tuned, then aligned with RLHF.
  • Good prompts state role, task, context and output format.
  • Few-shot, chain-of-thought, constraints and iteration reliably help.
RRINOVA Research Team

We translate advanced technology and EU policy into practical training. This explainer is part of our open Insights series for educators, youth workers and SMEs.