AI & Machine Learning

How Neural Networks Actually Work

Strip away the hype and a neural network is a surprisingly simple idea repeated at scale: weighted sums, a pinch of non-linearity, and a feedback loop that nudges millions of numbers until the network stops being wrong. Here is what is really happening inside.

12 min read · 12 May 2026 · RRINOVA Research Team
Abstract visualisation of an artificial neural network
An artificial neural network mirrors the brain only loosely — its real engine is linear algebra and calculus. Photo: CC0.

Ask ten people what a neural network is and you will get ten metaphors about brains and neurons firing. The metaphor is charming, but it hides the machinery. A neural network is a function — a very large, adjustable function — that maps inputs to outputs, and "learning" is just the disciplined process of adjusting it until its outputs are useful.

In this explainer we build the idea from the bottom up: one neuron, then a layer, then a full network, and finally the learning loop that powers everything from spam filters to large language models. No prior maths beyond multiplication and a little intuition about slopes is assumed.

01 Inside a single neuron

The atom of every network is the artificial neuron. It does three things: it multiplies each input by a weight, adds the products together along with a bias, and passes the result through an activation function. That is the whole story of a neuron.

Formally, for inputs x₁…xₙ with weights w₁…wₙ and bias b, the neuron computes z = w₁x₁ + w₂x₂ + … + wₙxₙ + b, then outputs a = f(z). The weights decide how much each input matters; the bias shifts the threshold at which the neuron "fires".

Figure 1 — Anatomy of a neuron
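[Diagram: inputs x₁, x₂, x₃, scaled by weights w₁, w₂, w₃, feed a sum Σ + b; the result z passes through the activation f(z) to give the output a.]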

Weighted sum, then squash. Each input is scaled by its weight, summed with a bias, and pushed through an activation that decides the neuron's output.
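The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `neuron`, the choice of sigmoid as f, and the input values are all assumptions made for the example.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """Compute a = f(z) with z = w1*x1 + ... + wn*xn + b and f = sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Illustrative values only: three inputs, three weights, one bias.
# Here z = 0.5 - 0.5 + 0.3 - 0.3, which is 0 up to rounding, so a is about 0.5.
a = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], bias=-0.3)
print(a)
```

Changing a single weight changes how strongly that input pulls z up or down; changing the bias shifts z for every input at once.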

02 Why activation functions matter

If a neuron only ever computed a weighted sum, stacking thousands of them would still produce nothing more than a single straight-line relationship — a thousand linear steps collapse into one. The activation function introduces a deliberate kink, a non-linearity, and that kink is what lets networks model curves, edges, language and everything else that is not a straight line.
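A short derivation makes the collapse concrete. Take two layers with no activation between them; the weight matrices W_1, W_2 and bias vectors b_1, b_2 are notation assumed for this sketch.

```latex
\begin{aligned}
h &= W_1 x + b_1 \\
y &= W_2 h + b_2 \\
  &= W_2 (W_1 x + b_1) + b_2 \\
  &= (W_2 W_1)\, x + (W_2 b_1 + b_2)
\end{aligned}
```

The last line is a single linear layer with weight matrix W_2 W_1 and bias W_2 b_1 + b_2. The same argument applies to any number of stacked linear layers, so depth adds nothing until a non-linear f is inserted between them.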

Three functions dominate practice. ReLU (max(0, z)) is the modern default: cheap to compute, and its gradient stays at 1 for positive inputs rather than shrinking. Sigmoid squashes any number into the range 0–1, handy for probabilities. Tanh has the same S-shape but squashes into the range −1 to 1, so its output is centred on zero.

Figure 2 — Three common activations
[Plot panels: ReLU = max(0, z); Sigmoid → (0, 1); Tanh → (−1, 1)]

Each curve bends the straight weighted sum into something expressive. ReLU dominates deep networks because it is fast and resists vanishing gradients.
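The three activations are short enough to write out directly. The sketch below uses only Python's standard `math` module; the sample points are arbitrary, and the naive sigmoid shown here would overflow for extremely negative inputs, which a production library would guard against.

```python
import math

def relu(z: float) -> float:
    """max(0, z): passes positive values through, clips negatives to zero."""
    return max(0.0, z)

def sigmoid(z: float) -> float:
    """Maps any real z into (0, 1). Naive form; overflows for z below about -709."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z: float) -> float:
    """Maps any real z into (-1, 1), centred on zero."""
    return math.tanh(z)

# Evaluate each function at a negative, zero and positive input.
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  relu={relu(z):.3f}  sigmoid={sigmoid(z):.3f}  tanh={tanh(z):.3f}")
```

At z = 0 the three functions return 0, 0.5 and 0 respectively, which shows the difference in centring between sigmoid and tanh.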

"Without non-linearity, a deep network is just an expensive way to draw a straight line."

03 The forward pass

Neurons are organised into layers. Data enters at the input layer, flows through one or more hidden layers, and leaves at the output layer. In a fully connected network, each neuron in a layer is connected to every neuron in the next, and each connection carries its own weight. Pushing an input all the way through to a prediction is called the forward pass.

Concretely, the whole layer's computation is one matrix multiplication: take the vector of inputs, multiply by the weight matrix, add the bias vector, apply the activation. Repeat for each layer. This is why GPUs — which are built for fast matrix maths — turned out to be the perfect hardware for deep learning.

Figure 3 — Signal flowing through layers
[Diagram: forward pass, left to right: Input → × W₁ → Hidden 1 → × W₂ → Hidden 2 → × W₃ → Output ŷ]

Information moves left to right. Every layer is one matrix multiply plus an activation — small operations, repeated billions of times.
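As a rough sketch of the forward pass, the code below writes the matrix multiply out with plain Python lists so that nothing is hidden. The 2 → 3 → 1 layout, the hand-picked weights, and the helper names `dense` and `forward` are assumptions for illustration; a real framework would do the same arithmetic with optimised matrix routines.

```python
def relu(z: float) -> float:
    return max(0.0, z)

def identity(z: float) -> float:
    return z

def dense(x, W, b, activation):
    """One layer: activation(W x + b).
    W has one row per output neuron; each row holds that neuron's weights."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + bias)
            for row, bias in zip(W, b)]

def forward(x, layers):
    """Push x through every layer in order and return the final output."""
    for W, b, activation in layers:
        x = dense(x, W, b, activation)
    return x

# Hypothetical network: 2 inputs -> 3 hidden neurons (ReLU) -> 1 output (no activation).
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, -1.0, 0.5], relu),
    ([[1.0, 2.0, -1.0]], [0.25], identity),
]

y_hat = forward([2.0, 1.0], layers)
print(y_hat)  # → [1.75]
```

Working it by hand: the hidden layer produces [1.0, 0.5, 0.5], and the output neuron computes 1.0 + 2 × 0.5 − 0.5 + 0.25 = 1.75.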

04 Measuring how wrong we are

A fresh network is initialised with random weights, so its first predictions are nonsense. To improve, it needs a number that says how nonsense. That number is the loss (or cost). For regression we often use mean squared error; for classification, cross-entropy. The loss compares the prediction ŷ against the true answer y and returns a single value — high when wrong, near zero when right.

Learning, then, has a precise goal: find the weights that make the loss as small as possible. Everything else is bookkeeping.
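The two losses named above can be sketched as follows. For brevity the cross-entropy shown is the two-class (binary) form, which is a simplification chosen for this example; the function names and sample values are likewise illustrative.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of squared differences; a common regression loss."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Two-class cross-entropy; y_pred holds probabilities in (0, 1)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log(0) never occurs
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1.0, 0.0, 1.0]
confident_and_right = [0.9, 0.1, 0.8]
confident_and_wrong = [0.1, 0.9, 0.2]

print(mean_squared_error(labels, confident_and_right))
print(mean_squared_error(labels, confident_and_wrong))
print(binary_cross_entropy(labels, confident_and_right))
print(binary_cross_entropy(labels, confident_and_wrong))
```

In both cases the loss for the wrong predictions is larger than for the right ones, which is exactly the property the learning loop relies on.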

05 Backpropagation & gradient descent

Here is the genuinely clever part. The loss is a function of every weight in the network. Calculus gives us the gradient: for each weight, the slope of the loss with respect to that weight. The gradient points in the direction of steepest increase, so moving each weight the opposite way reduces the loss fastest. Backpropagation is an efficient algorithm for computing that gradient for all weights at once, working backwards from the output using the chain rule.
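For the single neuron from section 01, with z = w₁x₁ + … + wₙxₙ + b and a = f(z), the chain rule can be written out in full. The loss L is assumed here to depend on the weights only through the output a.

```latex
\begin{aligned}
\frac{\partial L}{\partial w_i}
  &= \frac{\partial L}{\partial a}\cdot\frac{\partial a}{\partial z}\cdot\frac{\partial z}{\partial w_i}
   = \frac{\partial L}{\partial a}\, f'(z)\, x_i \\[4pt]
\frac{\partial L}{\partial b}
  &= \frac{\partial L}{\partial a}\cdot\frac{\partial a}{\partial z}\cdot\frac{\partial z}{\partial b}
   = \frac{\partial L}{\partial a}\, f'(z)
\end{aligned}
```

The product ∂L/∂a · f′(z) is shared by every weight of the neuron, so it is computed once and reused. In a multi-layer network the same reuse happens layer by layer, moving from the output back towards the input, which is what makes backpropagation efficient.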

Once we know the gradient, gradient descent takes a small step downhill: w ← w − η · ∂L/∂w, where η is the learning rate. Repeat this loop — forward pass, compute loss, backpropagate, update — millions of times, and the random network gradually becomes a competent one.

Figure 4 — Gradient descent down the loss surface
[Plot: loss (how wrong) on the vertical axis against weight value on the horizontal axis, with the minimum marked.]

Each update is a step downhill. The learning rate sets the step size — too big and you overshoot, too small and training crawls.
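The effect of the learning rate is easy to see on a one-weight toy problem. The loss L(w) = (w − 3)², the starting point, and the three learning rates below are assumptions picked for illustration only.

```python
def gradient_descent(grad, w0: float, learning_rate: float, steps: int) -> float:
    """Repeatedly apply w <- w - learning_rate * dL/dw and return the final w."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Toy loss L(w) = (w - 3)^2 has its minimum at w = 3; its slope is dL/dw = 2 * (w - 3).
def grad(w: float) -> float:
    return 2.0 * (w - 3.0)

w_good = gradient_descent(grad, w0=0.0, learning_rate=0.1, steps=100)    # settles near 3
w_slow = gradient_descent(grad, w0=0.0, learning_rate=0.001, steps=100)  # still far from 3
w_big = gradient_descent(grad, w0=0.0, learning_rate=1.1, steps=100)     # overshoots and diverges

print(w_good, w_slow, w_big)
```

With this loss, each step multiplies the distance to the minimum by (1 − 2η). A rate of 0.1 shrinks it by a factor of 0.8 per step, 0.001 barely shrinks it at all, and 1.1 flips the sign and grows it by 1.2 each time.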

The training loop in four steps

  • Forward pass — run an input through the network to get a prediction.
  • Compute loss — measure the gap between prediction and truth.
  • Backpropagate — use the chain rule to find each weight's gradient.
  • Update — nudge every weight a little against its gradient. Repeat.
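The four steps can be put together in a small end-to-end sketch: a single sigmoid neuron learning the logical OR of two inputs. The dataset, learning rate, epoch count and seed are assumptions chosen so the example runs quickly; step 3 uses the fact that, for a sigmoid output with cross-entropy loss, the chain rule simplifies to dL/dz = a − y.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Toy dataset (an assumption for illustration): logical OR of two inputs.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

def train(epochs: int = 2000, learning_rate: float = 0.5, seed: int = 0):
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(2)]  # small random starting weights
    b = 0.0
    epoch_losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in DATA:
            # 1. Forward pass: weighted sum plus bias, then activation.
            a = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            # 2. Compute loss: binary cross-entropy, clamped to avoid log(0).
            p = min(max(a, 1e-12), 1.0 - 1e-12)
            total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
            # 3. Backpropagate: dL/dz = a - y, then dL/dw_i = dL/dz * x_i.
            dz = a - y
            grad_w = [dz * xi for xi in x]
            grad_b = dz
            # 4. Update: step each parameter against its gradient.
            w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
            b -= learning_rate * grad_b
        epoch_losses.append(total / len(DATA))
    return w, b, epoch_losses

w, b, losses = train()
predictions = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for x, _ in DATA]
print([round(p, 3) for p in predictions])
print(f"loss: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```

After training, the prediction for [0, 0] should sit below 0.5 and the other three above it, and the average loss in the last epoch should be well below the first. A real network repeats exactly this loop, only with many more weights and with the gradient passed back through several layers.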

06 Why "deep" works

Stacking many layers — going deep — lets a network build a hierarchy of features. In an image model, early layers learn edges and colours, middle layers assemble those into textures and shapes, and later layers recognise whole objects. Nobody programs this hierarchy; it emerges because each layer learns to transform the representation handed to it by the layer below.

The same principle scales to language. A large language model is, at heart, built from the same neurons arranged in many wide layers, interleaved with attention mechanisms that let each token draw on the others, and trained with the same forward-pass / backprop loop on enormous text corpora. The arithmetic is humble; the scale is what feels like magic.

1957: The perceptron, the first trainable neuron, is introduced
1986: Backpropagation popularised for multi-layer networks
2012: Deep nets on GPUs win ImageNet, igniting the modern era

Key takeaways

What to remember

  • A neuron is just a weighted sum plus a non-linear activation — nothing more.
  • Activation functions are what give networks the power to model non-linear reality.
  • A forward pass is a chain of matrix multiplications, which is why GPUs excel.
  • The loss turns "is it good?" into a single number to minimise.
  • Backpropagation + gradient descent is the learning engine behind all of it.
  • Depth lets simple features compose into complex understanding.
RRINOVA Research Team

We translate advanced technology and EU policy into practical training. This explainer is part of our open Insights series for educators, youth workers and SMEs.
