The foundation under every model

The maths behind machine learning.

The linear algebra, calculus and probability that make AI work, taught deeply and tied to code - the part most courses skip.

Book a free demo See what we cover

10,000+students taught

15+countries

70+live courses

4.9/5from 247+ reviews

Gradient descent over a loss surface

Each arrow is a gradient: the direction of steepest increase. Training takes small steps the opposite way - w := w − η∇L - until the loss reaches its minimum.

What you are looking at

One picture that holds the whole idea.

A model that learns is a model that walks downhill on a surface of error. The height of that surface is the loss - how wrong the model is right now. The arrows are gradients, the slope of the loss with respect to every weight, and they always point uphill. So learning is simple to state: take a small step in the opposite direction, again and again, until you reach the bottom.

That single move - w := w − η∇L - is where linear algebra, calculus, probability and optimisation all meet. The weights are a vector. The gradient comes from calculus. The loss is built from probability. The step size η is the optimisation knob you tune. Understand this picture and the rest of machine learning stops being magic.

The four pillars

Four kinds of maths, one working model.

Every machine learning system stands on the same four pillars. Most tutorials let you skip them by importing a library. We teach each one deeply and tie it straight to code, so you know not just what to type but why it works.

PILLAR 01

Linear algebra

Data is not a spreadsheet to a model - it is a stack of vectors. A row of features is a vector, a batch of rows is a matrix, and a layer of a network is a matrix of weights. The dot product between an input vector and a weight vector is, literally, how a neuron scores its input. Matrix multiplication is how an entire layer transforms a whole batch at once.

y = X @ W + b # one matrix multiply = a full layer

Tied to: every dense layer, every embedding, every attention head. Learn this and tensor shapes stop being a source of bugs.

PILLAR 02

Calculus

A derivative measures how a small change in one number changes another. Training a model means asking, for every weight, "if I nudge you, does the loss go up or down, and by how much?" The chain rule lets you answer that through many layers at once - and that is all backpropagation is: the chain rule applied carefully, from the output back to the first weight.

dL/dw = chain rule across every layer # = backprop

Tied to: gradient descent and backpropagation. Gradient descent is simply the derivative made practical.

PILLAR 03

Probability & statistics

Classification is a question about belief: how likely is this image a cat? Models output probabilities, and we judge them with likelihood. Cross-entropy - the standard loss for classification - comes directly from asking how surprised the true label is by the model's predicted distribution. Distributions, expectation and variance are how you reason about uncertainty instead of pretending it away.

loss = -sum(y * log(p)) # cross-entropy from likelihood

Tied to: every classifier, softmax output, and the loss function that trains it.

PILLAR 04

Optimisation

Once you can measure error and compute gradients, you need a strategy to reduce that error. That is optimisation: the loss surface, the descent rule, and the variants that make it work in practice - stochastic and mini-batch gradient descent, momentum, and adaptive methods like Adam. The learning rate decides how big each step is; too large and you overshoot, too small and you crawl.

w -= lr * grad # the heart of every optimiser

Tied to: the training loop itself. This is where a model either converges or quietly fails.

The depth argument

Why most courses skip it - and why that costs you later.

It is easy to see why the maths gets skipped. It is harder to teach, harder to sell, and a polished tutorial that imports a model and gets 95% accuracy in ten minutes feels like progress. So most courses hand you the library calls and move on. You can follow along, you can finish, and you can build something that runs. The problem only shows up later - when something breaks, and nothing in the tutorial prepared you to fix it.

That moment always comes. A loss curve flattens and will not move. A model scores beautifully in training and falls apart on real data. A class imbalance quietly wrecks your accuracy. An AI assistant hands you confident code that trains the wrong objective. The people who understand the maths know where to look: the gradient is vanishing, the learning rate is wrong, the loss does not match the goal, the data is leaking. The people who only know the library calls stall, guess, and copy answers that may or may not apply.

This is the difference between someone who can run a model and someone who can be trusted with one. Depth is what lets you debug a training run, tune it deliberately instead of by superstition, read a research paper and use its idea, and judge whether an output should be trusted at all. That is also why we built a wider case for going deep at real coding classes - because shortcuts feel faster right up to the point where they cost you everything you were trying to save.

How we teach it

How we teach maths so it actually clicks.

We have one rule: derive, never memorise. Maths sticks when you can rebuild it, and it rebuilds best when intuition, notation and code arrive in that order.

STEP 01

Intuition first

Before a single symbol, you see the idea. A gradient is which way is downhill. A dot product is how much two vectors agree. A probability is a degree of belief. When the picture is clear, the notation has something to attach to.

STEP 02

Then the notation

Now we write it properly, because notation is how you read papers and talk to other engineers. We unpack the symbols slowly - what ∇L means, why subscripts index a vector, how a sum becomes a matrix product - so the maths reads like language, not code you cannot run.

STEP 03

Then the code

Every idea ends in code you run yourself. You write the update step, watch the loss fall, break it with a bad learning rate, and fix it. Each piece of maths is tied to a small model you build, so theory and implementation prove each other.

Because classes are live, online and in small batches, none of this happens at you. You derive the chain rule with a mentor watching your working, you write the backward pass while they catch the off-by-one in your indexing, and you get your reasoning reviewed in the same session you learned it. When you finish a topic, you have not just seen it - you have rebuilt it, in maths and in code, with someone making sure both are right.

Where you learn it

Courses that build this maths into real models.

These live, small-batch courses carry the maths on this page into working systems. Each one ties linear algebra, calculus, probability and optimisation to code you write and understand.

AI & ML Complete

The full path through machine learning - you implement gradient descent, loss functions and core models, then scale them with the standard libraries.

View course →

Artificial Intelligence Complete

A broad, deep AI masterclass where probability, search and optimisation become the reasoning behind real intelligent systems, not just buzzwords.

View course →

Data Science Masterclass

Statistics, distributions and linear algebra applied to real datasets, so your conclusions rest on maths you can defend rather than charts you hope are right.

View course →

Python Masterclass from zero to advanced course thumbnail

Python Masterclass - Zero to Advanced

The language every model is written in, taken to the level where NumPy arrays and vector thinking feel natural - the groundwork the maths runs on.

View course →

Data Structures and Algorithms course thumbnail

Data Structures & Algorithms

The complexity and discrete-maths thinking that keeps your models efficient - how cost grows, why an algorithm scales, and where it does not.

View course →

Browse all courses

Go deeper

Related deep-dives.

Once the maths makes sense, these companion pages show where it leads - the models, the fields, and the full path through them.

How large language models work

From tokens to attention to the maths of prediction - what is really happening inside a language model.

Read the explainer →

How computer vision works

How convolutions, filters and matrices let a machine turn pixels into meaning.

Read the explainer →

Machine learning from scratch

Implement gradient descent and core models by hand before you ever import a library.

See the course path →

Master AI, ML, Python & Java

The wider, deeper tracks that take you from language fundamentals to production AI.

Explore the tracks →

Questions

Frequently asked.

How much maths do I really need for machine learning?

Less than people fear, and more than most courses teach. You need four areas: linear algebra to handle data and weights as vectors and matrices, calculus to understand how a model learns through gradients, probability and statistics to reason about uncertainty and loss, and optimisation to see how training actually works. You do not need a graduate degree. You need school algebra, curiosity, and a teacher who derives each idea instead of asking you to memorise it. That is exactly the level we teach to.

I am weak at maths - can I still do this?

Yes. Most learners who think they are weak at maths were simply taught it as disconnected rules to memorise. We teach it the other way: intuition first, then the notation, then code you run yourself. When you see a derivative move a model toward a better answer, or a dot product score a prediction, the maths stops being abstract. We start from school-level algebra and build up in small live steps, with a mentor checking your understanding as you go.

Is this just theory or is it applied?

Every idea is tied to running code. You will not just read that gradient descent minimises a loss function - you will write the update step in Python, watch the loss fall, and break it on purpose to see what a bad learning rate does. We pair each piece of maths with a small model you build, so the theory and the implementation reinforce each other. You leave able to read a paper and able to debug a training run.

Who is this for and what ages?

This depth-first material suits serious teenagers, college students and working professionals who want to genuinely understand machine learning rather than only call libraries. Our wider school teaches ages 6 to 65, but the maths behind machine learning is best taken once you are comfortable with school algebra and some Python. If you are younger or newer, a free demo will help us place you on the right track first.

Are classes live?

Yes. Every class is live, online and instructor-led in a small batch. You derive the maths with the mentor, write the code during the session, and get your reasoning and your implementation reviewed in real time. It is not a recorded video you watch alone, and it is not a slide deck read out to you - it is worked maths and worked code, together.

Do I really need the maths if AI tools can build models?

Yes, more than ever. AI tools can generate model code in seconds, but they cannot tell you why a loss curve has stalled, why a model is overconfident and wrong, or whether the metric you optimised is the one you actually wanted. The maths is exactly what lets you understand, debug, tune and trust both your models and the output an AI assistant hands you. As tools change every few months, the maths is the durable skill that stays valuable - it is what separates someone who directs these tools from someone who is replaced by them.

See the maths come alive in a live class.

Sit in on a free demo, watch a real idea get derived and then coded, and decide for yourself whether depth is worth it.

Book a free demo Browse all courses