Hi there! I'm Shrijith Venkatrama, founder of Hexmos. Right now, I’m building LiveAPI, a tool that makes generating API docs from your code ridiculously easy.
Neural networks might seem complex, but at their core, they rely on a simple yet powerful concept: derivatives. Andrej Karpathy's micrograd proves this beautifully: it's just two Python files with less than 150 lines of code, yet it captures the fundamental ideas behind neural networks.
This blog breaks down micrograd step by step, starting with the very foundation: what derivatives really mean and how we compute them. You'll learn:
- How backpropagation works by understanding derivatives in the simplest way
- The difference between symbolic and computational differentiation
- How small input changes affect output (positive, negative, and zero slopes)
- Why neural networks don’t need explicit derivative formulas
With visual explanations, simple code snippets, and practical insights, by the end of this post, you’ll have a solid grasp of how gradients drive learning in neural networks—without drowning in unnecessary complexity. Let’s dive in.
Karpathy's micrograd is just 2 files of Python (< 150 LOC)
micrograd consists of just two small files (a quick usage sketch follows the list):
- `engine.py`: Less than 100 lines of code; defines the `Value` class, the code that powers the neural network
- `nn.py`: Defines `Neuron`, `Layer` and `MLP` (Multi-Layer Perceptron). In total, around 60 lines of code.
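To make that concrete, here's a minimal sketch of what using the `Value` class from `engine.py` looks like; the specific expression is just an illustration:

```python
from micrograd.engine import Value

# Build a tiny expression graph out of Value objects
a = Value(2.0)
b = Value(-3.0)
c = a * b + b**2   # c = (2 * -3) + (-3)^2 = 3

# Backpropagation fills in the gradient of c with respect to each input
c.backward()
print(c.data)  # 3.0
print(a.grad)  # dc/da = b = -3.0
print(b.grad)  # dc/db = a + 2*b = -4.0
```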
Fundamentally, the core ideas behind neural networks can be captured in just under 150 lines of simple Python code. Most of the additional complexity in other libraries is about efficiency.
Groundwork for Understanding the Definition of Derivatives
The first goal is to understand the concept of derivatives with some examples. So we do the following to prepare some groundwork (a sketch follows this list):
- Define a function `f` that takes in a scalar input and gives a scalar output
- Generate a range of values for `x` and `y` (input/output)
- Plot the values
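Here's a rough sketch of that groundwork, using the same quadratic that shows up in the next section (plotting with `numpy` and `matplotlib` is one choice; any plotting tool works):

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    # Scalar in, scalar out (also works elementwise on numpy arrays)
    return 3*x**2 - 4*x + 5

# Generate a range of input values and the corresponding outputs
xs = np.arange(-5, 5, 0.25)
ys = f(xs)

# Plot the values
plt.plot(xs, ys)
plt.xlabel("x")
plt.ylabel("f(x)")
plt.show()
```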
Two Ways of Calculating Derivative
The task is to find the derivative of the function at particular points, such as x = 3.
In school, we are usually taught the symbolic method. For the expression `3*x**2 - 4*x + 5`, we can work out the derivative expression to be `6*x - 4`.
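If you want to double-check that symbolic result computationally, a library like `sympy` (not part of micrograd) can do it:

```python
import sympy as sp

x = sp.symbols("x")
expr = 3*x**2 - 4*x + 5
print(sp.diff(expr, x))  # 6*x - 4
```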
But since we're dealing with neural networks, the expressions involved can be huge, and nobody writes them down explicitly.
So instead of taking a symbolic approach, we take a computational (numerical) approach.
However, it is useful to first understand what derivatives mean at a conceptual level, before we move on to the computations.
The Meaning of a Differentiable Function
The key formula is the limit definition of the derivative:
f'(x) = lim(h → 0) [f(x + h) - f(x)] / h
Here `h` is a very small value, and it keeps getting smaller, vanishing towards 0.
The question is: what is the trend of a function's output when there's a small bump (increase) in its input?
At a higher level, we are asking: at a point `x`, if we increase it by a tiny amount `h` to get `x + h`, does the output increase or decrease? And what is the magnitude of that change in the output?
The resulting value of the formula is a slope. If a bump in the input produces a positive slope, the output increases.
If a bump in the input produces a negative slope, the output decreases.
Also, at the specific point x = 2/3 on the plot of `f`, a slight increase in the input keeps the output (essentially) the same - that is, we have a zero slope. For this function, the symbolic derivative `6*x - 4` is exactly 0 at x = 2/3.
Numerical Exploration
The above intuition can be validated with some numerical exploration, using a specific `x` value and a tiny `h` value.
Positive Slope Example
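A sketch of such a check at `x = 3`, where the slope of our quadratic should be positive (`6*3 - 4 = 14`):

```python
def f(x):
    return 3*x**2 - 4*x + 5

h = 0.000001
x = 3.0
slope = (f(x + h) - f(x)) / h
print(slope)  # ~14.0: a tiny bump in x increases the output
```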
Negative Slope Example
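The same check at `x = -3`, where the slope should be negative (`6*(-3) - 4 = -22`):

```python
def f(x):
    return 3*x**2 - 4*x + 5

h = 0.000001
x = -3.0
slope = (f(x + h) - f(x)) / h
print(slope)  # ~-22.0: a tiny bump in x decreases the output
```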
Zero Slope Example
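And at `x = 2/3`, where the symbolic derivative `6*x - 4` is exactly zero:

```python
def f(x):
    return 3*x**2 - 4*x + 5

h = 0.000001
x = 2/3
slope = (f(x + h) - f(x)) / h
print(slope)  # ~0.0: a tiny bump in x barely changes the output
```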
Reference
The spelled-out intro to neural networks and backpropagation: building micrograd (Andrej Karpathy)