
Back-Propagation Spelled Out - As Explained by Karpathy

Hi there! I'm Shrijith Venkatrama, founder of Hexmos. Right now, I’m building LiveAPI, a tool that makes generating API docs from your code ridiculously easy.

Adding Labels To Improve Graph Readability

Add a label parameter to the Value class:

class Value:
  def __init__(self, data, _children=(), _op='', label=''):
    self.data = data
    self._prev = set(_children)
    self._op = _op
    self.label = label

  def __repr__(self):
    return f"Value(data={self.data})"

  def __add__(self, other):
    return Value(self.data + other.data, (self, other), '+')

  def __mul__(self, other):
    return Value(self.data * other.data, (self, other), '*')

a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
print(d._prev)
print(d._op)
print("---")
print(e._prev)
print(e._op)
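
Running this prints each node's children and the op that produced it. Note that _prev is a set, so the order inside the braces can vary:

{Value(data=-6.0), Value(data=10)}
+
---
{Value(data=2.0), Value(data=-3.0)}
*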

Update draw_dot to include the label in the graph

Originally we had the node expression as:

dot.node(name=uid, label="{ data %.4f }" % (n.data,), shape='record')

Replace with:

dot.node(name=uid, label="{ %s | data %.4f }" % (n.label, n.data), shape='record')
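
For reference, here is a minimal sketch of the full trace/draw_dot helpers, close to the version built in the video, assuming the graphviz Python package is installed:

from graphviz import Digraph

def trace(root):
  # walk backwards from the root, collecting every node and edge
  nodes, edges = set(), set()
  def build(v):
    if v not in nodes:
      nodes.add(v)
      for child in v._prev:
        edges.add((child, v))
        build(child)
  build(root)
  return nodes, edges

def draw_dot(root):
  dot = Digraph(format='svg', graph_attr={'rankdir': 'LR'}) # left-to-right layout
  nodes, edges = trace(root)
  for n in nodes:
    uid = str(id(n))
    # a rectangular 'record' node for every value
    dot.node(name=uid, label="{ %s | data %.4f }" % (n.label, n.data), shape='record')
    if n._op:
      # a small extra node for the operation that produced this value
      dot.node(name=uid + n._op, label=n._op)
      dot.edge(uid + n._op, uid)
  for n1, n2 in edges:
    # connect each child to the op node of its parent
    dot.edge(str(id(n1)), str(id(n2)) + n2._op)
  return dot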

Now draw_dot(d) returns:

Re-Render graph with Labels

Graph with Labels

Let's add two more nodes, f and L, to the expression:

a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'
L

Generate graph:

draw_dot(L)

More Complex Expression

The graph we've built above is the forward pass: we lay out the nodes and compute each value from its inputs.

What We Want to Calculate

We want to know how each node - the inputs a, b, c, f and the intermediates e, d - affects the output (the loss function L). So we want to find: dL/dL, dL/df, dL/de, dL/dd, dL/dc, dL/db, dL/da.

Add a grad attribute to accommodate backpropagation

class Value:
  def __init__(self, data, _children=(), _op='', label=''):
    self.data = data
    self._prev = set(_children)
    self._op = _op
    self.label = label
    self.grad = 0.0 # 0 means no impact on output to start with

Update the node rendering in draw_dot to include grad:

dot.node(name=uid, label="{ %s | data %.4f | grad %.4f }" % (n.label, n.data, n.grad), shape='record')

Graph with grad property

Manually Performing Back-Propagation for The Given Graph

Node L

What is dL/dL - that is, if we change L by a tiny amount h, how will the output L change? By exactly h, so the answer is obviously 1.

That is,

L.grad = 1

The Expression

a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'
L

Node d

L = d * f

By known rules:

dL/dd = f

By derivation, using the definition of the derivative (bump d by a small h and see how L = d * f changes):

dL/dd = ((d+h)*f - d*f)/h
      = (d*f + h*f - d*f)/h
      = h*f/h
      = f

That is, dL/dd = f = -2.0

So, we do

d.grad = -2.0

Node f

By symmetry, we get that dL/df = d, and d = e + c = -6.0 + 10 = 4.0

That is,

f.grad = 4.0

The updated graph looks like this:

Updated Graph

How to do Numerical Verification of the Derivatives

def verify_dL_by_df():
  h = 0.001

  a = Value(2.0, label='a')
  b = Value(-3.0, label='b')
  c = Value(10, label='c')
  e = a * b; e.label = 'e'
  d = e + c; d.label = 'd'
  f = Value(-2.0, label='f')
  L = d * f; L.label = 'L'
  L1 = L.data

  a = Value(2.0, label='a')
  b = Value(-3.0, label='b')
  c = Value(10, label='c')
  e = a * b; e.label = 'e'
  d = e + c; d.label = 'd'
  f = Value(-2.0 + h, label='f') # bump f a little bit
  L = d * f; L.label = 'L'
  L2 = L.data

  print((L2 - L1)/h)

verify_dL_by_df() # prints out 3.9999 ~ 4
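
Rebuilding the whole expression by hand for every input gets repetitive. Here is a small helper, my own sketch rather than code from the video, that rebuilds L with one chosen leaf input bumped by h:

def build_L(a=2.0, b=-3.0, c=10.0, f=-2.0):
  # rebuild the full expression and return the output L
  a = Value(a, label='a')
  b = Value(b, label='b')
  c = Value(c, label='c')
  e = a * b; e.label = 'e'
  d = e + c; d.label = 'd'
  f = Value(f, label='f')
  L = d * f; L.label = 'L'
  return L.data

def numerical_grad(name, h=0.001):
  # slope of L with respect to one leaf input: (L(x+h) - L(x)) / h
  base = dict(a=2.0, b=-3.0, c=10.0, f=-2.0)
  bumped = dict(base)
  bumped[name] += h
  return (build_L(**bumped) - build_L(**base)) / h

print(numerical_grad('f')) # ~4.0, same answer as verify_dL_by_df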

The Challenge - How do we calculate dL/dc?

We know dL/dd = -2.0, so we know how L is affected by d.

The question is: how does c impact L through d?

First, we can calculate the "local derivative", i.e. figure out how c impacts d.

That is,

dd/dc = ?

We know that:

d = c + e

So once we differentiate by c, we get: dd/dc = ((c+h) + e - (c+e))/h = h/h = 1

Similarly, dd/de = 1.

Now the question is: how do we put together dd/dc and dL/dd?

We need something called the Chain Rule:

Chain Rule
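
In plain terms: if a variable z depends on y, and y in turn depends on x, then the derivatives multiply along the path:

dz/dx = dz/dy * dy/dx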

So, applying the chain rule, we get:

dL/dc = dL/dd * dd/dc
dL/dc = -2.0 * 1.0 = -2.0
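
This matches a numerical check with the numerical_grad helper sketched earlier (my addition, not part of the original code):

print(numerical_grad('c')) # prints ~ -2.0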

Similarly, dL/de = -2.0

Let's set the values in Python, and redraw the graph now:

c.grad = -2.0
e.grad = -2.0

Graph with grads for c & e

Figuring out dL/da and dL/db

We know:

dL/de = -2.0

We want to know:

dL/da = dL/de * de/da

We know that:

e = a * b
de/da = b
de/da = b = -3.0

We can also find:

e = a * b
de/db = a
de/db = a = 2.0

So, now to get what we need:

dL/da = dL/de * de/da = -2.0 * -3.0 = 6.0
dL/db = dL/de * de/db = -2.0 * 2.0 = -4.0
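
Both values agree with the numerical_grad helper sketched earlier:

print(numerical_grad('a')) # prints ~ 6.0
print(numerical_grad('b')) # prints ~ -4.0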

We set the values in Python, and redraw to get the full graph:

a.grad = 6.0
b.grad = -4.0

Final graph

Reference

The spelled-out intro to neural networks and backpropagation: building micrograd - YouTube
