Hi there! I'm Shrijith Venkatrama, founder of Hexmos. Right now, I’m building LiveAPI, a tool that makes generating API docs from your code ridiculously easy.
In the previous posts, we manually calculated gradient values for each node via backpropagation.
In this post, we will see how these manual calculations can be automated, step by step, into fully automatic backpropagation.
Implementing Automatic Backpropagation for Each Node
We implement node-level `grad` calculation at the point where each `Value` is created.
First we do the "forward pass", that is, we transform the input(s) into an output using the operation. Immediately alongside it, we define how to differentiate that operation, so the grad for the input(s) can be calculated later.
You can see in the code below that for each operation, we have added a custom `_backward` property which can trigger the backpropagation calculation for that node.
```python
import math

class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self._prev = set(_children)
        self._op = _op
        self.label = label
        self.grad = 0.0
        self._backward = lambda: None  # by default does nothing (e.g. for a leaf node)

    def __repr__(self):
        return f"Value(data={self.data})"

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other), '+')

        # derivative of '+' just distributes the grad of the output to the inputs
        def backward():
            self.grad = 1.0 * out.grad
            other.grad = 1.0 * out.grad
        out._backward = backward

        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other), '*')

        # derivative of '*' is the output grad multiplied by the sibling's data
        def backward():
            self.grad = other.data * out.grad
            other.grad = self.data * out.grad
        out._backward = backward

        return out

    def tanh(self):
        x = self.data
        t = (math.exp(2*x) - 1) / (math.exp(2*x) + 1)
        out = Value(t, (self, ), 'tanh')

        # derivative of tanh(x) is 1 - tanh(x)^2
        def backward():
            self.grad = (1 - t**2) * out.grad
        out._backward = backward

        return out
```
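As a quick sanity check of the pattern (a minimal sketch, assuming the `Value` class above), we can run a single multiplication forward and then trigger its `_backward` by hand:

```python
a = Value(2.0, label='a')
b = Value(3.0, label='b')
c = a * b; c.label = 'c'  # forward pass: c.data == 6.0

c.grad = 1.0   # seed the output grad
c._backward()  # the closure defined inside __mul__ fills in the input grads

print(a.grad, b.grad)  # 3.0 2.0 -- each input gets the other's data times c.grad
```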
To start with, we initialize the expression graph without doing any backprop activities, so `grad` in each node is 0:
```python
# inputs x1, x2
x1 = Value(2.0, label='x1')
x2 = Value(0.0, label='x2')

# weights w1, w2
w1 = Value(-3.0, label='w1')
w2 = Value(1.0, label='w2')

# bias of the neuron
b = Value(6.8813735870195432, label='b')

x1w1 = x1 * w1; x1w1.label = 'x1*w1'
x2w2 = x2 * w2; x2w2.label = 'x2*w2'
x1w1x2w2 = x1w1 + x2w2; x1w1x2w2.label = 'x1*w1 + x2*w2'
n = x1w1x2w2 + b; n.label = 'n'
o = n.tanh(); o.label = 'o'

draw_dot(o)
```
Now we are ready to trigger backprop on each operation:
```python
o.grad = 1.0  # base case - needs to be set explicitly
o._backward()
n._backward()
b._backward()  # does nothing - leaf node
x1w1x2w2._backward()
x1w1._backward()
x2w2._backward()

draw_dot(o)
```
The result matches our manual calculations from the previous post:
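If the rendered graph is not handy, you can also inspect the gradients directly (a quick check, assuming the nodes defined above; values shown are approximate):

```python
for v in [o, n, b, x1w1x2w2, x1w1, x2w2, x1, w1, x2, w2]:
    print(f"{v.label}: grad = {v.grad:.4f}")

# expected (approximately): o = 1.0, n = 0.5, b = 0.5, x1*w1 + x2*w2 = 0.5,
# x1*w1 = 0.5, x2*w2 = 0.5, x1 = -1.5, w1 = 1.0, x2 = 0.5, w2 = 0.0
```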
Fully Automated Backpropagation
In the previous section, we ended up with a sort of semi-automated backprop: we still had to manually pick nodes from "last" to "first" and call `_backward()` on each one in that order.
So the first task is to get a list of nodes in the "last" to "first" order. For this, we are going to implement topological sort in our program, so that we always process nodes in the correct order.
Topological Sort
Code:
```python
topo = []
visited = set()

def build_topo(v):
    if v not in visited:
        visited.add(v)
        for child in v._prev:
            build_topo(child)
        topo.append(v)

build_topo(o)
topo
```
The above code gives us the nodes in the "first" to "last" order:
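To see the order concretely, you can print the labels (a quick check, assuming the graph built above; the exact interleaving of siblings can vary because `_prev` is a set, but inputs always come before the nodes that use them, and `o` always comes last):

```python
print([v.label for v in topo])
# e.g. ['x2', 'w2', 'x2*w2', 'x1', 'w1', 'x1*w1', 'x1*w1 + x2*w2', 'b', 'n', 'o']
```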
Using Topological Sort Results to Automate Backprop For the Entire Graph
Code:
```python
o.grad = 1.0

topo = []
visited = set()

def build_topo(v):
    if v not in visited:
        visited.add(v)
        for child in v._prev:
            build_topo(child)
        topo.append(v)

build_topo(o)

for node in reversed(topo):
    node._backward()

draw_dot(o)
```
And voila: we get the gradient value for each node in a fully automated manner, just like an autograd engine:
Integrating the automated backprop logic into the `Value` class
We add a new method `backward` to the `Value` class to perform backprop automatically from that node back through all the previous nodes:
```python
import math

class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self._prev = set(_children)
        self._op = _op
        self.label = label
        self.grad = 0.0
        self._backward = lambda: None  # by default does nothing (e.g. for a leaf node)

    def __repr__(self):
        return f"Value(data={self.data})"

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other), '+')

        # derivative of '+' just distributes the grad of the output to the inputs
        def backward():
            self.grad = 1.0 * out.grad
            other.grad = 1.0 * out.grad
        out._backward = backward

        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other), '*')

        # derivative of '*' is the output grad multiplied by the sibling's data
        def backward():
            self.grad = other.data * out.grad
            other.grad = self.data * out.grad
        out._backward = backward

        return out

    def tanh(self):
        x = self.data
        t = (math.exp(2*x) - 1) / (math.exp(2*x) + 1)
        out = Value(t, (self, ), 'tanh')

        # derivative of tanh(x) is 1 - tanh(x)^2
        def backward():
            self.grad = (1 - t**2) * out.grad
        out._backward = backward

        return out

    def backward(self):
        # build the topological order starting from this node
        topo = []
        visited = set()

        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build_topo(child)
                topo.append(v)

        build_topo(self)

        # seed the base case, then backprop through the nodes in reverse order
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()
```
Now we can just call the following to calculate the gradients for all the nodes automatically:

```python
o.backward()
draw_dot(o)
```
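As an optional sanity check (a minimal sketch, not part of the original walkthrough; it assumes the `Value` class and the graph variables above are still in scope), we can compare the computed gradient for `x1` against a central-difference estimate of the same derivative:

```python
# rebuild the forward pass as a plain function of x1's value, so we can nudge it
def forward(x1_val):
    x1 = Value(x1_val, label='x1')
    x2 = Value(0.0, label='x2')
    w1 = Value(-3.0, label='w1')
    w2 = Value(1.0, label='w2')
    b = Value(6.8813735870195432, label='b')
    n = x1 * w1 + x2 * w2 + b
    return n.tanh().data

h = 1e-5
numeric = (forward(2.0 + h) - forward(2.0 - h)) / (2 * h)  # central difference
print(x1.grad, numeric)  # both should be close to -1.5
```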
Reference
The spelled-out intro to neural networks and backpropagation: building micrograd - YouTube