Hi there! I'm Shrijith Venkatrama, founder of Hexmos. Right now, I’m building LiveAPI, a tool that makes generating API docs from your code ridiculously easy.
Adding Labels To Improve Graph Readability
Add a label parameter to the Value class:
class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self._prev = set(_children)  # child nodes this value was computed from
        self._op = _op               # the operation that produced this value
        self.label = label           # human-readable name shown in the graph

    def __repr__(self):
        return f"Value(data={self.data})"

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), '+')

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), '*')
a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
print(d._prev)
print(d._op)
print("---")
print(e._prev)
print(e._op)
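This prints something like the following (sets are unordered, so the element order may vary from run to run):

{Value(data=-6.0), Value(data=10)}
+
---
{Value(data=2.0), Value(data=-3.0)}
*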
Update draw_dot to include the label in the graph
Originally we had the node expression as:
dot.node(name=uid, label="{ data %.4f }" % (n.data,), shape='record')
Replace with:
dot.node(name=uid, label="{ %s | data %.4f }" % (n.label, n.data), shape='record')
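For context, here is a minimal sketch of the full draw_dot helper that this line lives in. It assumes the trace helper and graphviz setup from earlier in this series (your exact version may differ):

from graphviz import Digraph

def trace(root):
    # walk the expression graph and collect all nodes and edges
    nodes, edges = set(), set()
    def build(v):
        if v not in nodes:
            nodes.add(v)
            for child in v._prev:
                edges.add((child, v))
                build(child)
    build(root)
    return nodes, edges

def draw_dot(root):
    dot = Digraph(format='svg', graph_attr={'rankdir': 'LR'})  # left-to-right layout
    nodes, edges = trace(root)
    for n in nodes:
        uid = str(id(n))
        # the value node, now showing the label alongside the data
        dot.node(name=uid, label="{ %s | data %.4f }" % (n.label, n.data), shape='record')
        if n._op:
            # a separate small node for the operation that produced this value
            dot.node(name=uid + n._op, label=n._op)
            dot.edge(uid + n._op, uid)
    for n1, n2 in edges:
        # connect each child to the op node of its parent
        dot.edge(str(id(n1)), str(id(n2)) + n2._op)
    return dot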
Now draw_dot(d) returns a graph where each node shows its label alongside its data.
Re-Render Graph with Labels
Let's add a few more nodes - f and L - to the expression:
a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'
L  # displays Value(data=-8.0)
Generate graph:
draw_dot(L)
The graph we've built above is the result of the forward pass: starting from the leaf nodes, each node's data value is computed from its children.
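As a quick sanity check on the forward pass, here are the same computations done with plain Python floats (the values match what the graph shows):

a = 2.0
b = -3.0
c = 10.0
e = a * b   # -6.0
d = e + c   # 4.0
f = -2.0
L = d * f   # -8.0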
What We Want to Calculate
We want to know how the inputs and intermediate nodes (a, b, c, d, e, f) affect the output (the loss function L). So we want to find: dL/dL, dL/df, dL/de, dL/dd, dL/dc, dL/db, dL/da.
Add the grad attribute to accommodate backpropagation
class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self._prev = set(_children)
        self._op = _op
        self.label = label
        self.grad = 0.0  # 0 means no impact on the output to start with

The rest of the class (__repr__, __add__, __mul__) stays the same.
Update the node rendering to also show the gradient:
dot.node(name=uid, label="{ %s | data %.4f | grad %.4f }" % (n.label, n.data, n.grad), shape='record')
Manually Performing Back-Propagation for The Given Graph
Node L
What is dL/dL? In other words, if we change L by a tiny amount, how much does the output L change? By exactly that same amount - so the answer is 1.
That is,
L.grad = 1
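A quick numeric illustration of the same idea (the -8.0 below is L's value from the forward pass):

h = 0.001
L_data = -8.0  # L's data from the forward pass
print(((L_data + h) - L_data) / h)  # ~1.0, up to floating-point error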
The Expression
a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'
L
Node d
L = d * f
By known rules:
dL/dd = f
By derivation, using the limit definition of the derivative (here L(d) = d*f):
dL/dd =
(L(d+h) - L(d))/h =
((d+h)*f - d*f)/h =
(d*f + h*f - d*f)/h =
(h*f)/h =
f
That is, dL/dd = f = -2.0
So, we do
d.grad = -2.0
Node f
By symmetry, we get that dL/df = d. And since d = e + c = -6.0 + 10 = 4.0, we have dL/df = 4.0
That is,
f.grad = 4.0
Re-rendering now shows the grad values for L, d, and f in the updated graph.
How to do Numerical Verification of the Derivatives
def verify_dL_by_df():
    h = 0.001

    a = Value(2.0, label='a')
    b = Value(-3.0, label='b')
    c = Value(10, label='c')
    e = a * b; e.label = 'e'
    d = e + c; d.label = 'd'
    f = Value(-2.0, label='f')
    L = d * f; L.label = 'L'
    L1 = L.data

    a = Value(2.0, label='a')
    b = Value(-3.0, label='b')
    c = Value(10, label='c')
    e = a * b; e.label = 'e'
    d = e + c; d.label = 'd'
    f = Value(-2.0 + h, label='f')  # bump f a little bit
    L = d * f; L.label = 'L'
    L2 = L.data

    print((L2 - L1)/h)
verify_dL_by_df() # prints out 3.9999 ~ 4
The Challenge - How Do We Calculate dL/dc?
We know dL/dd = -2.0, so we know how L is affected by d.
The question now is: how does c impact L through d?
First, we can calculate the "local derivative" - that is, figure out how c impacts d.
That is,
dd/dc = ?
We know that:
d = c + e
So differentiating with respect to c, we get: dd/dc = 1
Similarly, dd/de = 1.
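To see where these come from, apply the same limit definition to d = c + e:

dd/dc = ((c + h) + e - (c + e))/h = h/h = 1

A sum passes changes straight through, which is why its local derivatives are both 1.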
Now the question is: how do we put together dd/dc and dL/dd?
We need something called the Chain Rule: if z depends on y, and y depends on x, then dz/dx = dz/dy * dy/dx - rates of change multiply along the path.
So, applying chain rule, we get:
dL/dc = dL/dd * dd/dc
dL/dc = -2.0 * 1.0 = -2.0
Similarly, dL/de = -2.0
Let's set the values in python, and redraw the graph now:
c.grad = -2.0
e.grad = -2.0
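We can check this numerically as well. Here's a sketch following the same pattern as verify_dL_by_df (the helper name verify_dL_by_dc is mine):

def verify_dL_by_dc():
    h = 0.001

    a = Value(2.0, label='a')
    b = Value(-3.0, label='b')
    c = Value(10, label='c')
    e = a * b; e.label = 'e'
    d = e + c; d.label = 'd'
    f = Value(-2.0, label='f')
    L = d * f; L.label = 'L'
    L1 = L.data

    a = Value(2.0, label='a')
    b = Value(-3.0, label='b')
    c = Value(10 + h, label='c')  # bump c a little bit
    e = a * b; e.label = 'e'
    d = e + c; d.label = 'd'
    f = Value(-2.0, label='f')
    L = d * f; L.label = 'L'
    L2 = L.data

    print((L2 - L1)/h)

verify_dL_by_dc()  # prints roughly -2.0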
Figuring out dL/da and dL/db
We know:
dL/de = -2.0
We want to know:
dL/da = dL/de * de/da
We know that:
e = a * b
Differentiating with respect to a gives: de/da = b = -3.0
Similarly, differentiating with respect to b gives: de/db = a = 2.0
So, now to get what we need:
dL/da = dL/de * de/da = -2.0 * -3.0 = 6.0
dL/db = dL/de * de/db = -2.0 * 2.0 = -4.0
We set the values in python, and redraw to get the full graph:
a.grad = 6.0
b.grad = -4.0
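Finally, to double-check every gradient we derived by hand, here is a small helper of my own (not from the lecture) that rebuilds the expression and bumps one leaf input at a time:

def numerical_grad(name, h=0.001):
    # rebuild the whole expression from raw values,
    # optionally bumping one leaf input by h
    def forward(bump=None):
        vals = {'a': 2.0, 'b': -3.0, 'c': 10.0, 'f': -2.0}
        if bump is not None:
            vals[bump] += h
        a = Value(vals['a'], label='a')
        b = Value(vals['b'], label='b')
        c = Value(vals['c'], label='c')
        e = a * b; e.label = 'e'
        d = e + c; d.label = 'd'
        f = Value(vals['f'], label='f')
        L = d * f; L.label = 'L'
        return L.data
    return (forward(name) - forward()) / h

for name in ['a', 'b', 'c', 'f']:
    print(name, numerical_grad(name))
# prints roughly: a 6.0, b -4.0, c -2.0, f 4.0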
Reference
The spelled-out intro to neural networks and backpropagation: building micrograd - Andrej Karpathy (YouTube)