Fully Automated Gradient Calculation on Expression Graph (As Explained By Karpathy)

Hi there! I'm Shrijith Venkatrama, founder of Hexmos. Right now, I’m building LiveAPI, a tool that makes generating API docs from your code ridiculously easy.

In the previous posts, we manually calculated gradient values for each node via backpropagation.

In this post, we will see how these manual calculations can be automated, step by step, to perform automatic backpropagation.

Implementing Automatic Backpropagation for Each Node

Object Model

We implement node-level gradient computation when each Value is created.

First, we do the "forward pass", that is, we transform the input(s) into an output using the operation.

Then, immediately, we differentiate the operation and calculate the grad for the input(s).

You can see in the code below that for each operation, we have added a custom _backward function which triggers the backpropagation calculation for that operation.

import math

class Value:
  def __init__(self, data, _children=(), _op='', label=''):
    self.data = data
    self._prev = set(_children)
    self._op = _op
    self.label = label
    self.grad = 0.0
    self._backward = lambda: None # by default doesn't do anything (for a leaf
                                  # node for ex)

  def __repr__(self):
    return f"Value(data={self.data})"

  def __add__(self, other):
    out = Value(self.data + other.data, (self, other), '+')

    # derivative of '+' is just distributing the grad of the output to inputs
    def backward():
      self.grad = 1.0 * out.grad
      other.grad = 1.0 * out.grad

    out._backward = backward

    return out

  def __mul__(self, other):
    out = Value(self.data * other.data, (self, other), '*')

    # derivative of `mul` is gradient of result multiplied by sibling's data
    def backward():
      self.grad = other.data * out.grad
      other.grad = self.data * out.grad

    out._backward = backward

    return out

  def tanh(self):
      x = self.data
      t = (math.exp(2*x) - 1) / (math.exp(2*x) + 1)
      out = Value(t, (self, ), 'tanh')

      # derivative of tanh = 1 - (tanh)^2
      def backward():
        self.grad = (1 - t**2) * out.grad

      out._backward = backward
      return out

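To see what a single _backward closure does, here is a minimal sketch (it only uses the Value class above; the names a, b and c are just for illustration):

a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = a * b; c.label = 'c'

# the output's grad must be seeded before its _backward is called
c.grad = 1.0
c._backward()

print(a.grad) # -3.0, i.e. b.data * c.grad
print(b.grad) # 2.0, i.e. a.data * c.grad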

To start with, we initialize the expression graph without doing any backprop, so grad in each node is 0:

# inputs x1, x2
x1 = Value(2.0, label='x1')
x2 = Value(0.0, label='x2')

# weights w1, w2
w1 = Value(-3.0, label='w1')
w2 = Value(1.0, label='w2')

# bias of the neuron
b = Value(6.8813735870195432, label='b')

x1w1 = x1 * w1; x1w1.label = 'x1*w1'
x2w2 = x2 * w2; x2w2.label = 'x2*w2'
x1w1x2w2 = x1w1 + x2w2; x1w1x2w2.label = 'x1*w1 + x2*w2'

n = x1w1x2w2 + b; n.label = 'n'

o = n.tanh(); o.label = 'o'

draw_dot(o)

Before Backprop
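As a quick check (a minimal sketch reusing the nodes defined above), every grad is still at its initial value at this point:

for node in [x1, x2, w1, w2, b, x1w1, x2w2, x1w1x2w2, n, o]:
    print(node.label, node.grad)  # each grad prints as 0.0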

Now we are ready to trigger backprop on each operation:

o.grad = 1.0 # base case - needs to be set explicitly
o._backward()
n._backward()
b._backward() # does nothing - leaf node
x1w1x2w2._backward()
x1w1._backward()
x2w2._backward()
draw_dot(o)

The result matches our manual calculations from the previous post:

After

Fully Automated BackPropagation

In the previous section, we got a sort of semi-automated backprop.

We still had to manually pick nodes from "last" to "first", and call _backward in that order.

So the first task is to get a list of nodes in the "last" to "first" order.

For this, we are going to implement topological sort in our program, so we always process nodes in the correct order.

Topological Sort


Code:

topo = []
visited = set()
def build_topo(v):
    # post-order DFS: a node is appended only after all of its children
    if v not in visited:
        visited.add(v)
        for child in v._prev:
            build_topo(child)
        topo.append(v)
build_topo(o)
topo

The above code gives us the nodes in the "first" to "last" order:

TopoResult
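To make that ordering concrete, here is a small sketch that prints the labels in the topo list (the exact order of siblings can vary between runs because _prev is a set, but o always comes last):

print([v.label for v in topo])
# leaves such as 'x1', 'w1', 'x2', 'w2' appear near the start, 'o' appears last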

Using Topological Sort Results to Automate Backprop For the Entire Graph

Code:

o.grad = 1.0

topo = []
visited = set()
def build_topo(v):
    if v not in visited:
        visited.add(v)
        for child in v._prev:
            build_topo(child)
        topo.append(v)
build_topo(o)

for node in reversed(topo):
    node._backward()

draw_dot(o)

And voila, we get the gradient value for each node in a fully automated manner, just like an autograd engine:

Automated Result

Integrating the automated backprop logic into the Value class

We add a new method backward to the Value class to perform backprop automatically from that node to all the previous nodes:

import math

class Value:
  def __init__(self, data, _children=(), _op='', label=''):
    self.data = data
    self._prev = set(_children)
    self._op = _op
    self.label = label
    self.grad = 0.0
    self._backward = lambda: None # by default doesn't do anything (for a leaf
                                  # node for ex)

  def __repr__(self):
    return f"Value(data={self.data})"

  def __add__(self, other):
    out = Value(self.data + other.data, (self, other), '+')

    # derivative of '+' is just distributing the grad of the output to inputs
    def backward():
      self.grad = 1.0 * out.grad
      other.grad = 1.0 * out.grad

    out._backward = backward

    return out

  def __mul__(self, other):
    out = Value(self.data * other.data, (self, other), '*')

    # derivative of `mul` is gradient of result multiplied by sibling's data
    def backward():
      self.grad = other.data * out.grad
      other.grad = self.data * out.grad

    out._backward = backward

    return out

  def tanh(self):
      x = self.data
      t = (math.exp(2*x) - 1) / (math.exp(2*x) + 1)
      out = Value(t, (self, ), 'tanh')

      # derivative of tanh = 1 - (tanh)^2
      def backward():
        self.grad = (1 - t**2) * out.grad

      out._backward = backward
      return out

  def backward(self):
    topo = []
    visited = set()
    def build_topo(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build_topo(child)
            topo.append(v)
    build_topo(self)

    self.grad = 1.0
    for node in reversed(topo):
        node._backward()

Now I can just call the following to calculate gradients for all the nodes automatically:

o.backward()
draw_dot(o)

Final Result
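As an optional sanity check (not part of Karpathy's walkthrough), we can compare one of these gradients against a numerical estimate. The sketch below assumes the final Value class above and re-runs the forward pass with w1 nudged by a small h:

h = 0.0001

def forward(w1_val):
    x1 = Value(2.0); x2 = Value(0.0)
    w1 = Value(w1_val); w2 = Value(1.0)
    b = Value(6.8813735870195432)
    n = x1 * w1 + x2 * w2 + b
    return n.tanh().data

numerical = (forward(-3.0 + h) - forward(-3.0)) / h
print(numerical) # close to 1.0, which matches the w1.grad computed by o.backward()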

Reference

The spelled-out intro to neural networks and backpropagation: building micrograd - YouTube
