Hi there! I'm Shrijith Venkatrama, founder of Hexmos. Right now, I’m building LiveAPI, a tool that makes generating API docs from your code ridiculously easy.
Breaking down tanh into its constituent operations
We have the definition of tanh as follows:
tanh(x) = (e^(2x) - 1) / (e^(2x) + 1)
We can see that the above formula has:
- exponentiation
- subtraction
- addition
- division
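As a quick sanity check, the expanded formula can be computed with plain Python and compared against math.tanh (a small sketch, independent of the Value class):

```python
import math

def tanh_expanded(x):
    # tanh(x) = (e^(2x) - 1) / (e^(2x) + 1): only exp, subtraction, addition, division
    e = math.exp(2 * x)
    return (e - 1) / (e + 1)

print(tanh_expanded(0.8814))  # ~0.7071
print(math.tanh(0.8814))      # ~0.7071, the two agree
```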
What the Value class cannot do now
a = Value(2.0)
a + 1
The above doesn't work, because a is of Value type whereas 1 is of int type.
We can fix this by automatically converting 1 into a Value type in the __add__ method:
class Value:
    ...
    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)  # convert non-Value type to Value type
        out = Value(self.data + other.data, (self, other), '+')
        # ... _backward is defined here, same as before ...
        return out
Now, the following code works:
a = Value(3.0)
a + 1 # Gives out `Value(data=4.0)`
The same line is added to __mul__ as well to provide that automatic type conversion:
other = other if isinstance(other, Value) else Value(other)
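For context, here is roughly what __mul__ looks like with that line in place (a sketch following the same structure as __add__; the _backward body is the usual product rule):

```python
def __mul__(self, other):
    other = other if isinstance(other, Value) else Value(other)  # same automatic conversion
    out = Value(self.data * other.data, (self, other), '*')

    def _backward():
        # d(a*b)/da = b and d(a*b)/db = a, then apply the chain rule
        self.grad += other.data * out.grad
        other.grad += self.data * out.grad
    out._backward = _backward

    return out
```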
Now, the following will work:
a = Value(3.0)
a * 2 # will give out `Value(data=6.0)`
A (Potentially) Surprising Bug
How about the following code - will this work?
a = Value(3.0)
2 * a
The answer is no, it doesn't work: Python first asks int to handle the multiplication, int returns NotImplemented for a Value operand, and since Value doesn't define __rmul__ yet, Python raises a TypeError.
So, to solve the ordering problem, in Python we must specify __rmul__ (right multiply):
class Value:
    ...
    def __rmul__(self, other):  # other * self
        return self * other
Now, multiplying from the left works too.
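A minimal check, reusing a = Value(3.0) from above:

```python
a = Value(3.0)
2 * a  # Python falls back to a.__rmul__(2), giving Value(data=6.0)
```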
Implementing The Exponential Function
We add the following method for computing e^x in the Value class (it uses math.exp, so the file needs import math at the top):
def exp(self):
    x = self.data
    out = Value(math.exp(x), (self,), 'exp')

    def _backward():
        self.grad += out.data * out.grad  # d(e^x)/dx = e^x, then apply the chain rule
    out._backward = _backward

    return out
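A quick sanity check of the new method (assuming the Value repr from the earlier examples):

```python
a = Value(1.0)
b = a.exp()
print(b)  # Value(data=2.718281828459045), i.e. e^1
```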
Adding Support for a / b
We want to support division of Value objects. And it happens that we can reformulate a / b in a more convenient way:
a / b
= a * (1 / b)
= a * (b**-1)
To implement the above scheme we will require a pow (power) method:
def __pow__(self, other):
    assert isinstance(other, (int, float)), "only supporting int/float powers for now"
    out = Value(self.data**other, (self,), f'**{other}')

    def _backward():
        self.grad += (other * self.data**(other-1)) * out.grad
    out._backward = _backward

    return out
The above method implements the power rule to calculate the derivative of a power expression: d/dx (x^n) = n * x^(n-1).
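As a small gradient check (a sketch that manually seeds out.grad the way backward() does for the output node, and assumes grad starts at 0.0 as in the earlier posts):

```python
a = Value(2.0)
b = a ** 3     # Value(data=8.0)
b.grad = 1.0   # seed the output gradient, as backward() would
b._backward()
print(a.grad)  # 12.0, i.e. 3 * 2**2
```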
We also need subtraction, which we implement using addition and negation:
def __neg__(self):  # -self
    return self * -1

def __sub__(self, other):  # self - other
    return self + (-other)
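One more small method is needed so that the / operator in the test below dispatches to our Value code: a __truediv__ built from the a * (b**-1) reformulation above. A minimal sketch:

```python
def __truediv__(self, other):  # self / other
    return self * other**-1
```

Since it is composed entirely of __mul__ and __pow__, it needs no _backward of its own.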
The Test - Replace old tanh with its constituent formula
The code:
# inputs x1, x2
x1 = Value(2.0, label='x1')
x2 = Value(0.0, label='x2')
# weights w1, w2
w1 = Value(-3.0, label='w1')
w2 = Value(1.0, label='w2')
# bias of the neuron
b = Value(6.8813735870195432, label='b')
# x1*w1 + x2*w2 + b
x1w1 = x1 * w1; x1w1.label = 'x1*w1'
x2w2 = x2 * w2; x2w2.label = 'x2*w2'
x1w1x2w2 = x1w1 + x2w2; x1w1x2w2.label = 'x1*w1 + x2*w2'
n = x1w1x2w2 + b; n.label = 'n'
# tanh(n), broken into its constituent operations
e = (2*n).exp()
o = (e - 1) / (e + 1)
o.label = 'o'
o.backward()
draw_dot(o)
The result: you can check from the last post that the output of the tanh operation was 0.7071, and even after the change it is the same. So it looks like we were able to break tanh down into more fundamental operations such as exp, pow, subtraction, etc.
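A quick numeric check without rendering the graph (a sketch, assuming the test code above has just run):

```python
print(o.data)   # ~0.7071, matches the tanh-based result from the last post
print(x1.grad)  # ~-1.5, the same gradient the tanh version produced
```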
Reference
The spelled-out intro to neural networks and backpropagation: building micrograd - YouTube