Hi there! I'm Shrijith Venkatrama, founder of Hexmos. Right now, I’m building LiveAPI, a tool that makes generating API docs from your code ridiculously easy.
Training: Changing weight.data Slightly, Based on weight.grad (According to the Learning Rate)
From the previous post, we have n.parameters() - a list of all the parameters (the weights and biases) in the neural network. In total we have 41 parameters:
The data value of one of these parameters is shown below:
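If you are following along with the classes built in the previous posts (the same API ships in Karpathy's micrograd package), inspecting the parameter list looks roughly like this - treat the exact architecture here as an assumption:
# assumed setup from the previous posts: an MLP with 3 inputs and layer sizes [4, 4, 1]
n = MLP(3, [4, 4, 1])

params = n.parameters()
print(len(params))        # 41 parameters (weights and biases) in total

p = params[0]             # each parameter is a Value object
print(p.data, p.grad)     # its current value, and the gradient from the last backward pass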
Now the goal is to change the data value of this parameter, in accordance with the gradient feedback.
for p in n.parameters():
    p.data += 0.01 * p.grad # something like that (wip)
We also note that the gradient is negative for that parameter:
Determining the sign of the step factor 0.01
So there's a bit of reasoning to do here to determine the sign of the step factor. The goal is to minimize the loss (bring it as close to 0 as possible).
p.data is positive: 0.85. p.grad is negative: -0.27. Since the gradient tells us how the loss changes as p.data increases, a negative gradient means the loss goes down when p.data goes up.
With p.data += 0.01 * p.grad, p.data is decreased a bit, which increases the loss.
But with p.data += -0.01 * p.grad, p.data is increased a bit, reducing the loss.
So the correct option is a negative step factor - we step against the gradient, which is exactly what "gradient descent" means.
Corrected code:
for p in n.parameters():
    p.data += -0.01 * p.grad
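Plugging in the numbers from above makes the direction concrete:
# quick numeric check of the update direction, using the values from above
data, grad = 0.85, -0.27

print(data + 0.01 * grad)    # ~0.8473 - data moved down, which increases the loss here
print(data + -0.01 * grad)   # ~0.8527 - data moved up, which reduces the loss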
Now we can compare the loss before and after the weight adjustment, and conclude that through the backward pass plus gradient descent we got a more accurate result:
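A minimal sketch of that before/after check, reusing the xs, ys, and loss expression from the previous post:
# loss before the update
ypred = [n(x) for x in xs]
loss = sum((yout - ygt)**2 for ygt, yout in zip(ys, ypred))
print("loss before:", loss.data)

# backward pass fills in p.grad for every parameter
loss.backward()

# one gradient-descent step
for p in n.parameters():
    p.data += -0.01 * p.grad

# forward pass again with the nudged weights
ypred = [n(x) for x in xs]
loss = sum((yout - ygt)**2 for ygt, yout in zip(ys, ypred))
print("loss after:", loss.data)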
Automating Gradient Descent To Get a Highly Accurate Network
Setting the right learning rate is a subtle art. If it is too low, training takes too long to converge. If the step size is too large, the process becomes unstable and the loss may explode.
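A toy illustration of this behaviour, completely separate from our network - just plain gradient descent on f(x) = x**2, whose gradient is 2*x:
# gradient descent on f(x) = x**2 with three different learning rates
for lr in [0.01, 0.5, 1.1]:
    x = 1.0
    for _ in range(20):
        x += -lr * (2 * x)   # step against the gradient 2*x
    print(lr, x)             # 0.01 creeps toward 0 slowly, 0.5 converges quickly, 1.1 blows up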
Implementing a Training Loop
We wrap the forward pass, backward pass, and weight-update steps in a loop:
for k in range(20):
    # forward pass
    ypred = [n(x) for x in xs]
    loss = sum((yout - ygt)**2 for ygt, yout in zip(ys, ypred))

    # backward pass
    loss.backward()

    # update - gradient descent
    for p in n.parameters():
        p.data += -0.05 * p.grad

    print(k, loss.data)
The training gives an output like this:
0 4.149044341397712
1 2.8224176124705482
2 1.0767374634555338
3 0.4436221441110331
4 0.048639680823661345
5 0.0007984305003777319
6 5.758159329954795e-06
7 1.1072290005342024e-07
8 1.1331571852917713e-08
9 1.8004031247688252e-09
10 3.886667439780539e-10
11 1.190170455797565e-10
12 5.491701244447392e-11
13 4.086071696354591e-11
14 5.2487460541263784e-11
15 1.235857710202349e-10
16 5.557297068527374e-10
17 4.829530833029305e-09
18 7.912558681799505e-08
19 2.2910484425631455e-06
You can see that the loss drops to really small numbers within a handful of passes (although it creeps back up slightly in the last few iterations).
Now we compare actual y to predicted y:
print("actual", ys)
print("predicted", ypred)
And we get near-perfect results:
actual [1.0, -1.0, -1.0, 1.0]
predicted [
Value(data=1.0, grad=0.0, label=''),
Value(data=-0.9986538494836703, grad=0.0026923010326593833, label=''),
Value(data=-0.9993079543151291, grad=0.0013840913697418245, label=''),
Value(data=1.0, grad=0.0, label='')
]
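The predictions come back as Value objects; if you just want the raw numbers, pull out .data:
print([round(yp.data, 4) for yp in ypred])   # roughly [1.0, -0.9987, -0.9993, 1.0]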
Fixing a subtle bug in the training loop
Each parameter in the net is a Value object with data and grad attributes.
In our training loop, the first iteration is fine - when we do loss.backward(), we fill in the grad value for each parameter.
But from the second iteration onwards, the grad values keep accumulating (they are never reset to 0).
So the feedback given to each parameter is wrong. We have to reset grad to 0 before each backward pass.
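Here is a tiny demonstration of that accumulation on a standalone Value (assuming, as in micrograd, that the backward functions accumulate gradients with +=):
a = Value(2.0)
b = a * Value(3.0)
b.backward()
print(a.grad)   # 3.0 - the correct gradient db/da

b.backward()    # backward again without resetting grads
print(a.grad)   # 6.0 - the gradient has accumulated and is now wrong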
Corrected training loop:
for k in range(20):
    # forward pass
    ypred = [n(x) for x in xs]
    loss = sum((yout - ygt)**2 for ygt, yout in zip(ys, ypred))

    # reset grad to zero before the backward pass
    for p in n.parameters():
        p.grad = 0.0

    # backward pass
    loss.backward()

    # update - gradient descent
    for p in n.parameters():
        p.data += -0.05 * p.grad

    print(k, loss.data)
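A common tidy-up (and what Karpathy's micrograd package does) is to move this reset into a zero_grad() method on a base Module class - a minimal sketch, assuming our classes expose parameters():
class Module:
    def parameters(self):
        return []

    def zero_grad(self):
        # reset every parameter's gradient before the next backward pass
        for p in self.parameters():
            p.grad = 0.0

# if Neuron, Layer and MLP inherit from Module, the loop body becomes:
#     n.zero_grad()
#     loss.backward()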
We get a similar result in this case, since the problem is quite a simple one. With neural networks it sometimes happens that we get a successful-looking result even when the logic is a bit buggy. For more complex problems, these sorts of bugs can derail the training process - so it pays to watch out for the common mistakes.
Reference
The spelled-out intro to neural networks and backpropagation: building micrograd - YouTube