Ezinne Janet

Gradient Descent Explained (For Absolute Beginners)

Dear Reader,
I hate long articles, and I know you do too.
So, this article has been broken into three parts for better understanding.
This is part one, which introduces the theory behind our topic; part two will explain the mathematics, and part three will walk through the code. Enjoy!

Introduction To Gradient Descent

When training a model, our priority is to minimize the difference between the actual value and the predicted value. This process is called minimizing a function.
Gradient descent is an algorithm used in machine learning models to minimize a function; it allows us to find the best values for the model’s parameters.

How Does It Work?

First, you initialize the parameters: you start with random values that the model uses to discover how right or how wrong it is.
It's like adding salt to a pot of soup: you first add a certain amount before you can tell whether it is right, then keep adding (or reducing) until you get the perfect amount.
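To make this first step concrete, here is a minimal sketch in Python (the tiny model, the variable names, and the starting range are my own illustrative choices, not from any specific library):

```python
import random

# A hypothetical, tiny model: predicted = w * x + b.
# w (weight) and b (bias) start as random guesses;
# gradient descent will improve them step by step.
w = random.uniform(-1, 1)
b = random.uniform(-1, 1)

def predict(x):
    # Use the current parameters to make a prediction.
    return w * x + b
```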
Next, you calculate the error (the loss function). This measures how far the model's predicted value is from the actual value. For example, if the model predicts $4,000 for a $5,000 house, the error is $1,000. The error tells us how wrong or how right our model is.
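One common way to measure this error is the squared error. A minimal sketch, reusing the house-price example (squared error is just one popular choice of loss function):

```python
def squared_error(predicted, actual):
    # Squaring keeps the error positive and punishes large misses more.
    return (predicted - actual) ** 2

# The model predicted $4,000 for a $5,000 house:
print(squared_error(4000, 5000))  # 1000000, i.e. the $1,000 gap, squared
```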
For the next step, we determine the direction and rate of change of the error with respect to each parameter. In case you're thinking, "What jargon is this now?", I'll explain.
While driving, more speed means more distance covered; less speed means less distance covered. Here you can see how a change in speed affects the distance covered. To cover a particular distance within a time frame, you tweak (increase or decrease) the speed.
In the same vein, parameters (weight, bias) affect a model's predicted values; to change a model's predicted value, you have to change its parameters.
Remember, our aim is to reduce the error (the loss function). To do this, we have to change the predicted value, which is controlled by the parameters. So in this step we find out how the error changes whenever the parameters change; the rate of change of the loss function with respect to the parameters is called the gradient.
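For the tiny linear model sketched earlier, with squared error as the loss, the gradient can be written out by hand (the formulas come from basic calculus, which part two will derive properly):

```python
def gradients(x, actual, w, b):
    # How fast does the squared error change as w and b change?
    predicted = w * x + b
    error = predicted - actual
    grad_w = 2 * error * x  # rate of change of the loss with respect to w
    grad_b = 2 * error      # rate of change of the loss with respect to b
    return grad_w, grad_b
```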

After we find out how the loss function is affected by a change in the parameters (the gradient), we adjust the parameters to reduce the error, following three simple rules (a code sketch follows the list):
- If the gradient is positive, we decrease the parameters.
- If the gradient is negative, we increase the parameters.
- If the gradient is zero, we have the perfect parameters to get the lowest error.
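Conveniently, all three rules collapse into a single line of code: subtracting the gradient, scaled by a small step size called the learning rate, automatically moves a parameter down when the gradient is positive and up when it is negative. A minimal sketch, assuming the gradients function above (the learning rate value is an arbitrary illustration):

```python
learning_rate = 0.01  # a small, hand-picked step size

def update(w, b, grad_w, grad_b):
    # Move against the gradient: positive gradient -> parameter decreases,
    # negative gradient -> parameter increases, zero gradient -> no change.
    w = w - learning_rate * grad_w
    b = b - learning_rate * grad_b
    return w, b
```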

We now repeat the process until the loss function is at its minimum.
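Putting all the steps together, here is a rough end-to-end sketch on a single made-up data point (the numbers are invented for illustration; part three will build a proper implementation):

```python
import random

x, actual = 2.0, 5000.0    # one made-up training example
w = random.uniform(-1, 1)  # step 1: start with random parameters
b = random.uniform(-1, 1)
learning_rate = 0.01

for step in range(1000):           # step 5: repeat the process
    predicted = w * x + b
    error = predicted - actual     # step 2: how wrong is the model?
    grad_w = 2 * error * x         # step 3: the gradient
    grad_b = 2 * error
    w -= learning_rate * grad_w    # step 4: adjust the parameters
    b -= learning_rate * grad_b

print(round(w * x + b, 2))  # close to 5000.0: the error is near its minimum
```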

I hope this has given you the introduction to gradient descent you need to understand the mathematics behind it. See you in the next article. Ciao!
