
Nibodh Daware

Originally published at nibodhdaware.hashnode.dev

How Deep Learning Works

Deep Learning is the core of a Machine Learning system: it is how a machine actually learns from data without much human intervention. In this post I am going to discuss how Deep Learning actually works with the data you give it.

The basis of a Deep Learning system is the Neural Network; it is the fundamental part of how a machine learns by itself.

To understand how a Neural Network learns, you first need to understand how it is structured.

There are mainly 3 kinds of layers in a neural network:

1. Input Layer: Where the data to the network is provided.

2. Hidden Layer(s): Where the network learns from the data.

3. Output Layer: Where the network produces its output for the given data.

There can be more hidden layers depending on how complex you want the network to be.
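
To make this structure concrete, here is a minimal sketch (my own illustration, not from any particular library) of how the layers of a small network could be represented as plain NumPy arrays. The layer sizes 3, 4, and 2 are arbitrary choices:

```python
import numpy as np

# A tiny network: 3 input nodes, one hidden layer of 4 nodes, 2 output nodes.
# The weights on the edges between two layers form a matrix, and each node
# in a layer gets its own bias.
rng = np.random.default_rng(0)

W_hidden = rng.normal(size=(3, 4))  # edges: input layer -> hidden layer
b_hidden = np.zeros(4)              # one bias per hidden node

W_output = rng.normal(size=(4, 2))  # edges: hidden layer -> output layer
b_output = np.zeros(2)              # one bias per output node
```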

(Figure: a neural network with an input layer, hidden layers, and an output layer.)

Learning in Deep Learning

Each node in the neural network is assigned a value known as its bias, and each edge between nodes is assigned a value known as its weight.

Weight: a value that represents how important that neuron's contribution is to the nodes it connects to.

Bias: allows the activation function to be shifted left or right.

We do need to know how the network is represented mathematically, but the intuition or meaning behind the representation is much more important than the equation itself.

$$y = \sigma(\sum Xw + b)$$

Where,

y is the output of the network for the particular data

σ is the activation function

X is the value of each input node

w is the weight

b is the bias

So, in short: we take the sum of all the products of X and w, add the bias to it, and pass the result to the activation function to generate the value of y.
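
To make the equation concrete, here is a small sketch of a single neuron computing y, with the sigmoid as the activation function (the input values, weights, and bias are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    # The activation function (sigma): squashes any number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])  # values of the nodes feeding this neuron
w = np.array([0.4, 0.1, -0.6])  # one weight per incoming edge
b = 0.2                         # the neuron's bias

# Sum the products of X and w, add the bias, then apply the activation.
y = sigmoid(np.sum(x * w) + b)
print(y)  # a single value between 0 and 1
```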

Forward Propagation

This algorithm goes through all the nodes in the network, starting from the input layer.

The main goal of this algorithm is to calculate the network's estimated answer. Early on, this estimate is usually wrong, so for correction we have something called back propagation.
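
Continuing the sketch from the earlier snippets (reusing sigmoid and the weight matrices defined above), forward propagation is just the same neuron computation repeated layer by layer:

```python
def forward(x, W_hidden, b_hidden, W_output, b_output):
    # Input layer -> hidden layer: every hidden node computes sigma(sum(Xw) + b).
    hidden = sigmoid(x @ W_hidden + b_hidden)
    # Hidden layer -> output layer: the same computation once more.
    output = sigmoid(hidden @ W_output + b_output)
    return output

x = np.array([0.5, -1.2, 3.0])
estimate = forward(x, W_hidden, b_hidden, W_output, b_output)
print(estimate)  # the network's (initially wrong) estimated answer
```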

Back Propagation

This algorithm works in the opposite direction to forward propagation: it goes through all the nodes starting from the output layer. The difference between the estimated output and the actual output is calculated with the help of a loss function, and while going back, the weights and biases are updated accordingly.

Back propagation and forward propagation take place iteratively until the loss reaches a minimum. This is what makes the neural network learn.
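
Here is a minimal sketch of that loop for a single neuron with a squared-error loss (reusing sigmoid from above; the inputs, target, and learning rate are made up, and the gradients come from applying the chain rule by hand):

```python
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.6])
b = 0.2
target = 1.0         # the actual output we want the neuron to produce
learning_rate = 0.1  # step size for each update (explained below)

for step in range(100):
    # Forward propagation: compute the estimated output.
    y = sigmoid(np.sum(x * w) + b)

    # Loss function: squared difference between estimate and target.
    loss = (y - target) ** 2

    # Back propagation: the chain rule gives the gradient of the loss
    # with respect to the weights and the bias.
    dloss_dy = 2 * (y - target)
    dy_dz = y * (1 - y)  # derivative of the sigmoid
    grad_w = dloss_dy * dy_dz * x
    grad_b = dloss_dy * dy_dz

    # Update weights and bias in the direction that reduces the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(loss)  # much smaller than at the start
```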

Loss Functions

These functions calculate the loss, which is the difference between the actual output from the network and the expected output from the data.
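
As an example, here is a sketch of the mean squared error, one of the most common loss functions:

```python
import numpy as np

def mean_squared_error(actual, expected):
    # Average of the squared differences between the network's outputs
    # and the expected outputs from the data.
    return np.mean((actual - expected) ** 2)

print(mean_squared_error(np.array([0.9, 0.1]), np.array([1.0, 0.0])))  # 0.01
```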

Optimizers

As the main goal of a neural network is to get the loss as low as possible, optimizers help with exactly that. An optimizer is an algorithm that tries to minimize the loss: guided by the loss function, it decides how to update the weights and biases of the network during back propagation so as to reduce the loss.

Learning Rate

The Learning Rate is a very small value, like 0.01, that scales each step the optimizer takes and helps make sure we do not skip over the optimal point.

If the Learning Rate is too large, we might overshoot the optimal point.

If the Learning Rate is too small, it might take a long time to reach it.
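
A tiny experiment makes this trade-off visible. Minimizing the toy loss f(w) = w² (whose gradient is 2w) with plain gradient descent, a large learning rate overshoots while a small one crawls; the values below are made up for illustration:

```python
def minimize(learning_rate, steps=20):
    w = 5.0  # start away from the minimum at w = 0
    for _ in range(steps):
        gradient = 2 * w  # derivative of the loss f(w) = w**2
        w -= learning_rate * gradient
    return w

print(minimize(1.1))    # too large: every step overshoots and w blows up
print(minimize(0.001))  # too small: still nowhere near 0 after 20 steps
print(minimize(0.1))    # reasonable: close to the minimum at w = 0
```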

Two commonly used optimizers are:

  1. Gradient Descent
     - It is also known as the granddaddy of optimizers.
     - In gradient descent, the weight is plotted on the x axis and the loss on the y axis, which forms a U-shaped curve. The weight is moved along this curve in the direction of its slope (the gradient) so that the loss becomes as small as possible, i.e. toward the lowest point of the curve.
     - Backpropagation is gradient descent implemented on a network.

(Figure: loss plotted against a weight as a U-shaped curve, with gradient descent stepping down toward the minimum.)

  2. Stochastic Gradient Descent
     - It is very similar to gradient descent, except that we use a subset (batches) of the entire dataset to calculate the gradient.
     - As each step uses less data than gradient descent, it is less computationally expensive.

Yes, there are more optimizers, but for understanding purposes the two above are the most commonly used.
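
To contrast the two, here is a sketch of both on a simple one-weight linear model; the dataset and batch size are made up, and the only difference between the two loops is whether each update looks at the whole dataset or a small random batch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=1000)
y = 3.0 * X + rng.normal(scale=0.1, size=1000)  # the true weight is 3.0

def gradient(w, X_batch, y_batch):
    # Gradient of the mean squared error with respect to the weight w.
    return 2 * np.mean((w * X_batch - y_batch) * X_batch)

w_gd, w_sgd, lr = 0.0, 0.0, 0.1
for step in range(100):
    # Gradient descent: every update uses the entire dataset.
    w_gd -= lr * gradient(w_gd, X, y)

    # Stochastic gradient descent: every update uses a small random batch,
    # which makes each step much cheaper to compute.
    batch = rng.integers(0, len(X), size=32)
    w_sgd -= lr * gradient(w_sgd, X[batch], y[batch])

print(w_gd, w_sgd)  # both end up close to the true weight, 3.0
```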

Conclusion

There are indeed more concepts to cover to fully understand how deep learning works, but this should be a good enough starting point for understanding how the machine learns.
