Super Kai (Kazuya Ito)

The loss functions for Neural Network in PyTorch

A loss function is a function which computes the losses (differences) between a model's predictions and the true values to evaluate how good a model is. *A loss function is also called a Cost Function or Error Function.

There are popular loss functions as shown below:

(1) L1 Loss:

  • can compute the average of the absolute losses (differences) between a model's predictions and true values.
  • 's formula is as shown below:
    $MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$
  • 's pros are as shown below:
    • Less sensitive to outliers.
    • We can easily compare the losses because they are just made absolute, so their range is not big.
  • 's cons are as shown below:
    • Not differentiable at zero.
  • is used for a regression model.
  • is also called Mean Absolute Error(MAE).
  • is L1Loss() in PyTorch.
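
Below is a minimal sketch of L1Loss() (the tensors are hypothetical example values):

```python
import torch
from torch import nn

# Hypothetical predictions and true values.
pred = torch.tensor([2.0, 5.0, 9.0])
true = torch.tensor([3.0, 4.0, 7.0])

loss_fn = nn.L1Loss()  # reduction='mean' by default

# (|2-3| + |5-4| + |9-7|) / 3 = (1 + 1 + 2) / 3
print(loss_fn(pred, true))  # tensor(1.3333)
```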

(2) L2 Loss:

  • can compute the average of the squared losses (differences) between a model's predictions and true values.
  • 's formula is as shown below:
    $MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
  • 's pros are as shown below:
    • All losses are differentiable because they are squared.
  • 's cons are as shown below:
    • Sensitive to outliers.
    • We cannot easily compare the losses because they are squared, so their range is big.
  • is used for a regression model.
  • is also called Mean Squared Error(MSE).
  • is MSELoss() in PyTorch.
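
Below is a minimal sketch of MSELoss() with the same hypothetical tensors:

```python
import torch
from torch import nn

# Hypothetical predictions and true values.
pred = torch.tensor([2.0, 5.0, 9.0])
true = torch.tensor([3.0, 4.0, 7.0])

loss_fn = nn.MSELoss()

# ((2-3)^2 + (5-4)^2 + (9-7)^2) / 3 = (1 + 1 + 4) / 3
print(loss_fn(pred, true))  # tensor(2.)
```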

(3) Huber Loss:

  • can do a computation similar to either L1 Loss or L2 Loss, depending on how the absolute losses (differences) between a model's predictions and true values compare with the delta which you set. *Memos:
    • delta is 1.0 by default.
    • Be careful, the computation is not exactly the same as L1 Loss or L2 Loss, as the formulas below show.
  • 's formula is as shown below. *The 1st case is the L2 Loss-like one and the 2nd case is the L1 Loss-like one:
    $L_{\delta} = \frac{1}{n}\sum_{i=1}^{n} l_i, \quad l_i = \begin{cases} \frac{1}{2}(y_i - \hat{y}_i)^2 & \text{if } |y_i - \hat{y}_i| \le \delta \\ \delta \left( |y_i - \hat{y}_i| - \frac{1}{2}\delta \right) & \text{otherwise} \end{cases}$
  • 's pros are as shown below:
    • Less sensitive to outliers.
    • All losses are differentiable.
    • We can more easily compare the losses than with L2 Loss because only small losses are squared, so their range is smaller than with L2 Loss.
  • 's cons are as shown below:
    • The computation costs more than L1 Loss and L2 Loss because the formula is more complex.
  • is used for a regression model.
  • is HuberLoss() in PyTorch.
  • with delta of 1.0 is the same as Smooth L1 Loss, which is SmoothL1Loss() in PyTorch.
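
Below is a minimal sketch of HuberLoss() with the same hypothetical tensors, showing how each loss takes the L2 Loss-like or L1 Loss-like branch depending on delta:

```python
import torch
from torch import nn

# Hypothetical predictions and true values.
pred = torch.tensor([2.0, 5.0, 9.0])
true = torch.tensor([3.0, 4.0, 7.0])

loss_fn = nn.HuberLoss()  # delta=1.0 by default

# |2-3| = 1 and |5-4| = 1 do not exceed delta (L2-like): 0.5 * 1^2 = 0.5 each.
# |9-7| = 2 exceeds delta (L1-like): 1.0 * (2 - 0.5 * 1.0) = 1.5.
# (0.5 + 0.5 + 1.5) / 3 ≈ 0.8333
print(loss_fn(pred, true))  # tensor(0.8333)
```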

(4) BCE(Binary Cross Entropy) Loss:

  • can compute the losses (differences) between a model's binary predictions and true binary values.
  • 's formula is as shown below:
    $BCE = -\frac{1}{n}\sum_{i=1}^{n}\left[ y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \right]$
  • is used for Binary Classification. *Binary Classification is the technology to classify data into two classes.
  • is also called Binary Cross Entropy or Log(Logarithmic) Loss.
  • is BCELoss() in PyTorch. *Memos:
    • The predictions must be probabilities between 0 and 1, e.g. the output of Sigmoid.
    • BCEWithLogitsLoss() in PyTorch combines Sigmoid and BCE Loss in one class.
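
Below is a minimal sketch of BCELoss() (the probabilities and labels are hypothetical example values):

```python
import torch
from torch import nn

# Hypothetical predicted probabilities (e.g. Sigmoid outputs) and true binary labels.
pred = torch.tensor([0.8, 0.3, 0.9])
true = torch.tensor([1.0, 0.0, 1.0])

loss_fn = nn.BCELoss()

# -(ln(0.8) + ln(1-0.3) + ln(0.9)) / 3 ≈ 0.2284
print(loss_fn(pred, true))  # tensor(0.2284)
```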

(5) Cross Entropy Loss:

  • can compute the losses (differences) between a model's predictions and true values. *A loss is 0 or greater.
  • 's formula is as shown below:
    $CE = -\frac{1}{n}\sum_{i=1}^{n}\sum_{c=1}^{C} y_{i,c}\log(\hat{y}_{i,c})$
  • is used for Multiclass Classification and Computer Vision. *Memos:
    • Multiclass Classification is the technology to classify data into multiple classes.
    • Computer vision is the technology which enables a computer to understand objects.
  • is CrossEntropyLoss() in PyTorch.
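
Below is a minimal sketch of CrossEntropyLoss() (the logits and labels are hypothetical example values). Note that CrossEntropyLoss() takes raw logits rather than probabilities because it applies log-softmax internally, and the targets are class indices:

```python
import torch
from torch import nn

# Hypothetical raw logits for 2 samples over 3 classes, and the true class indices.
pred = torch.tensor([[2.0, 1.0, 0.1],
                     [0.5, 2.5, 0.3]])
true = torch.tensor([0, 1])

loss_fn = nn.CrossEntropyLoss()  # applies log-softmax internally

# -(ln(softmax(pred)[0][0]) + ln(softmax(pred)[1][1])) / 2 ≈ 0.3185
print(loss_fn(pred, true))  # tensor(0.3185)
```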
