
Super Kai (Kazuya Ito)

Loss functions in PyTorch


*Memos:

  • My post explains layers in PyTorch.
  • My post explains activation functions in PyTorch.
  • My post explains optimizers in PyTorch.

A loss function is a function that computes the mean (average) of the losses (differences) between a model's predictions and the train or test data, either to optimize the model during training or to evaluate how good the model is during testing. *A loss function is also called a Cost Function or Error Function.
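For example, here is a minimal sketch (with made-up tensors, not from the original post) of how a loss function is used: it takes the model's predictions and the target data, returns a single scalar, and backward() on that scalar produces the gradients an optimizer uses during training. The other loss functions below are used the same way (predictions and targets in, one scalar out).

import torch
from torch import nn

y_pred = torch.tensor([2.0, -1.0, 0.5], requires_grad=True) # stands in for a model's output
y_true = torch.tensor([1.5, 0.0, 0.0])                      # train or test data

loss_fn = nn.MSELoss()         # any loss function can be swapped in here
loss = loss_fn(y_pred, y_true) # a single scalar
loss.backward()                # gradients for an optimizer

print(loss)        # tensor(0.5000, grad_fn=...)
print(y_pred.grad) # d(loss)/d(y_pred)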

Popular loss functions are shown below:

(1) L1 Loss:

  • can compute the mean(average) of the absolute losses(differences) between a model's predictions and the train or test data.
  • 's formula: MAE = (1/n) * Σ |ŷᵢ - yᵢ|, where n is the number of elements, ŷᵢ is a prediction and yᵢ is a train or test value.
  • is used for a regression model.
  • is also called Mean Absolute Error(MAE).
  • is L1Loss() in PyTorch. *My post explains L1Loss().
  • 's pros:
    • It's less sensitive to outliers and anomalies.
    • The losses are easy to compare because they are only made absolute, so their range stays small.
  • 's cons:
    • Not all losses are differentiable because the absolute value is not differentiable at zero.
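  • 's usage in PyTorch. *A minimal sketch with made-up values (the tensors below are just for illustration):
import torch
from torch import nn

y_pred = torch.tensor([2.0, -1.0, 0.5]) # made-up predictions
y_true = torch.tensor([1.5, 0.0, 0.0])  # made-up targets

mae = nn.L1Loss() # reduction='mean' by default
print(mae(y_pred, y_true))
# tensor(0.6667) <- (0.5 + 1.0 + 0.5) / 3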

(2) L2 Loss:

  • can compute the mean(average) of the squared losses(differences) between a model's predictions and the train or test data.
  • 's formula: MSE = (1/n) * Σ (ŷᵢ - yᵢ)², where n is the number of elements, ŷᵢ is a prediction and yᵢ is a train or test value.
  • is used for a regression model.
  • is also called Mean Squared Error(MSE).
  • is MSELoss() in PyTorch. *My post explains MSELoss().
  • 's pros:
    • All squared losses are differentiable.
  • 's cons:
    • It's sensitive to outliers and anomalies.
    • The losses are hard to compare because they are squared, so their range becomes large.
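  • 's usage in PyTorch. *A minimal sketch with made-up values containing one outlier, comparing MSELoss() with L1Loss() to illustrate the outlier sensitivity above:
import torch
from torch import nn

y_pred = torch.tensor([2.0, -1.0, 10.0]) # 10.0 acts as an outlier
y_true = torch.tensor([1.5, 0.0, 0.0])

print(nn.L1Loss()(y_pred, y_true))
# tensor(3.8333) <- (0.5 + 1.0 + 10.0) / 3
print(nn.MSELoss()(y_pred, y_true))
# tensor(33.7500) <- (0.25 + 1.0 + 100.0) / 3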

(3) Huber Loss:

  • can do a similar computation to either L1 Loss or L2 Loss, depending on how the absolute losses(differences) between a model's predictions and the train or test data compare with the delta which you set. *Memos:
    • delta is 1.0 by default.
    • Be careful, the computation is not exactly the same as L1 Loss or L2 Loss, as the formulas below show.
  • 's formula. *The 1st case is the L2 Loss-like one and the 2nd case is the L1 Loss-like one:
    Huber = 0.5 * (ŷᵢ - yᵢ)²                    if |ŷᵢ - yᵢ| <= delta
    Huber = delta * (|ŷᵢ - yᵢ| - 0.5 * delta)   otherwise
  • is used for a regression model.
  • is HuberLoss() in PyTorch. *My post explains HuberLoss().
  • with delta of 1.0 is the same as Smooth L1 Loss, which is SmoothL1Loss() in PyTorch.
  • 's pros:
    • It's less sensitive to outliers and anomalies.
    • All losses are differentiable.
    • The losses can be compared more easily than with L2 Loss because only the small losses are squared, so their range is smaller than with L2 Loss.
  • 's cons:
    • The computation is more expensive than L1 Loss and L2 Loss because the formula is more complex.
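  • 's usage in PyTorch. *A minimal sketch with made-up values, also checking the claim above that delta of 1.0 matches SmoothL1Loss():
import torch
from torch import nn

y_pred = torch.tensor([2.0, -1.0, 10.0])
y_true = torch.tensor([1.5, 0.0, 0.0])

huber = nn.HuberLoss(delta=1.0) # delta=1.0 is the default
smooth_l1 = nn.SmoothL1Loss()   # beta=1.0 is the default

print(huber(y_pred, y_true))
# tensor(3.3750)
print(smooth_l1(y_pred, y_true))
# the same value as HuberLoss with delta=1.0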

(4) BCE(Binary Cross Entropy) Loss:

  • can compute the mean(average) of the losses(differences) between a model's binary predictions and the binary train or test data.
  • 's formula: BCE = -(1/n) * Σ [yᵢ * log(ŷᵢ) + (1 - yᵢ) * log(1 - ŷᵢ)], where n is the number of elements, ŷᵢ is a predicted probability and yᵢ is a binary label (0 or 1).
  • is used for Binary Classification in Computer Vision: *Memos:
    • Binary Classification is the technology to classify data into two classes.
    • Computer Vision is the technology which enables a computer to understand objects.
  • is also called Binary Cross Entropy or Log(Logarithmic) Loss.
  • is BCELoss() in PyTorch: *Memos:
    • My post explains BCELoss().
    • Basically, Sigmoid is applied before BCE Loss because BCELoss() expects probabilities between 0 and 1. *Sigmoid() and BCELoss() are combined in BCEWithLogitsLoss() in PyTorch.
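  • 's usage in PyTorch. *A minimal sketch with made-up values, applying Sigmoid first; BCEWithLogitsLoss() does both steps in one call and gives the same value:
import torch
from torch import nn

logits = torch.tensor([1.8, -0.4, 0.9]) # raw model outputs
targets = torch.tensor([1.0, 0.0, 1.0]) # binary labels

bce = nn.BCELoss()
print(bce(torch.sigmoid(logits), targets)) # Sigmoid first, then BCE

bce_with_logits = nn.BCEWithLogitsLoss()
print(bce_with_logits(logits, targets))    # same value, computed from raw logits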

(5) Cross Entropy Loss:

  • can compute the mean(average) of the losses(differences) between a model's predictions and the train or test data.
  • 's formula: CE = -(1/n) * Σᵢ Σc yᵢc * log(ŷᵢc), where n is the number of samples, c runs over the classes, ŷᵢc is a predicted probability and yᵢc is a target probability.
  • is used for Multiclass Classification in Computer Vision. *Multiclass Classification is the technology to classify data into multiple classes.
  • is CrossEntropyLoss() in PyTorch. *My post explains CrossEntropyLoss().
  • 's code from scratch in PyTorch:
import torch

y_pred = torch.tensor([7.4, 2.8, -0.6])
y_train = torch.tensor([3.9, -5.1, 9.3])

def cross_entropy(y_pred, y_train):
    # sum of -target * log(prediction) over all elements (no averaging)
    return -torch.sum(y_train * torch.log(y_pred))
print(cross_entropy(y_pred.softmax(dim=0), y_train.softmax(dim=0)))
# tensor(7.9744)

y_pred = torch.tensor([[7.4, 2.8, -0.6], [1.3, 0.0, 4.2]])
y_train = torch.tensor([[3.9, -5.1, 9.3], [-5.3, 7.2, -8.4]])

print(cross_entropy(y_pred.softmax(dim=1), y_train.softmax(dim=1)))
# tensor(12.2420)
  • 's code with mean from scratch in PyTorch:
import torch

y_pred = torch.tensor([7.4, 2.8, -0.6])
y_train = torch.tensor([3.9, -5.1, 9.3])

def cross_entropy(y_pred, y_train):
    # divide the summed loss by the number of samples to get the mean
    n_samples = y_pred.shape[0] if y_pred.ndim > 1 else 1
    return -torch.sum(y_train * torch.log(y_pred)) / n_samples
print(cross_entropy(y_pred.softmax(dim=0), y_train.softmax(dim=0)))
# tensor(7.9744)

y_pred = torch.tensor([[7.4, 2.8, -0.6], [1.3, 0.0, 4.2]])
y_train = torch.tensor([[3.9, -5.1, 9.3], [-5.3, 7.2, -8.4]])

print(cross_entropy(y_pred.softmax(dim=1), y_train.softmax(dim=1)))
# tensor(6.1210)
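  • 's values above can also be reproduced with CrossEntropyLoss() itself, which applies softmax and log internally and averages over the samples. *A minimal sketch, assuming a PyTorch version that accepts class probabilities as targets (the softmaxed y_train is passed as the target):
import torch
from torch import nn

loss_fn = nn.CrossEntropyLoss() # reduction='mean' by default

y_pred = torch.tensor([7.4, 2.8, -0.6]) # raw logits, no softmax needed
y_train = torch.tensor([3.9, -5.1, 9.3])

print(loss_fn(y_pred, y_train.softmax(dim=0)))
# should match tensor(7.9744) from the scratch version above

y_pred = torch.tensor([[7.4, 2.8, -0.6], [1.3, 0.0, 4.2]])
y_train = torch.tensor([[3.9, -5.1, 9.3], [-5.3, 7.2, -8.4]])

print(loss_fn(y_pred, y_train.softmax(dim=1)))
# should match tensor(6.1210) from the version with mean above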
