LayerNorm in PyTorch

#python #pytorch #layernorm #layernormalization

*Memos:

My post explains Layer Normalization.
My post explains BatchNorm1d().
My post explains BatchNorm2d().
My post explains BatchNorm3d().
My post explains requires_grad.

LayerNorm() applies Layer Normalization to a 1D or higher-dimensional tensor of zero or more elements and returns a tensor of the same shape, as shown below:

*Memos:

The 1st argument for initialization is normalized_shape(Required-Type:int, tuple or list of int or torch.Size). *Each value must be 0 or greater (0 <= x).
The 2nd argument for initialization is eps(Optional-Default:1e-05-Type:float).
The 3rd argument for initialization is elementwise_affine(Optional-Default:True-Type:bool).
The 4th argument for initialization is bias(Optional-Default:True-Type:bool). *My post explains bias argument.
The 5th argument for initialization is device(Optional-Default:None-Type:str, int or device()): *Memos:
- If it's None, get_default_device() is used. *My post explains get_default_device() and set_default_device().
- device= can be omitted.
- My post explains device argument.
The 6th argument for initialization is dtype(Optional-Default:None-Type:dtype): *Memos:
- If it's None, get_default_dtype() is used. *My post explains get_default_dtype() and set_default_dtype().
- dtype= can be omitted.
- My post explains dtype argument.
The 1st argument is input(Required-Type:tensor of float): *Memos:
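- It must be a 1D or higher-dimensional tensor of zero or more elements.
- Its trailing dimensions must match normalized_shape, e.g. if normalized_shape is an int, the size of the last dimension must be equal to it.
- Its device and dtype must be the same as LayerNorm()'s.
- The input tensor's requires_grad, which is False by default, isn't changed, but the output tensor's requires_grad is True when elementwise_affine=True because weight and bias are learnable parameters.
layernorm1.device and layernorm1.dtype don't work because LayerNorm() itself has no such attributes. *When elementwise_affine=True, layernorm1.weight.device and layernorm1.weight.dtype can be used instead.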
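
*Memos: As a minimal sketch of what Layer Normalization computes, the snippet below normalizes a tensor by hand with the mean and the biased variance of the last dimension and compares the result with nn.LayerNorm(). The names x, manual and builtin are only for illustration, and it assumes the default weight of ones and bias of zeros:

```python
import torch
from torch import nn

x = torch.tensor([8., -3., 0., 1., 5., -2.])
eps = 1e-05

# Mean and biased variance (divide by N, not N-1) over the last dimension.
mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, unbiased=False, keepdim=True)

# (x - mean) / sqrt(var + eps); the default weight=1 and bias=0 change nothing.
manual = (x - mean) / torch.sqrt(var + eps)

layernorm = nn.LayerNorm(normalized_shape=6, eps=eps)
builtin = layernorm(x)

# Compare the hand-written result with the module's result.
print(torch.allclose(manual, builtin, atol=1e-5))
```

The biased variance is the important detail here: using the unbiased one would give slightly different values.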

import torch
from torch import nn

tensor1 = torch.tensor([8., -3., 0., 1., 5., -2.])

tensor1.requires_grad
# False

layernorm1 = nn.LayerNorm(normalized_shape=6)
tensor2 = layernorm1(input=tensor1)
tensor2
# tensor([1.6830, -1.1651, -0.3884, -0.1295, 0.9062, -0.9062],
#        grad_fn=<NativeLayerNormBackward0>)

tensor2.requires_grad
# True

layernorm1
# LayerNorm((6,), eps=1e-05, elementwise_affine=True)

layernorm1.normalized_shape
# (6,)

layernorm1.eps
# 1e-05

layernorm1.elementwise_affine
# True

layernorm1.bias
# Parameter containing:
# tensor([0., 0., 0., 0., 0., 0.], requires_grad=True)

layernorm1.weight
# Parameter containing:
# tensor([1., 1., 1., 1., 1., 1.], requires_grad=True)

layernorm2 = nn.LayerNorm(normalized_shape=6)
layernorm2(input=tensor2)
# tensor([1.6830, -1.1651, -0.3884, -0.1295, 0.9062, -0.9062],
#        grad_fn=<NativeLayerNormBackward0>)

layernorm = nn.LayerNorm(normalized_shape=6, eps=1e-05,
                         elementwise_affine=True, bias=True,
                         device=None, dtype=None)
layernorm(input=tensor1)
# tensor([1.6830, -1.1651, -0.3884, -0.1295, 0.9062, -0.9062],
#        grad_fn=<NativeLayerNormBackward0>)

my_tensor = torch.tensor([[8., -3., 0.],
                          [1., 5., -2.]])
layernorm = nn.LayerNorm(normalized_shape=3)
layernorm(input=my_tensor)
# tensor([[1.3641, -1.0051, -0.3590],
#         [-0.1162, 1.2787, -1.1625]],
#        grad_fn=<NativeLayerNormBackward0>)

layernorm = nn.LayerNorm(normalized_shape=(2, 3))
layernorm(input=my_tensor)
# tensor([[1.6830, -1.1651, -0.3884],
#         [-0.1295, 0.9062, -0.9062]],
#        grad_fn=<NativeLayerNormBackward0>)

layernorm = nn.LayerNorm(normalized_shape=my_tensor.size())
layernorm(input=my_tensor)
# tensor([[1.6830, -1.1651, -0.3884],
#         [-0.1295, 0.9062, -0.9062]],
#        grad_fn=<NativeLayerNormBackward0>)
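
*Memos: The two results above differ because normalized_shape decides how many trailing dimensions are normalized together. As a rough sketch with illustrative names, normalized_shape=(2, 3) should match normalizing the flattened tensor with normalized_shape=6, and a normalized_shape which doesn't match the trailing dimensions should be rejected:

```python
import torch
from torch import nn

x = torch.tensor([[8., -3., 0.],
                  [1., 5., -2.]])

# (2, 3) covers both dimensions, so all six elements share one mean and variance.
over_both = nn.LayerNorm(normalized_shape=(2, 3))(x)
flat = nn.LayerNorm(normalized_shape=6)(x.flatten())
print(torch.allclose(over_both.flatten(), flat, atol=1e-5))

# The last dimension has 3 elements, so normalized_shape=2 doesn't match it.
mismatch_error = None
try:
    nn.LayerNorm(normalized_shape=2)(x)
except RuntimeError as error:
    mismatch_error = error
    print("Rejected:", type(error).__name__)
```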

my_tensor = torch.tensor([[8.], [-3.], [0.],
                          [1.], [5.], [-2.]])
layernorm = nn.LayerNorm(normalized_shape=1)
layernorm(input=my_tensor)
# tensor([[0.], [0.], [0.], [0.], [0.], [0.]],
#        grad_fn=<NativeLayerNormBackward0>)
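
*Memos: The zeros above come from normalizing each element on its own. With normalized_shape=1, the mean of each group is the element itself and the variance is zero, so the numerator is always zero. A small sketch of that reasoning, using illustrative names and a shorter tensor:

```python
import torch
from torch import nn

x = torch.tensor([[8.], [-3.], [0.]])
eps = 1e-05

# Each group holds a single element, so the mean equals the element itself.
mean = x.mean(dim=-1, keepdim=True)
# The biased variance of a single element is zero.
var = x.var(dim=-1, unbiased=False, keepdim=True)

# (x - x) / sqrt(0 + eps) is zero for every element.
manual = (x - mean) / torch.sqrt(var + eps)
builtin = nn.LayerNorm(normalized_shape=1, eps=eps)(x)

print(manual)
print(builtin)
```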

layernorm = nn.LayerNorm(normalized_shape=(6, 1))
layernorm(input=my_tensor)
# tensor([[1.6830], [-1.1651], [-0.3884], [-0.1295], [0.9062], [-0.9062]],
#        grad_fn=<NativeLayerNormBackward0>)

layernorm = nn.LayerNorm(normalized_shape=my_tensor.size())
layernorm(input=my_tensor)
# tensor([[1.6830], [-1.1651], [-0.3884], [-0.1295], [0.9062], [-0.9062]],
#        grad_fn=<NativeLayerNormBackward0>)

my_tensor = torch.tensor([[[8., -3., 0.],
                           [1., 5., -2.]]])
layernorm = nn.LayerNorm(normalized_shape=3)
layernorm(input=my_tensor)
# tensor([[[1.3641, -1.0051, -0.3590],
#          [-0.1162, 1.2787, -1.1625]]],
#        grad_fn=<NativeLayerNormBackward0>)
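
*Memos: A quick way to check a result like the one above is to look at the statistics of the output. In this sketch, with illustrative names, each row of the last dimension is expected to have a mean close to 0 and a biased variance close to 1 (slightly below 1 because of eps):

```python
import torch
from torch import nn

x = torch.tensor([[[8., -3., 0.],
                   [1., 5., -2.]]])

out = nn.LayerNorm(normalized_shape=3)(x)

# The shape is unchanged by Layer Normalization.
print(out.shape)  # torch.Size([1, 2, 3])

# Per-row statistics over the normalized (last) dimension.
row_mean = out.mean(dim=-1)
row_var = out.var(dim=-1, unbiased=False)
print(row_mean)
print(row_var)
```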
