Deep Learning Essentials

Deep learning is a subset of machine learning that focuses on using neural networks with many layers to model and understand complex patterns in data.

A neural network is a machine learning model designed to learn from data by adjusting the weights of the connections between neurons based on the errors in its predictions.

Neuron

The fundamental unit of a neural network is the artificial neuron, often just called a neuron. An artificial neuron is inspired by the biological neurons in the human brain and is responsible for performing a small, specific computation in the network.

1) Each neuron receives one or more inputs, processes them (often by applying a mathematical function), and then produces an output.

2) The neuron typically computes a weighted sum of its inputs and then applies an activation function to introduce non-linearity. The output of this function is passed on to the next layer of the network, or serves as the final output if the neuron is in the output layer.
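To make this concrete, here is a minimal single neuron in plain Python (the input values and weights are made up for illustration, and sigmoid is just one possible activation function):

```python
import math

def neuron(inputs, weights):
    # Weighted sum of the inputs, followed by a sigmoid activation
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 / (1 + math.exp(-total))  # output lies in (0, 1)

print(neuron([0.5, -1.2, 3.0], [0.4, 0.6, -0.1]))  # ~0.31
```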

Perceptron (Single layer network)

Inputs: The perceptron receives several inputs, each represented by a floating-point number.

Weights: Each input is multiplied by a corresponding weight, which is also a floating-point number. The weight determines the importance of the input in the decision-making process.

Summation: The weighted inputs are then summed together to produce a single value.

Threshold (or Bias): The perceptron compares the result of the summation to a threshold value.

Output:

If the summation is greater than 0 (or the threshold), the perceptron outputs +1 (or 1 in some versions).

If the summation is less than or equal to 0, the perceptron outputs -1 (or 0 in some versions).

(Note: Perceptrons are limited to solving problems that are linearly separable, meaning they can only classify data that can be separated by a straight line)
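Here is a sketch of a perceptron in Python, with hand-picked (not learned) weights that make it behave like a logical AND, a classic linearly separable problem:

```python
def perceptron(inputs, weights, threshold=0.0):
    # Weighted sum compared against a threshold: outputs +1 or -1
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else -1

# AND on two binary inputs: only (1, 1) clears the threshold
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", perceptron([a, b], [1.0, 1.0], threshold=1.5))
```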

Most interesting problems, and even some very simple ones (such as XOR), were provably beyond the ability of a perceptron to solve. This realization contributed to a period of reduced funding and interest, lasting roughly from the 1970s to the 1990s, known as the AI winter.

AI winter

This period was marked by disappointment with early AI technologies like expert systems, which struggled with scalability and real-world application. As a result, funding from governments and organizations dried up, and research in AI slowed down significantly.

Modern Neural Network after AI winter

1) Change 1: the bias
The first change is the addition of an extra input called the bias. Unlike the other inputs, the bias is not tied to any external data or to the output of previous neurons.

The bias is a constant value that is directly added to the sum of the weighted inputs. It acts as a separate parameter that each neuron has, and it helps adjust the output independently of the input values.

2) Change 2: the activation function
Instead of just comparing the sum to a threshold and outputting -1 or 1, we can pass the sum (including the bias) through a mathematical function. This function outputs a new floating-point value that can fall anywhere within a certain range.

Activation/Mathematical/Transfer Function
It determines how "active" the neuron will be based on the inputs it receives. Many activation functions introduce non-linearity, allowing the network to learn non-linear relationships, which is crucial for solving more complex problems.
Ex.

Sigmoid Function: Outputs values between 0 and 1. Useful for binary classification problems.

Tanh (Hyperbolic Tangent) Function: Outputs values between -1 and 1. It’s similar to the sigmoid but centered at 0.

ReLU (Rectified Linear Unit): Outputs the input if it's positive, otherwise 0.

Leaky ReLU: Similar to ReLU, but allows a small, non-zero gradient when the input is negative, helping to avoid the "dying ReLU" problem.

Types of activation functions:

1) Straight line functions

a. Identity Function:
The identity function is a straight-line function where the output is exactly equal to the input.
f(x)=x

b. Linear Functions:
A linear function is any function that can be represented as a straight line.
f(x) = mx + b

2) Step Functions

a. Stair-Step Function:
A stair-step function consists of multiple linear segments with abrupt changes at certain input values. It’s characterized by discrete jumps rather than a smooth curve.
Ex.
A function that outputs 0 for inputs between 0 and just less than 0.2, 0.2 for inputs from 0.2 to just less than 0.4, and so on.

b. Unit Step Function:
Outputs 0 for input values less than a threshold and 1 for input values equal to or greater than the threshold.

c. Heaviside Function:
The Heaviside function is the unit step function with its threshold fixed at zero: it outputs 0 for negative inputs and 1 for inputs greater than or equal to zero.

3) Piecewise Linear Functions

a. ReLU (Rectified Linear Unit)

Function definition:
For x≥0:
f(x)=x
For x<0:
f(x)=0

b. Leaky ReLU

Function definition:
For x≥0:
f(x)=x
For x<0:
f(x)=αx (where α is a small constant, e.g., 0.01)

c. Parametric ReLU (PReLU)
Function Definition:
For x≥0:
f(x)=x
For x<0:
f(x)=αx (where α is a learnable parameter)
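The three ReLU variants differ only in how they treat negative inputs, which is easy to see side by side (a small illustrative sketch using NumPy):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)               # negatives become exactly 0

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)   # negatives keep a small fixed slope

def prelu(x, alpha):
    # Identical shape to Leaky ReLU, but alpha would be learned during training
    return np.where(x >= 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))               # [0.  0.  0.  1.5]
print(leaky_relu(x))         # [-0.02  -0.005  0.  1.5]
print(prelu(x, alpha=0.2))   # [-0.4  -0.1  0.  1.5]
```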

4) Smooth Activation Functions

a. Softplus Function
It is a smooth approximation of the ReLU function. It addresses the sharp transition at zero by providing a continuous and differentiable alternative.
Softplus(x) = ln(1 + e^x)

b. Sigmoid Function
The sigmoid function squashes input values into a range between 0 and 1
σ(x) = 1 / (1 + e^(-x))

c. Hyperbolic Tangent (tanh) Function
The tanh function is similar to the sigmoid but squashes input values into the range [−1,1]. It’s centered around zero, making it useful for normalizing data.
tanh(x) = (e^x − e^(-x)) / (e^x + e^(-x))
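These smooth functions can be written directly from their formulas (a short NumPy sketch with illustrative inputs):

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))    # ln(1 + e^x), a smooth version of ReLU

def sigmoid(x):
    return 1 / (1 + np.exp(-x))   # squashes values into (0, 1)

x = np.array([-3.0, 0.0, 3.0])
print(softplus(x))   # near 0 for large negative x, near x for large positive x
print(sigmoid(x))    # [0.047... 0.5 0.952...]
print(np.tanh(x))    # squashes into (-1, 1) and is centered at zero
```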

Softmax function

The softmax function is a crucial component in classification tasks within neural networks, particularly when the goal is to predict probabilities for multiple classes.

Softmax converts the raw output scores (often referred to as logits) from the output neurons into a probability distribution over the classes. Each raw score is transformed into a probability, and the probabilities sum to 1 across all classes.

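The standard formula is softmax(z_i) = e^(z_i) / Σ_j e^(z_j), applied over all the logits z_1, ..., z_n. Here is a small NumPy version (subtracting the maximum logit first is a common numerical-stability trick, not something specific to this post):

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)  # avoids overflow for large logits
    exps = np.exp(shifted)
    return exps / exps.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)        # [0.659... 0.242... 0.098...]
print(probs.sum())  # 1.0
```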

Multi layer neural network

1) Feed forward neural network
A feed-forward network is a type of artificial neural network where the connections between the neurons do not form cycles. In other words, the data flows in one direction, from input to output, without looping back.

Structure:
A feed-forward network is organized into layers: an input layer, one or more hidden layers, and an output layer.

Each neuron receives inputs from the previous layer, applies weights to these inputs, sums them up, adds a bias term, and passes the result through an activation function to produce an output.

Types of Feed-Forward Networks:

Single-Layer Perceptron:
The simplest form of a feed-forward network, with only an input layer and an output layer.
Ex. Used for binary classification problems where data is linearly separable.

Multi-Layer Perceptron (MLP):
It contains one or more hidden layers between the input and output layers.
Ex. It is used in tasks such as classification, regression, and function approximation.

Radial Basis Function (RBF) Network:
It uses radial basis functions as activation functions in the hidden layer.
Ex. It is used for function approximation and pattern recognition.

Applications:
Image Recognition
Speech Recognition
Medical Diagnosis

Network Depth

It is the number of layers through which data passes from the input to the output. The hidden layers are the layers between the input layer and the output layer (excluding the input layer), and the depth of the network is determined by the number of these hidden layers.

Fully Connected Layer (FC/linear/dense):

This is a set of neurons that each receive an input from every neuron in the previous layer. If a network is made up of only dense layers, it is sometimes called a fully connected network.
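Under the hood, a dense layer is just a matrix multiplication plus a bias vector. Here is a sketch of a small two-layer fully connected network in NumPy (the layer sizes and random weights are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, W, b):
    # Fully connected: every output neuron sees every input value
    return x @ W + b

# 4 input features -> 8 hidden neurons -> 3 output neurons
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

x = rng.normal(size=(1, 4))                  # one sample with 4 features
hidden = np.maximum(0.0, dense(x, W1, b1))   # ReLU activation in between
output = dense(hidden, W2, b2)
print(output.shape)                          # (1, 3)
```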

Output Shapes in Neural Networks

Zero-Dimensional Array
Ex.
If a neural network layer has only one neuron, its output is a single scalar value. Mathematically, this output can be represented as a zero-dimensional array.

One-Dimensional Array (1D Array)
Ex.
When a layer in a neural network has multiple neurons, the output can be described as a list or vector of values. For instance, if a layer contains 12 neurons, the output is a 1D array with 12 elements.

(Note: No matter how big or complicated our neural network is, if it has no non-linear activation functions, so that every operation is linear (weighted sums, addition, subtraction, etc.), then it will always be equivalent to a single neuron.)
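This collapse is easy to verify numerically; with no activation between them, two weight matrices simply multiply into one (a small NumPy check with random weights):

```python
import numpy as np

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 3))
x = rng.normal(size=(1, 4))

two_layers = (x @ W1) @ W2     # two linear layers, no activation in between
one_layer = x @ (W1 @ W2)      # one linear layer with the combined weights
print(np.allclose(two_layers, one_layer))  # True
```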

Tensor

A general term used for an array of numbers arranged in a box-like shape with any number of dimensions. It encompasses one-dimensional (vector), two-dimensional (matrix), three-dimensional (volume), and higher-dimensional arrays.
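In NumPy terms (an illustrative mapping, since the post itself doesn't use any particular library):

```python
import numpy as np

scalar = np.array(3.14)         # 0-D tensor: shape ()
vector = np.zeros(12)           # 1-D tensor: shape (12,), e.g. a 12-neuron layer output
matrix = np.zeros((28, 28))     # 2-D tensor: e.g. a grayscale image
volume = np.zeros((28, 28, 3))  # 3-D tensor: e.g. an RGB image
print(scalar.ndim, vector.ndim, matrix.ndim, volume.ndim)  # 0 1 2 3
```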

High-Level Overview of Training Neural Networks

Training a neural network involves adjusting the network's weights to minimize errors in its predictions. This is done by iteratively updating the network's parameters to reduce a cost or loss function.
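At its simplest, one training step computes the loss, computes the gradient of the loss with respect to the weights, and nudges the weights in the opposite direction. Here is a toy gradient-descent loop for a single linear neuron (synthetic data and an illustrative learning rate):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w                  # synthetic targets from known weights

w = np.zeros(3)                 # start with all-zero weights
lr = 0.1                        # learning rate

for step in range(200):
    pred = X @ w
    grad = 2 * X.T @ (pred - y) / len(y)  # gradient of mean squared error
    w -= lr * grad                        # move against the gradient

print(w.round(2))  # recovers roughly [ 1.5 -2.   0.5]
```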

Autoencoder

Autoencoders are a type of neural network used for unsupervised learning. The key idea is to compress the input into a lower-dimensional code and then reconstruct the original input from this code.

Structure

Encoder:
This part compresses the input data into a compact representation.
Example: For an image, the encoder might reduce its dimensions from, say, 128x128 pixels to a much smaller vector, such as a 32-dimensional one.

Decoder:
This part reconstructs the original input data from the compressed representation.
Example: The decoder would take the 32-dimensional vector and try to recreate the 128x128 pixel image.

Training Process

They are trained to minimize the difference between the input and the reconstructed output. This is usually done using a loss function, such as Mean Squared Error (MSE) for continuous data or binary cross-entropy for binary data. The goal is to adjust the weights of the network so that the reconstruction is as close as possible to the original input.
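A minimal version of this encoder/decoder pair, assuming PyTorch as the framework (the post doesn't name one) and illustrative layer sizes:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=128 * 128, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))  # compress, then reconstruct

model = Autoencoder()
loss_fn = nn.MSELoss()  # reconstruction loss for continuous data
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(16, 128 * 128)  # a dummy batch of flattened "images"
loss = loss_fn(model(x), x)    # compare reconstruction against the input
loss.backward()
opt.step()
print(loss.item())
```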

Variants:

1) Denoising Autoencoders
2) Variational Autoencoders
3) Sparse Autoencoders

Types of compression:

1) Lossless:
It is a type of data compression where the original data can be perfectly reconstructed from the compressed data. This means no information is lost during the compression process, and the decompressed data is identical to the original.
Algorithms: Lossless compression uses methods like entropy encoding and dictionary-based techniques. Examples include:

Huffman Coding: Encodes frequently occurring symbols with shorter codes and less frequent symbols with longer codes.

Lempel-Ziv-Welch (LZW): Builds a dictionary of sequences from the data and uses shorter codes for common sequences.

Run-Length Encoding (RLE): Compresses sequences of repeated characters by storing the character and its count.

Ex. PNG, FLAC, ZIP
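Run-length encoding is the simplest of these to show in code (a toy version, not how PNG or ZIP work internally):

```python
from itertools import groupby

def rle_encode(s):
    # Collapse each run of repeated characters into a (char, count) pair
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def rle_decode(pairs):
    return "".join(ch * count for ch, count in pairs)

encoded = rle_encode("aaaabbbcca")
print(encoded)                              # [('a', 4), ('b', 3), ('c', 2), ('a', 1)]
assert rle_decode(encoded) == "aaaabbbcca"  # lossless: identical reconstruction
```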

2) Lossy:
It reduces file size by removing some of the data, often in a way that is less noticeable to the human senses but results in some loss of fidelity. The goal is to achieve a significant reduction in file size while maintaining acceptable quality for the intended use.
Transform Coding: Converts data into a different domain (like the frequency domain) and quantizes it. Examples include:

Discrete Cosine Transform (DCT): Used in JPEG image compression.

Discrete Wavelet Transform (DWT): Used in JPEG 2000.

Ex. JPEG, H.264 or HEVC, MP3
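A toy version of transform coding, assuming SciPy is available (it only illustrates the idea of discarding small frequency coefficients; real codecs are far more elaborate):

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(3)
signal = np.cos(np.linspace(0, 4 * np.pi, 64)) + 0.1 * rng.normal(size=64)

coeffs = dct(signal, norm="ortho")      # move to the frequency domain
coeffs[np.abs(coeffs) < 0.5] = 0        # quantize: drop small coefficients (lossy!)
restored = idct(coeffs, norm="ortho")   # reconstruct an approximation

print(np.count_nonzero(coeffs), "of 64 coefficients kept")
print(np.max(np.abs(signal - restored)))  # small, but not zero
```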

Applications:

1) Dimensionality Reduction
2) Denoising

Optimizer vs Loss Function vs Activation Function

Optimizer: Adjusts weights to minimize the loss function.
Loss Function: Measures how well the model's predictions match the actual values.
Activation Function: Adds non-linearity to the model, enabling it to learn complex patterns.

Stay Connected!
If you enjoyed this post, don’t forget to follow me on social media for more updates and insights:

Twitter: madhavganesan
Instagram: madhavganesan
LinkedIn: madhavganesan
