Transfer learning is a widely used technique in machine learning, and it is especially popular in deep learning. Let’s understand what the concept is, why we apply it to machines, and why it actually works.
Why should we get into the concept of transfer learning? The graph below helps answer that question…
The concept of transfer learning is something humans have been using since they came into existence. Here’s a real-life scenario in which we use transfer learning: before riding a motorcycle, the knowledge gained from learning to ride a bicycle is transferred, meaning that the skills acquired from riding a bicycle are instrumental in learning how to ride a motorcycle.
The same could be said of a person learning the violin who has already acquired the knowledge to play the guitar. We are now applying this concept to machines: we train a machine to perform a specific task, and then we leverage the knowledge gained from that task to tackle related tasks, employing transfer learning for each subsequent task.
Transfer Learning In AI
In transfer learning, the knowledge of an already trained machine learning model is transferred to a different but closely related problem. For example, if you trained a simple classifier to predict whether an image is of a cat or a dog, you could use the model’s training knowledge to identify other animals, because the model has learned the primitive features of animals during its previous training.
Let’s work through this with a case study…
Whatever kind of transfer learning we use, let’s take CNN models as our example. A CNN model generally consists of two parts: the convolutional part (conv layers and pooling layers) and a fully connected part. The output of the convolutional part is flattened into a 1D tensor, and this 1D tensor is then fed into the fully connected neural network layers, as the sketch below illustrates.
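Here is a minimal sketch of that two-part structure in Keras; the layer sizes, input shape, and binary output are assumptions purely for illustration, not the article’s exact model:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    # Convolutional part: conv + pooling layers extract spatial features
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(150, 150, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # Flatten the feature maps into a 1D tensor
    layers.Flatten(),
    # Fully connected part: dense layers produce the final prediction
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # e.g. cat vs dog
])
model.summary()
```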
Suppose we have two tasks, A and B, both CNN-based problems. While there is enough data for training task A, the same is not true for task B. In this scenario, we train our model on the dataset for task A and then leverage the pretrained weights of that model to predict the output for task B. Essentially, we are employing transfer learning to solve task B using a model trained on task A.
We have a small dataset for task B and a model pre-trained on the task A dataset. "Pre-trained" means the weights and biases have already been optimized for task A. To train the model for task B, we feed task B data into this pre-trained model. We cannot reuse the model exactly as-is, since the tasks are different; the architecture stays the same (that is the point of transfer learning), but some of the weight values need to be modified for task B. How much we change depends on how closely related tasks A and B are: if the tasks are closely related, fewer weights need to change; if they are less related, more adjustment is needed. Based on the amount of adjustment, there are two ways to do transfer learning: Feature Extraction and Fine Tuning.
WAYS TO DO TRANSFER LEARNING
- Feature Extraction
- Fine Tuning
We will continue our discussion with respect to the CNN model.
- Feature Extraction — We keep the entire convolutional part frozen, that is, we don’t change the weights and bias values of the convolutional part; training only updates the weights and bias values of the fully connected neural network part. This approach is used when the two tasks are closely related. For instance, after training a model to classify cats versus dogs and then applying it to classify other animals, the model has already learned to extract fundamental and primitive features of animals during the cat-versus-dog task.
The pre-trained weights are taken from the VGG16 model (imported from Keras), pretrained on the ImageNet dataset. We set include_top = False so that VGG16’s original fully connected classifier head is dropped, and we freeze the convolutional base so that no training happens on it. After that, we attach our own fully connected sequential neural network, which is the part that gets trained.
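A minimal sketch of what that feature-extraction setup might look like in Keras is shown below; the input shape, dense layer sizes, and binary output are assumptions for illustration:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# Load VGG16 pretrained on ImageNet, without its original classifier head
conv_base = VGG16(weights="imagenet", include_top=False, input_shape=(150, 150, 3))
conv_base.trainable = False  # freeze the entire convolutional part

# Attach a new fully connected head; only this part will be trained
model = models.Sequential([
    conv_base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary task B output (assumed)
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(task_b_images, task_b_labels, epochs=10, validation_split=0.2)
```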
- Fine Tuning — In cases where the two tasks are not closely related, training extends to the last few layers of the convolutional base and the fully connected neural network part, as needed. This approach allows the model to adapt more extensively to the specific requirements of the second task, accommodating the dissimilarity between the two tasks.
Here we write the code such that, for the last block (the last few layers of the conv base), we set trainable to true, so that training happens for the last part of the conv base as well as the fully connected neural network.
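A hedged sketch of that fine-tuning step follows, assuming the same VGG16 conv base and model as in the feature-extraction sketch above; the block name ("block5") and the small learning rate are assumptions:

```python
from tensorflow.keras.optimizers import Adam

# Unfreeze the conv base, then keep only the final block trainable
conv_base.trainable = True
for layer in conv_base.layers:
    layer.trainable = layer.name.startswith("block5")

# Recompile with a small learning rate so the unfrozen weights change gently
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(task_b_images, task_b_labels, epochs=10, validation_split=0.2)
```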
Conclusion
In conclusion, a solid understanding of transfer learning is essential for data scientists delving into deep learning. It empowers them to utilize pre-trained models, extract valuable insights from existing data, and tackle intricate problems effectively, especially when faced with resource constraints.