Transfer Learning for Image Classification – Part 1

Artificial Intelligence

One of the most useful and fast-growing techniques in the ML domain nowadays is transfer learning: the knowledge captured in a model trained on one task (its learned weights and representations) can be ported to a new but related task, instead of starting from scratch.

Once you’ve trained a neural network, what you get is a set of trained parameter values, that is, the weights and biases. For example, LeNet-5 has about 60k parameters, AlexNet has 60 million, and VGG-16 has about 138 million parameters. These architectures are trained on anything from tens of thousands of images (MNIST, in the case of LeNet-5) to millions of images (ImageNet, in the case of AlexNet and VGG-16), and their many stacked layers are what contribute to such large parameter counts.
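To make those parameter counts concrete, here is a minimal sketch, in the spirit of the Java-based source book, that loads VGG-16 with its ImageNet weights from the DeepLearning4j model zoo and prints its parameter count (roughly 138 million). It assumes a recent DL4J release (1.0.0-beta or later) where zoo models expose a builder; the class name is just for illustration.

```java
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.zoo.PretrainedType;
import org.deeplearning4j.zoo.model.VGG16;

public class CountVgg16Params {
    public static void main(String[] args) throws Exception {
        // Download (on first run) and load VGG-16 weights pre-trained on ImageNet
        VGG16 zooModel = VGG16.builder().build();
        ComputationGraph vgg16 = (ComputationGraph) zooModel.initPretrained(PretrainedType.IMAGENET);

        // Roughly 138 million trainable parameters, as mentioned above
        System.out.println("VGG-16 parameter count: " + vgg16.numParams());
        System.out.println(vgg16.summary());
    }
}
```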

Many open source contributors, as well as tech giants, have made such pre-trained models publicly available for research (and also industry) so that they can be restored and reused to solve similar problems. For example, suppose we want to classify new images into one of the 1,000 classes in the case of AlexNet, or one of 10 in the case of LeNet-5. We typically do not need to deal with all of those parameters, but only a few selected ones (we will see an example soon).

In short, we do not need to train such a deep network from scratch; instead, we reuse an existing pre-trained model and still manage to achieve acceptable classification accuracy. More technically, we can use the weights of that pre-trained model as a feature extractor, or we can initialize our architecture with them and then fine-tune them for our new task.

In this regard, when using the transfer learning (TL) technique to solve your own problem, there are typically three options available:

  • Use a Deep CNN as a fixed feature extractor: We can reuse a network pre-trained on ImageNet (such as AlexNet or VGG-16) by removing its output layer, since we are no longer interested in the 1,000 categories it predicts. This way, we can treat all the remaining layers as a feature extractor. Once you have extracted the features using the pre-trained model, you can feed them to any linear classifier, such as a softmax classifier or even a linear SVM (the sketch after this list freezes the pre-trained layers for exactly this purpose).
  • Fine-tune the Deep CNN: Fine-tuning the whole network, or even most of its layers, may result in overfitting. Therefore, with some extra effort, we fine-tune only the later layers of the pre-trained network on our new task using backpropagation, while keeping the earlier layers frozen (again, see the sketch after this list).
  • Reuse pre-trained models with checkpointing: The third widely used scenario is to download checkpoints that people have made available on the internet. You may go for this scenario if you do not have the computational power to train the model from scratch: you simply initialize the model with the released checkpoint and then do a little fine-tuning (see the checkpoint-restore snippet after this list).
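Since the source book uses Java, the following is a minimal DeepLearning4j sketch of the first two options, assuming a recent DL4J release (1.0.0-beta or later), its model zoo, and its transfer learning API. The layer names "fc2" and "predictions" are those used by DL4J's zoo VGG-16; the class name, the number of target classes, and the learning rate are illustrative assumptions, not a definitive recipe.

```java
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.transferlearning.FineTuneConfiguration;
import org.deeplearning4j.nn.transferlearning.TransferLearning;
import org.deeplearning4j.zoo.PretrainedType;
import org.deeplearning4j.zoo.model.VGG16;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Nesterovs;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class Vgg16TransferSketch {
    public static void main(String[] args) throws Exception {
        int numClasses = 5; // hypothetical number of target classes

        // Load VGG-16 pre-trained on ImageNet from the DL4J model zoo
        ComputationGraph vgg16 = (ComputationGraph) VGG16.builder().build()
                .initPretrained(PretrainedType.IMAGENET);

        // Training settings applied only to the layers that remain trainable
        FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
                .updater(new Nesterovs(5e-5)) // small learning rate for fine-tuning (assumed value)
                .seed(12345)
                .build();

        ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(vgg16)
                .fineTuneConfiguration(fineTuneConf)
                // Freeze everything up to and including "fc2": those layers act as a fixed feature extractor
                .setFeatureExtractor("fc2")
                // Drop the original 1,000-way ImageNet classifier...
                .removeVertexKeepConnections("predictions")
                // ...and attach a new softmax output layer for our own classes
                .addLayer("predictions",
                        new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                                .nIn(4096).nOut(numClasses)
                                .activation(Activation.SOFTMAX)
                                .build(),
                        "fc2")
                .build();

        System.out.println(vgg16Transfer.summary());
        // Training would then proceed as usual, e.g. vgg16Transfer.fit(trainIterator);
    }
}
```

With setFeatureExtractor("fc2"), everything up to and including fc2 is frozen, which corresponds to the fixed-feature-extractor option; moving that boundary earlier and training the layers above it corresponds to fine-tuning. For the third option, a checkpoint saved with DL4J's ModelSerializer can be restored and fine-tuned a little further; the file name below is hypothetical:

```java
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.util.ModelSerializer;
import java.io.File;

public class RestoreCheckpoint {
    public static void main(String[] args) throws Exception {
        // Hypothetical checkpoint file previously saved with ModelSerializer.writeModel(...)
        File checkpoint = new File("vgg16-finetuned.zip");
        // Restore the network (true = also restore the updater state, so training can continue)
        ComputationGraph model = ModelSerializer.restoreComputationGraph(checkpoint, true);
        System.out.println(model.summary());
        // A little extra fine-tuning on new data would then be model.fit(trainIterator);
    }
}
```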

At this point, an interesting question may come to mind: what is the difference between traditional ML and ML using transfer learning? Well, in traditional ML, no knowledge or representations are transferred to any other task, which is not the case in transfer learning.

Unlike traditional machine learning, in transfer learning the source and target tasks or domains do not have to come from the same distribution, but they do have to be similar. Moreover, you can use transfer learning when you have fewer training samples or lack the necessary computational power.

References:

Java Deep Learning Projects by Md. Rezaul Karim (Packt Publishing)