Transfer Learning
Humans inherently apply knowledge across tasks. The more closely related two tasks are, the easier it is to transfer knowledge between them. For example, if you know how to speak French, it may be easier for you to learn Spanish, since both belong to the same language family.
Transfer learning is a machine learning technique that uses a previously trained model as the starting point for a new but related model. Reusing a model saves compute time and can improve accuracy, because the new model builds on features the original model already learned.
Real-World Use Cases
Transfer learning is widely used in computer vision and natural language processing (NLP). The following examples show how these technologies can be used to improve health and well-being.
Drug Discovery
Deep learning helps bring new drugs to market more quickly by predicting drug properties and possible interactions and by helping formulate new compounds.
Mental Health Therapy
Mental health professionals use NLP to augment screening and diagnosis techniques, evaluate therapy effectiveness, and monitor patient progress.
How Transfer Learning Works
The most common transfer learning techniques are feature extraction and fine-tuning. The techniques differ in how the base model's layers are treated: feature extraction freezes them, while fine-tuning continues to adjust them.
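In PyTorch*, the distinction comes down to whether the base parameters keep their gradients. A minimal sketch, using a small stand-in network in place of a real pretrained base model:

```python
import torch.nn as nn

# A stand-in base model; in practice this is a pretrained network.
base = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 16))

# Feature extraction: freeze the base so its weights stay fixed.
for param in base.parameters():
    param.requires_grad = False

# Fine-tuning: leave requires_grad=True (the default) so the base
# weights continue to adjust during training.
```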
What Makes a Model Similar?
Often the best strategy for transfer learning is to start with a proven architecture for your network. When selecting a model, consider the following questions.
- How similar is the dataset in terms of categories? (For example, dogs versus cats.)
- How similar is the type of model architecture? (For example, convolutional networks such as ResNet*, whose early layers learn low-level features such as edges.)
- How similar is the type of task? (For example, image classification versus object detection.)
Use Transfer Learning to Predict Wildfires
The Hunting Dinosaurs AI project uses a standard ResNet computer vision model to classify topography images by their likelihood of containing dinosaur bones. With transfer learning, this same pretrained model can be used as a starting point to predict wildfires.
What is an Epoch?
An epoch is one complete pass of the entire dataset through a machine learning algorithm. Data is often divided into batches. If your dataset has 1,000 rows and your batch size is 100, then one epoch requires 10 iterations through the machine learning algorithm.
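As a quick check of that arithmetic, using PyTorch's DataLoader:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A toy dataset with 1,000 rows and a batch size of 100.
dataset = TensorDataset(torch.randn(1000, 10),
                        torch.zeros(1000, dtype=torch.long))
loader = DataLoader(dataset, batch_size=100)

print(len(loader))  # 10 -> one epoch takes 10 iterations
```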
Select a Pretrained Model
Begin with a ResNet-50 model, which has 50 trainable layers.
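With torchvision*, for example (the weights argument assumes torchvision 0.13 or later):

```python
from torchvision import models

# ResNet-50 pretrained on ImageNet serves as the base model.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
```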
Freeze Layers (Feature Extraction)
The model parameters are kept as-is (frozen) for 49 of the 50 trainable layers; only the final layer is trained.
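Continuing the sketch above, freeze everything except the final fully connected layer:

```python
# Freeze 49 of the 50 trainable layers; only the final fully
# connected layer ("fc") keeps its gradients.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")
```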
Fine-Tune
The parameters of the batch normalization layers, which standardize their inputs, are kept fixed. The rest of the layers remain trainable.
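For this fine-tuning variant, one common sketch instead freezes only the batch normalization layers and leaves everything else trainable:

```python
import torch.nn as nn

# Fine-tuning alternative: first make every layer trainable again...
for param in model.parameters():
    param.requires_grad = True

# ...then freeze only the batch normalization layers.
for module in model.modules():
    if isinstance(module, nn.BatchNorm2d):
        module.eval()                    # freeze running mean/variance
        for param in module.parameters():
            param.requires_grad = False  # freeze scale and shift

# Note: model.train() resets modules to training mode, so reapply the
# eval() calls above after switching the model into training mode.
```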
Add Layers
A new layer, sized for the classes in the wildfire dataset, replaces the last layer in the model.
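In code this is a one-line swap; num_classes is a placeholder for however many classes the wildfire dataset defines (for example, 2 for fire versus no fire):

```python
import torch.nn as nn

num_classes = 2  # placeholder: fire vs. no fire
model.fc = nn.Linear(model.fc.in_features, num_classes)
# The new layer is randomly initialized and trainable by default.
```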
Train
Train your model for a few epochs. Normally, you would train a model for a larger number of epochs, but because you are not training all the layers, fewer epochs are required. This is the major advantage of transfer learning.
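A minimal training loop, assuming `loader` is a PyTorch DataLoader that yields batches of wildfire images and integer labels:

```python
import torch

# Optimize only the parameters that are still trainable.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
loss_fn = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(3):  # a few epochs is often enough
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```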
Intel® AI & Machine Learning Portfolio
AI use cases and workloads continue to grow and diversify across vision, speech, recommender systems, and more. Intel offers an unparalleled AI development and deployment ecosystem combined with a heterogeneous portfolio of hardware optimized for AI. Intel's goal is to make it as seamless as possible for every developer, data scientist, researcher, and data engineer to accelerate their AI journey from the edge to the cloud.
AI Tools
Data scientists, AI developers, and researchers can use familiar Python* tools and frameworks to accelerate end-to-end data science and analytics pipelines on Intel architecture. The components are built using oneAPI libraries for low-level compute optimizations. These tools maximize performance from preprocessing through machine learning and provide interoperability for efficient model development.
Using these tools, you can:
- Deliver high-performance deep learning training and integrate fast inference into your AI development workflow with Intel-optimized deep learning frameworks for TensorFlow* and PyTorch*, as well as pretrained models and low-precision tools.
- Achieve drop-in acceleration for data preprocessing and machine learning workflows with compute-intensive Python* packages such as Modin*, scikit-learn*, and XGBoost, optimized for Intel hardware (see the sketch after this list).
- Gain direct access to analytics and AI optimizations from Intel to help ensure your software works together seamlessly.
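For example, the drop-in scikit-learn* acceleration comes from the Intel® Extension for Scikit-learn* (the sklearnex package); a minimal sketch:

```python
from sklearnex import patch_sklearn

patch_sklearn()  # reroute supported estimators to Intel-optimized versions

# Import scikit-learn APIs after patching so the optimized code runs.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100_000, centers=8, random_state=0)
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
print(kmeans.inertia_)
```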
Transfer Learning with 4th Gen Intel® Xeon® Scalable Processors
Today, there are over 100 million Intel® Xeon® processors installed in the market, serving as a staple compute platform for most enterprises and cloud providers. Rather than training a model from scratch, AI developers see significant time and cost advantages by fine-tuning an already trained model on their Intel Xeon Scalable processors.
With 4th gen Intel Xeon Scalable processors, you can:
- Achieve up to 10x higher PyTorch real-time inference and training performance with built-in Intel® Advanced Matrix Extensions (Intel® AMX) accelerators.
- Fine-tune a natural language processing (NLP) model, such as DistilBERT, in less than four minutes, which can reduce or eliminate the need for a dedicated accelerator (a minimal sketch follows this list).
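As a rough sketch of such a fine-tuning run with the Hugging Face* Trainer API (the model name and dataset are illustrative; bf16=True selects the bfloat16 path that Intel AMX accelerates, and actual timings depend on your hardware):

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

dataset = load_dataset("sst2").map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="distilbert-sst2-demo",
    num_train_epochs=1,              # transfer learning needs few epochs
    per_device_train_batch_size=32,
    bf16=True,                       # bfloat16, accelerated by Intel AMX
)
Trainer(model=model, args=args, train_dataset=dataset["train"]).train()
```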
4th Gen Intel Xeon Scalable Performance Data for Intel® AI Data Center Products
10x PyTorch Performance with 4th Gen Intel Xeon Scalable Processor
Performance Data for Transfer Learning in Minutes with 4th Gen Intel Xeon Scalable Processor
Hugging Face*
Hugging Face* hosts one of the largest transformer communities for researchers, data scientists, and machine learning engineers. Intel and Hugging Face collaborate to democratize AI and machine learning by building state-of-the-art tools to train, fine-tune, and predict with transformers.
Intel works with Hugging Face to bring the latest innovations of Intel Xeon processors and Intel AI software to the transformer community, through open source integration and consistent integrated experiences.
With Intel-optimized Hugging Face, you can:
- Train and fine-tune transformer models in a single or distributed cluster that includes Intel Xeon processors and Intel® Gaudi® platforms.
- Automatically perform hyperparameter optimization for training and fine-tuning with the integrated SigOpt HPO feature in Hugging Face transformers.
- Quantize, prune, and distill transformer models after fine-tuning using Optimum* for Intel, which can deliver much better inference performance on Intel platforms without sacrificing accuracy.
- Fine-tune downstream tasks with Intel-optimized pretrained models.
Developer Resources from Intel and Hugging Face
Scale Transformer Model Performance with Intel AI
Habana* Labs and Hugging Face Partner to Accelerate Transformer Model Training
Recommended Resources
SetFit: Efficient Few-Shot Learning without Prompts
Learn how SetFit works on Sentence Transformer models with data that has few or no labels.
Faster, Easier Optimization with Intel® Neural Compressor
See what is possible with Intel® Neural Compressor, including knowledge distillation, where a student model learns features from a teacher model.
GitHub* Repository for Transfer Learning
Try out transfer learning with Intel® AI Reference Models.