Demand for Natural Language Processing (NLP) technology is rapidly expanding across a wide range of vertical markets, from healthcare to smartphones. With the global NLP market expected to grow to $341.7 billion by 2030, Intel Labs is committed to making sure that practitioners have the resources to sustain this trajectory.
We are especially focused on developing advanced language technologies that will hasten NLP deployments. Along with our partners, we are developing and testing these technologies, internally and externally, and making sure they are optimized to run on Intel hardware. We are also working closely with open-source communities to accelerate the evolution of these solutions and to ensure that they are available to a wide range of users, not only the tech elite.
Our journey begins with addressing the most prominent challenges that data scientists and developers currently face when it comes to NLP deployments. These challenges include:
- Dealing with very large, costly, and inefficient models
- Wrangling large amounts of data to solve specific NLP tasks
- The limited, albeit improving, capabilities of machine understanding
Transformer Model Optimization Technology
Much of our work in NLP has focused on optimizing transformer models, which are widely used for NLP tasks such as text categorization, question answering, and summarization. As dominant AI workloads, transformers have many applications in finance, law, healthcare, and other mainstream industries and services.
Transformers can be trained on large amounts of raw text in a self-supervised fashion—which means that humans need not label vast amounts of data. They are also adept at discovering language patterns in text such as syntax and semantics. They use self-attention to calculate representations of inputs and outputs and can generalize across a wide range of settings and languages. However, the growing size of transformer models has made them unsustainable and costly.
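To make the self-attention step concrete, here is a minimal sketch of scaled dot-product attention in PyTorch. The single-head, batch-of-one setup and the tensor sizes are illustrative assumptions, not Intel Labs code.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Single-head scaled dot-product attention.

    q, k, v: tensors of shape (batch, seq_len, d_model).
    Returns a tensor of the same shape in which every position is a weighted
    mix of the value vectors, with weights derived from query/key similarity.
    """
    d_model = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_model)  # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)                # attention weights
    return weights @ v                                      # contextualized representations

# Toy example: one "sentence" of 4 token embeddings, each of size 8.
x = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(x, x, x)  # self-attention: q = k = v = x
print(out.shape)  # torch.Size([1, 4, 8])
```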
As a solution, Intel Labs is exploring cutting-edge approaches that start from existing, generic pre-trained models and then fine-tune them for specific inference tasks. This approach can greatly reduce the computational resources and expertise required to deploy NLP applications.
Over the last few years, we've made significant breakthroughs, optimizing large models for specific tasks on Intel hardware architectures by applying methods including quantization, sparsity, dynamic inference, and knowledge distillation. These optimization tools are all available via the Optimum open-source library, which we built in partnership with Hugging Face, an open-source start-up that develops pre-trained machine-learning models for natural language processing.
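As an illustration of one of these methods, the sketch below applies post-training dynamic INT8 quantization to a pre-trained Hugging Face model using PyTorch's built-in utilities. It is a simplified stand-in rather than the Optimum workflow itself, and the checkpoint name is only an example.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load a pre-trained, task-fine-tuned transformer (example checkpoint).
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Post-training dynamic quantization: weights of Linear layers are stored in
# INT8 and dequantized on the fly, shrinking the model and speeding up CPU
# inference with no retraining.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("Intel Labs is optimizing transformers.", return_tensors="pt")
with torch.no_grad():
    logits = quantized_model(**inputs).logits
print(logits.argmax(dim=-1))
```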
In our paper, Exploring the Boundaries of Low-Resource BERT Distillation, we show that large pre-trained models can be successfully distilled into very simple and efficient models.
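To illustrate the basic mechanism behind knowledge distillation (not the specific recipe in the paper), here is a minimal sketch of the standard soft-label distillation loss, in which a small student is trained to match a large teacher's softened output distribution. The temperature and loss weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend cross-entropy on the hard labels with a KL term that pulls the
    student's softened distribution toward the teacher's."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Toy batch: 4 examples, 3 classes.
teacher_logits = torch.randn(4, 3)                       # from the large, frozen teacher
student_logits = torch.randn(4, 3, requires_grad=True)   # from the small student
labels = torch.tensor([0, 2, 1, 0])
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()  # gradients flow only into the student
```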
In our paper, Prune Once for All, we use weight pruning and model distillation to create sparse pre-trained transformer models that can be fine-tuned for a wide variety of tasks while maintaining their sparsity pattern.
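The sketch below shows unstructured magnitude pruning with PyTorch's built-in pruning utilities, which is the basic operation behind sparse pre-training. The 90% sparsity target, the choice of layers, and the checkpoint name are illustrative assumptions; this is not the Prune Once for All training recipe itself.

```python
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")  # example checkpoint

# L1 (magnitude) unstructured pruning on every Linear layer: the 90% of
# weights with the smallest magnitude in each layer are zeroed out.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")  # bake the mask into the weight tensor

# Verify the resulting sparsity across all Linear layers.
total = zeros = 0
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        total += module.weight.numel()
        zeros += (module.weight == 0).sum().item()
print(f"Linear-layer sparsity: {zeros / total:.1%}")
```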
Addressing Data Scarcity Through Few-Shot Learning
As more and more organizations strive to exploit the knowledge that resides within their data to maintain profitability and competitiveness, a problem is emerging: organizations cannot afford the cost and expertise needed to manually annotate the data required to fine-tune transformer models for specific tasks.
Intel Labs is leveraging transformer technology to facilitate model training in data-scarce environments. One of our most exciting transformer research projects (again, in collaboration with Hugging Face) demonstrates Sentence Transformer Fine-tuning (SetFit) as an effective approach for few-shot text classification.
In this study, SetFit achieves high accuracy in few-shot text classification with very little labeled data, outperforming GPT-3 on seven out of eleven NLP tasks. Our blog and paper describe SetFit's simple and efficient method and present benchmarks that show its competitiveness with other state-of-the-art methods.
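For a sense of how sentence-transformer-based few-shot classification works, here is a simplified sketch that encodes a handful of labeled examples with a pre-trained sentence transformer and fits a lightweight classification head on top. The full SetFit recipe additionally fine-tunes the encoder contrastively on pairs of examples, which is omitted here; the model name and toy data are assumptions for illustration.

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# A pre-trained sentence encoder (example checkpoint).
encoder = SentenceTransformer("sentence-transformers/paraphrase-mpnet-base-v2")

# Tiny few-shot training set: 8 labeled sentences, 2 classes.
train_texts = [
    "The battery lasts all day.", "Setup was quick and painless.",
    "Customer support solved my issue fast.", "Great value for the price.",
    "The screen cracked after a week.", "Shipping took almost a month.",
    "The app crashes constantly.", "Support never answered my emails.",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = positive, 0 = negative

# Encode sentences into fixed-size embeddings and train a small head on top.
train_embeddings = encoder.encode(train_texts)
head = LogisticRegression(max_iter=1000).fit(train_embeddings, train_labels)

# Classify new sentences.
test_texts = ["The device broke on day two.", "Fantastic product, works perfectly."]
print(head.predict(encoder.encode(test_texts)))  # e.g. [0 1]
```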
Language-based Applications
Another goal of Intel Labs is to provide data scientists and end users with state-of-the-art language-based applications that deliver accurate and valuable business insights. One such application is our Aspect-Based Sentiment Analysis system, which enables the extraction of opinions about specific aspects of a product or service. Our system has been used to track COVID-19 discourse on Twitter in North America and to generate insights on the effectiveness of public health interventions.
Retrieval-Augmented Generation Architecture. In a collaboration between the Intel Software and Advanced Technology Group and our own NLP researchers, we demonstrated through benchmarking how text generation based on neural semantic search can be deconstructed into scalable, CPU-friendly components, resulting in higher query throughput than other hardware solutions. This collaboration has led to the creation of a key Sapphire Rapids hero application, and the resulting code framework is planned to be open-sourced as well. The architecture is modality agnostic and scales to many usages. We recently added support for summarization and translation and plan to support text, image, and speech search.
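The sketch below illustrates the retrieve-then-generate pattern at its simplest: documents and the query are embedded with a sentence transformer, the best-matching passage is retrieved by cosine similarity, and the passage plus the question are handed to a seq2seq generator. The model names, the in-memory corpus, and the prompt format are all illustrative assumptions, not the benchmarked Intel framework.

```python
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# 1) Dense retrieval over a tiny in-memory corpus (assumed example documents).
docs = [
    "Sapphire Rapids is a 4th Gen Intel Xeon Scalable processor.",
    "SetFit is a few-shot text classification method built on sentence transformers.",
    "Knowledge distillation trains a small student model to mimic a large teacher.",
]
retriever = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # example checkpoint
doc_embeddings = retriever.encode(docs, convert_to_tensor=True)

query = "What is Sapphire Rapids?"
query_embedding = retriever.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
best_doc = docs[int(scores.argmax())]

# 2) Generation conditioned on the retrieved context (example generator model).
generator = pipeline("text2text-generation", model="google/flan-t5-small")
prompt = f"Answer the question using the context.\ncontext: {best_doc}\nquestion: {query}"
print(generator(prompt, max_new_tokens=40)[0]["generated_text"])
```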
Stable Diffusion – Text-to-Image on Intel Hardware. We demonstrated end-to-end Stable Diffusion (text-to-image) on Intel Xeon, including few-shot learning, the INT8 data type, and knowledge-distillation optimizations.
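As a baseline for CPU text-to-image generation (before any INT8 or distillation optimizations, which are not shown here), a minimal sketch using the Hugging Face diffusers library might look like the following. The checkpoint name and inference settings are assumptions for illustration.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (example model ID) and run it on CPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float32,  # full precision; optimized paths would use lower precision
)
pipe = pipe.to("cpu")

prompt = "a photo of a robot reading a book in a library"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("robot_library.png")
```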
Next Steps
Intel Labs will continue to build open-source resources to improve the efficiency and accessibility of NLP optimizations. We will explore other ways to utilize SetFit in unsupervised learning, making it applicable to a broader range of tasks. Finally, we will investigate the impact of linguistic structures on distillation success.
For More Information:
- "Efficient Few-Shot Learning Without Prompts,” Lewis Tunstall, Nils Reimers, Unso Eun Seo Jo, Luke Bates, Daniel Korat, Moshe Wasserblat, Oren Pereg. NeurIPS 2022: ENLSP workshop.
- "Fast DistilBERT on CPUs,” Haihao Shen, Ofir Zafrir; Bo Dong, Hengyu Meng, Xinyu Ye, Zhe Wang, Yi Ding, Hanwen Chang, Guy Boudoukh, Moshe Wasserblat. NeurIPS 2022: ENLSP workshop.
- “QuaLA-MiniLM: a Quantized Length Adaptive MiniLM,” Shira Guskin, Moshe Wasserblat, Chang Wang, Haihao Shen. NeurIPS 2022: ENLSP workshop.
- “TangoBERT: Reducing Inference Cost by using Cascaded Architecture,” Jonathan Mamou, Oren Pereg, Moshe Wasserblat, Roy Schwartz. Arxiv 2022
- Transformer Language Models without Positional Encodings Still Learn Positional Information, Peter Izsak, Adi Haviv, Ori Ram, Ofir Press, and Omer Levy. AEMNLP 2022 (Findings)
- “Prune Once for All: Sparse Pre-Trained Language Models,” Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat, NeurIPS 2021: ENLSP workshop.
- “Dynamic-TinyBERT: Boost TinyBERT’s Inference Efficiency by Dynamic Sequence Length,” Shira Guskin, Moshe Wasserblat, Ke Ding, Gyuwan Kim. NeurIPS 2021: ENLSP workshop.
- “How to Train BERT with an Academic Budget,” Peter Izsak, Moshe Berchansky, Omer Levy, Proceedings of EMNLP 2021.
- “Exploring the Boundaries of Low-Resource BERT Distillation,” Moshe Wasserblat, Oren Pereg, Peter Izsak, Proceedings of SustaiNLP 2020: EMNLP Workshop on Simple and Efficient Natural Language Processing.
- “Syntactically Aware Cross-Domain Aspect and Opinion Terms Extraction,” Oren Pereg, Daniel Korat, Moshe Wasserblat, Proceedings COLING 2020.
- “Q8BERT: Quantized 8Bit BERT,” Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat, EMC2: 5th Edition Co-located with NeurIPS 2019.
- “Training Compact Models for Low Resource Entity Tagging using Pre-trained Language Models,” Peter Izsak, Shira Guskin, Moshe Wasserblat, EMC2: 5th Edition Co-located with NeurIPS 2019.
- “ABSApp: A Portable Weakly-Supervised Aspect-Based Sentiment Extraction System,” Oren Pereg, Daniel Korat, Moshe Wasserblat, Jonathan Mamou, Ido Dagan, Proceedings of 2019 EMNLP: System Demonstrations.
- “Term Set Expansion based on Multi-Context Term Embeddings: an End-to-end Workflow,” Jonathan Mamou, Oren Pereg, Moshe Wasserblat, Ido Dagan, Yoav Goldberg, Alon Eirew, Yael Green, Shira Guskin, Peter Izsak, Daniel Korat, Proceedings of COLING 2018: System Demonstrations.