Natural Language Processing (NLP) Research: Enabling Faster Deployments

Highlights:

  • Natural Language Processing (NLP) use cases are exploding across a wide range of markets, but face challenges due to the growing size of transformer models.

  • Intel Labs is developing algorithmic optimizations for large transformer models that will help sustain the growth of NLP. We are leveraging few-shot learning to facilitate model training in data-scarce environments.

  • Our goal is to provide data scientists and end users with state-of-the-art language-based applications that deliver accurate and valuable business insights.

Demand for Natural Language Processing (NLP) technology is spreading rapidly across a wide range of vertical markets, from healthcare to smartphones. With the global NLP market expected to grow to $341.7 billion by 2030, Intel Labs is committed to making sure that practitioners have the resources to sustain this trajectory.

We are especially focused on developing advanced language technologies that will hasten NLP deployments. Along with our partners, we are working both internally and externally to develop and test these technologies and make sure they are optimized to run on Intel hardware. We are also working closely with open-source communities to accelerate the evolution of these solutions, as well as ensure that they are available to a wide range of users—not only the tech elite.

Our journey begins with addressing the most prominent challenges that data scientists and developers currently face when it comes to NLP deployments. These challenges include:

  1. Dealing with very large, costly, and inefficient models
  2. Wrangling large amounts of data to solve specific NLP tasks
  3. Coping with the limits of machine understanding, which are improving but still significant

Transformer Model Optimization Technology

Much of our work in NLP has focused on optimizing transformer models, which are frequently used for text categorization, question answering, summarization, and similar tasks. As dominant AI workloads, transformers have many applications in finance, law, healthcare, and other mainstream industries and services.

Transformers can be trained on large amounts of raw text in a self-supervised fashion, which means that humans do not need to label vast amounts of data. They are also adept at discovering patterns in text such as syntax and semantics. They use self-attention to compute representations of their inputs and outputs, and they generalize across a wide range of settings and languages. However, the growing size of transformer models has made them increasingly costly to train and deploy, a trend that is difficult to sustain.
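To make the self-attention step concrete, here is a minimal single-head scaled dot-product self-attention sketch in plain Python. The projection matrices `w_q`, `w_k`, and `w_v` and the toy inputs are hypothetical illustrations, not weights from any real model; production transformers use multiple heads, learned weights, and optimized tensor libraries.

```python
import math

Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]

def softmax(row: list[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention.

    x has one row per token; returns (outputs, attention_weights).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Score every token against every token, scaled by sqrt(d_k).
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]
    # Each output row is an attention-weighted average of the value vectors.
    return matmul(weights, v), weights

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three toy token vectors
identity = [[1.0, 0.0], [0.0, 1.0]]        # hypothetical projections
outputs, attention = self_attention(x, identity, identity, identity)
```

Each row of `attention` sums to 1, and the third token attends most strongly to itself because its query aligns best with its own key. The cost of this all-pairs scoring grows quadratically with sequence length, one reason large transformers are expensive.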

To address this, Intel Labs is exploring cutting-edge approaches that start from existing general-purpose pre-trained models and fine-tune them for specific downstream tasks. This can substantially reduce the computational resources and expertise required to deploy NLP applications.
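A heavily simplified sketch of the "reuse a pre-trained model, then adapt it to a task" idea follows. The `pretrained_encoder` function is a hypothetical stand-in for a frozen pre-trained transformer, and only a small logistic-regression head is trained on a handful of labeled examples. Full fine-tuning usually updates the encoder weights as well; freezing them here (a linear probe) just keeps the example short.

```python
import math

def pretrained_encoder(text: str) -> list[float]:
    # Hypothetical stand-in for a frozen pre-trained model's sentence embedding.
    words = text.lower().split()
    positive = {"great", "good", "excellent", "love"}
    negative = {"bad", "poor", "terrible", "hate"}
    n = max(len(words), 1)
    return [sum(w in positive for w in words) / n, sum(w in negative for w in words) / n]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def train_head(texts, labels, epochs=200, lr=0.5):
    """Train only a task-specific head; the encoder is never updated."""
    feats = [pretrained_encoder(t) for t in texts]
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            err = sigmoid(dot(w, x) + b) - y  # gradient of binary cross-entropy w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(head, text: str) -> int:
    w, b = head
    return int(sigmoid(dot(w, pretrained_encoder(text)) + b) >= 0.5)

head = train_head(["great product", "love it good", "bad service", "terrible and poor"], [1, 1, 0, 0])
```

Because only the small head is trained, four labeled examples are enough here; the expensive general-purpose knowledge lives in the (hypothetical) encoder that is reused unchanged.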

Over the last few years, we've made significant breakthroughs in optimizing large models for specific tasks on Intel hardware architectures by applying methods including quantization, sparsity, dynamic inference, and knowledge distillation. These optimization tools are all available via the open-source Optimum library, which we built in partnership with Hugging Face, an open-source start-up that develops pre-trained machine-learning models for natural language processing.
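As an illustration of one of these methods, quantization, here is a minimal sketch of symmetric per-tensor INT8 post-training quantization of a weight vector in plain Python. It is not the Optimum or Intel tooling API; real toolkits add calibration data, per-channel scales, and fused INT8 kernels.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from the INT8 codes."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.0, 1.0, 0.003]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

Each INT8 value occupies one byte instead of the four bytes of FP32, a 4x reduction in weight storage, at the cost of a rounding error of at most half the scale per weight.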

In our paper, Exploring the Boundaries of Low-Resource BERT Distillation, we show that large pre-trained models can be successfully distilled into very simple and efficient models.
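Knowledge distillation trains a small "student" model to match the softened output distribution of a large "teacher". The sketch below computes the common temperature-scaled distillation loss (KL divergence between softened teacher and student probabilities, multiplied by T squared) for a single example. It illustrates the general technique only; the paper's exact training objective may differ.

```python
import math

def softmax(logits, temperature=1.0):
    # Higher temperatures produce softer, more informative distributions.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return kl * temperature ** 2

teacher = [3.0, 1.0, 0.0]
close_student = distillation_loss([2.8, 1.1, 0.0], teacher)
far_student = distillation_loss([0.0, 1.0, 3.0], teacher)
```

A student whose logits track the teacher's gets a much smaller loss than one that disagrees. In practice this term is usually combined with ordinary cross-entropy on the hard labels.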

In our paper, Prune Once for All, we use weight pruning and model distillation to create sparse pre-trained transformer models that can be fine-tuned for a wide variety of tasks while maintaining their sparsity pattern.
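Keeping a sparsity pattern fixed during fine-tuning can be sketched with unstructured magnitude pruning: compute a binary mask once, then reapply it after every weight update so pruned weights stay at zero. The helper names and the plain gradient step below are illustrative assumptions; the paper's pruning schedule and distillation setup are more involved.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the `sparsity` fraction of smallest-magnitude weights."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]

def apply_mask(weights, mask):
    return [w * m for w, m in zip(weights, mask)]

def masked_sgd_step(weights, grads, mask, lr=0.1):
    """One fine-tuning update that preserves the pre-computed sparsity pattern."""
    updated = [w - lr * g for w, g in zip(weights, grads)]
    return apply_mask(updated, mask)

weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
mask = magnitude_mask(weights, sparsity=0.5)        # prune once
sparse = apply_mask(weights, mask)
fine_tuned = masked_sgd_step(sparse, [1.0] * 6, mask)
```

Without the final `apply_mask`, the gradient step would revive the pruned weights; reapplying the mask is what lets a single pruning pass carry over to every downstream task.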

Addressing Data Scarcity Through Few-Shot Learning

As more and more organizations strive to exploit the knowledge in their data to stay profitable and competitive, a problem is emerging: many cannot afford the cost and expertise needed to manually annotate data for fine-tuning transformer models on specific tasks.

Intel Labs is leveraging transformer technology to facilitate model training in data-scarce environments. One of our most exciting transformer research projects (again, in collaboration with Hugging Face) shows that Sentence Transformer Fine-tuning (SetFit) is an effective approach for few-shot text classification.

In this study, SetFit achieves high accuracy with little labeled data, outperforming GPT-3 in seven out of eleven NLP tasks. Our blog and paper describe SetFit's simple and efficient method and present benchmarks showing that it is competitive with other state-of-the-art methods.
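SetFit's first stage fine-tunes a Sentence Transformer contrastively on sentence pairs built from the few labeled examples: pairs that share a label are positives, pairs with different labels are negatives. A classification head is then trained on the resulting embeddings. The sketch below covers only the pair-generation step; the `make_pairs` helper is hypothetical and is not the `setfit` library's API, which has its own pair-sampling options.

```python
from itertools import combinations

def make_pairs(texts, labels):
    """Build (text_a, text_b, target) triples for contrastive fine-tuning.

    target is 1.0 when both texts share a label (positive pair) and
    0.0 otherwise (negative pair).
    """
    pairs = []
    for (a, la), (b, lb) in combinations(zip(texts, labels), 2):
        pairs.append((a, b, 1.0 if la == lb else 0.0))
    return pairs

texts = ["loved it", "great film", "boring plot", "waste of time"]
labels = ["pos", "pos", "neg", "neg"]
pairs = make_pairs(texts, labels)
```

Enumerating all pairs turns n labeled examples into n(n-1)/2 training pairs (4 examples give 6 pairs, 8 give 28), which helps explain why a contrastive stage can get so much out of very little labeled data.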

Language-based Applications

Another goal of Intel Labs is to provide data scientists and end users with state-of-the-art language-based applications that deliver accurate and valuable business insights. One such application is our Aspect-Based Sentiment Analysis (ABSA) system, which extracts opinions toward specific aspects of a product or service. The system has been used to track COVID-19 discourse on Twitter in North America and to generate insights into the effectiveness of public health interventions.
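To show the kind of output aspect-based sentiment analysis produces, here is a toy rule-based sketch that pairs each known aspect term with the nearest opinion word in a sentence. The `ASPECTS` and `OPINIONS` lexicons and the nearest-word rule are hypothetical simplifications for illustration only, not Intel's extraction method.

```python
import re

ASPECTS = {"battery", "screen", "price"}  # hypothetical aspect lexicon
OPINIONS = {"great": 1, "bright": 1, "cheap": 1,
            "poor": -1, "dim": -1, "expensive": -1}  # hypothetical opinion lexicon

def extract_aspect_sentiments(sentence, window=3):
    """Map each aspect found in the sentence to the polarity of its nearest opinion word."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    results = {}
    for i, tok in enumerate(tokens):
        if tok not in ASPECTS:
            continue
        # Find the closest opinion word within `window` tokens of the aspect.
        best = None
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if tokens[j] in OPINIONS and (best is None or abs(j - i) < abs(best - i)):
                best = j
        if best is not None:
            results[tok] = "positive" if OPINIONS[tokens[best]] > 0 else "negative"
    return results

review = extract_aspect_sentiments("The screen is bright, but the battery is poor.")
```

Unlike document-level sentiment, which would label this mixed review with a single score, the aspect-level result separates the praise for the screen from the complaint about the battery.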

Retrieval-Augmented Generation Architecture. In a collaboration between the Intel Software and Advanced Technology Group and our own NLP researchers, we showed through benchmarking that text generation based on neural semantic search can be decomposed into scalable, CPU-friendly components, yielding higher query throughput than other hardware solutions. This collaboration led to a key Sapphire Rapids hero application, and the resulting code framework is planned to be open-sourced as well. The architecture is modality-agnostic and scales to many uses. We recently added support for summarization and translation, and we plan to support text, image, and speech search.
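The retrieve-then-generate flow can be sketched as two decoupled stages: a retriever that ranks documents by cosine similarity to the query, and a generator that receives the top documents as context. Here `embed` is a hypothetical bag-of-words stand-in for a neural sentence encoder, and the generator is reduced to building its prompt; neither reflects the actual Intel framework.

```python
import math
from collections import Counter

def embed(text):
    # Hypothetical stand-in for a neural encoder: a sparse bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=2):
    """Stage 1: rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, context_docs):
    """Stage 2 input: the context-augmented prompt a text generator would condition on."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = ["xeon processors run inference on cpus",
        "stable diffusion generates images from text",
        "setfit needs few labeled examples"]
query = "how many labeled examples does setfit need"
prompt = build_prompt(query, retrieve(query, docs, k=1))
```

Because the retriever and generator only exchange plain text, each stage can be scaled or swapped independently, which is the kind of decomposition into separate components that the paragraph describes.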

Stable Diffusion – Text-to-Image on Intel Hardware. We demonstrated end-to-end Stable Diffusion (text-to-image) on Intel Xeon processors, incorporating few-shot learning, the INT8 data type, and knowledge distillation optimizations.

Next Steps

Intel Labs will continue to build open-source resources that improve the efficiency and accessibility of NLP optimizations. We will explore ways to apply SetFit to unsupervised learning, making it applicable to a broader range of tasks. Finally, we will investigate how linguistic structure affects the success of distillation.

For More Information: