Intel® Labs at EMNLP 2021

Highlights:

  • The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) takes place November 7-11. The conference is held both virtually and in person at the Barceló Bávaro Convention Center in Punta Cana, Dominican Republic.

  • Intel® Labs presents its latest research in natural language processing (NLP).

  • Moshe Wasserblat, Group Manager for Intel’s Artificial Intelligence Products and Natural Language Processing and Deep Learning Research, will participate in a panel at SustainNLP 2021.


One of the top conferences focusing on natural language processing (NLP) takes place this year from November 7-11. The EMNLP conference started in 1996 and is organized by the Association for Computational Linguistics’ special interest group on linguistic data (SIGDAT). 

Intel Labs is pleased to have a significant presence at the conference as a sponsor, publishing several papers on its latest NLP research, as well as participating in workshops. In addition, Intel is a co-contributor of several papers presented at the conference. 

Moshe Wasserblat, Group Manager for Intel’s Artificial Intelligence Products and Natural Language Processing and Deep Learning Research, will participate in the panel at SustainNLP 2021, the Second Workshop on Simple and Efficient Natural Language Processing. During this panel, he'll share insights on Green AI approaches for creating more efficient and environmentally sustainable NLP research and practices, an important theme in the industry. 

Following is the complete list of papers and workshops.

Workshops 

November 10, 2021
 

Conference Papers

  • How to Train Bidirectional Encoder Representations from Transformers (BERT) with an Academic Budget
    Peter Izsak, Moshe Berchansky, and Omer Levy

    While large language models à la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford. How can one train such models with a more modest budget? We present a recipe for pretraining a masked language model in 24 hours using a single low-end deep learning server. We demonstrate that through a combination of software optimizations, design choices, and hyperparameter tuning, it is possible to produce models competitive with BERT-Base on GLUE tasks at a fraction of the original pretraining cost.1
     
  • iFᴀᴄᴇᴛSᴜᴍ: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration
    Eran Hirsch, Alon Eirew, Ori Shapira, Avi Caciularu, Arie Cattan, Ori Ernst, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal and Ido Dagan

    We introduce iFACETSUM2, a web application for exploring topical document sets. iFACETSUM integrates interactive summarization with faceted search by providing a novel faceted navigation scheme that yields abstractive summaries for the user’s selections. This approach offers both a comprehensive overview as well as concise details regarding subtopics of choice. 

    Fine-grained facets are automatically produced based on cross-document coreference pipelines, rendering generic concepts, entities, and statements surfacing in the source texts. We analyze the effectiveness of our application through small-scale user studies, which suggest the usefulness of our approach.
     
  • Refine and Imitate: Reducing Repetition and Inconsistency in Persuasion Dialogues via Reinforcement Learning and Human Demonstration
    Weiyan Shi, Yu Li, Saurav Sahay, and Zhou Yu

    Despite the recent success of large-scale language models on various downstream NLP tasks, problems such as repetition and inconsistency persist in dialogue response generation. Previous approaches have attempted to avoid repetition by penalizing the language model’s undesirable behaviors in the loss function. However, these methods focus on token-level information and can lead to incoherent responses and uninterpretable behaviors. 

    To alleviate these issues, we propose to apply reinforcement learning to refine an MLE-based language model without user simulators and distill sentence-level information about repetition, inconsistency, and task relevance through rewards. In addition, to accomplish the dialogue task better, the model learns from human demonstration to imitate intellectual activities such as persuasion and selects the most persuasive responses. 

    Experiments show that our model outperforms previous dialogue models on automatic metrics and human evaluation results on a donation persuasion task. Our experiments also generate more diverse, consistent, and persuasive conversations according to user feedback.
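For readers unfamiliar with the masked language modeling objective underlying the BERT pretraining recipe above, here is a minimal, illustrative sketch of BERT-style token masking (the standard 15% selection with the 80/10/10 replacement rule). The toy vocabulary and whitespace tokens are assumptions for illustration only, not details from the paper:

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]  # toy vocabulary

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """BERT-style masking: select ~15% of positions; of those,
    80% become [MASK], 10% become a random token, and 10% stay
    unchanged. Returns (masked_tokens, labels), where labels hold
    the original token at selected positions and None elsewhere."""
    rng = rng or random.Random(0)
    masked, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok  # the model must predict this token
            roll = rng.random()
            if roll < 0.8:
                masked[i] = MASK
            elif roll < 0.9:
                masked[i] = rng.choice(VOCAB)
            # else: keep the original token unchanged
    return masked, labels
```

During pretraining, the model is trained to recover the tokens recorded in `labels` from the corrupted sequence; positions with `None` contribute nothing to the loss.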

Workshop Papers

  • Context or No Context? A preliminary exploration of human-in-the-loop approach for Incremental Temporal Summarization in meetings
    Nicole Beckage, Shachi H Kumar, Saurav Sahay, Ramesh Manuvinakurike

    Understanding parts of longer-duration multi-party meetings and summarizing them to support participants is an emerging problem. In this work, we examine the extent to which human abstractive summaries of the preceding meeting increments (context) can be combined with extractive meeting dialogue to generate abstractive summaries. 

    We find that previous context improves ROUGE scores. Our findings further suggest that contexts begin to outweigh the dialogue. Comparing key phrase extraction and semantic role labeling (SRL), we find that SRL captures relevant information without overwhelming the model architecture. By compressing the previous contexts by ~70%, we achieve better ROUGE scores than our baseline models. Collectively, these results suggest that context matters, as does how context is presented to the model.
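Since the results above are reported as ROUGE scores, a minimal sketch of ROUGE-1 recall (unigram overlap with clipped counts) may help make the metric concrete. The authors presumably use standard ROUGE tooling; this toy function is illustrative only and ignores stemming, stopword handling, and the precision/F-measure variants:

```python
from collections import Counter

def rouge1_recall(reference, candidate):
    """ROUGE-1 recall: the fraction of reference unigrams that also
    appear in the candidate, with per-word counts clipped to the
    candidate's counts."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(n, cand[w]) for w, n in ref.items())
    return overlap / max(sum(ref.values()), 1)
```

For example, `rouge1_recall("the cat sat on the mat", "the cat lay on the mat")` yields 5/6, since five of the six reference unigram occurrences are matched in the candidate.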

Product and Performance Information

1Roy Bar-Haim, Ido Dagan, Bill Dolan, Lisa Ferro, and Danilo Giampiccolo. 2006. The Second PASCAL Recognising Textual Entailment Challenge. Proceedings of the Second PASCAL Challenges Workshop on Recognising Textual Entailment.
2Avinesh, Benjamin Hättasch, Orkan Ozyurt, Carsten Binnig, and Christian M. Meyer. 2018. Sherlock: A System for Interactive Summarization of Large Text Collections. Proceedings of the VLDB Endowment, 11(12).