As part of Intel’s efforts to advance human language technology used for artificial intelligence (AI) applications, the company is presenting some of its latest research at this week’s 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), held online from June 6-11. The annual conference presents the latest research on natural language processing (NLP), which focuses on the computational aspects of human-computer interaction using natural language.
This year, one paper from Intel Labs will be presented at one of the conference workshops, and two other research papers will be presented during the main conference. The featured work includes advances in deep learning technology, specifically in improving natural language understanding (NLU) data annotation using interactive learning and human-AI collaboration methods, controllable chatbot implementation, and cross-document event coreference resolution for NLP applications.
9:00-10:00 am: “Semi-supervised Interactive Intent Labeling,” by Saurav Sahay, Eda Okur, Nagib Hakim, and Lama Nachman, Intel Labs
The workshop will focus on the cooperation between humans and computers within the area of NLP, covering a wide range of tasks and applications. Topic areas include information extraction, information retrieval and text mining, machine translation, dialog systems, question answering, language generation, summarization, model interpretability, evaluation, fairness, and ethics.
Intel’s paper, which will be presented during the workshop, covers building the NLU modules of task-oriented Spoken Dialogue Systems (SDS). The authors will showcase a semi-supervised intent modeling and annotation system in which SDS developers can interactively label and augment training data from unlabeled utterance corpora, using advanced clustering for representation learning together with visual labeling methods. The paper also looks at the effect of data augmentation using paraphrasing models and develops a semantic minority class oversampling method for handling class imbalance. Finally, the paper extracts the learnt utterance embeddings from the clustering model and plots the data so that samples can be bulk labeled interactively, significantly reducing the time and effort needed to label the entire dataset.
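To make the bulk-labeling idea concrete, here is a minimal sketch of cluster-then-label propagation. It is not the authors' system: plain k-means stands in for the paper's clustering model, the 2-D points stand in for learnt utterance embeddings, and the function names, intent names, and seed labels are all hypothetical.

```python
from collections import Counter
from math import dist


def kmeans(points, init_centroids, iters=10):
    """Plain k-means with caller-supplied starting centroids (deterministic)."""
    centroids = list(init_centroids)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        assign = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members (unchanged if empty).
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assign


def bulk_label(assign, seed_labels):
    """Give every point the majority seed label of its cluster (None if no seed)."""
    votes = {}
    for idx, label in seed_labels.items():
        votes.setdefault(assign[idx], Counter())[label] += 1
    cluster_label = {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}
    return [cluster_label.get(a) for a in assign]


# Hypothetical toy embeddings: two well-separated groups of utterances.
embeddings = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
# A developer labels only two utterances by hand (indices 0 and 3).
seed_labels = {0: "greeting", 3: "book_flight"}

assignments = kmeans(embeddings, [embeddings[0], embeddings[3]])
labels = bulk_label(assignments, seed_labels)
print(labels)
```

In this toy setup two hand-labeled utterances are enough to label all six, which is the kind of saving the interactive workflow aims for at a much larger scale.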
Main Conference Papers
This paper proposes an innovative framework to train chatbots with controllable factors to possess human-like intentions. The framework includes a guiding chatbot and an interlocutor model that plays the role of humans. The guiding chatbot is assigned an intention and learns to induce the interlocutor to reply with responses matching the intention, for example, long responses, joyful responses, or responses with specific words. Controllable intentions are induced using a policy gradient reinforcement learning method.
The framework was examined in three experimental setups, and the guiding chatbot was evaluated with four different metrics on the EmpatheticDialogues dataset to demonstrate flexibility and performance advantages. Additionally, trials were performed with human interlocutors to substantiate the guiding chatbot’s effectiveness in influencing the responses of humans to a certain extent. Code will be made available to the public.
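As a rough illustration of how a policy gradient method can steer a chatbot toward an assigned intention, the sketch below trains a tiny softmax policy with the basic REINFORCE update. It is a toy under stated assumptions, not the paper's implementation: the candidate utterances, the simulated interlocutor, its reply lengths, and the 10-word threshold for a "long response" are all hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical candidate utterances the guiding chatbot can choose from.
CANDIDATES = ["Ok.", "Did you like it?", "What happened next, and how did you feel?"]
# Assumed mean reply length (in words) each utterance draws from the interlocutor.
MEAN_REPLY_WORDS = [3, 6, 14]


def interlocutor_reply_length(action):
    """Stand-in for the interlocutor model: a noisy reply length in words."""
    return random.gauss(MEAN_REPLY_WORDS[action], 2.0)


def reward(reply_words, min_words=10):
    """Intention 'elicit a long response': 1 if the reply is long enough, else 0."""
    return 1.0 if reply_words >= min_words else 0.0


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=2000, lr=0.1):
    logits = [0.0] * len(CANDIDATES)  # one policy parameter per candidate
    for _ in range(steps):
        probs = softmax(logits)
        action = random.choices(range(len(CANDIDATES)), weights=probs)[0]
        r = reward(interlocutor_reply_length(action))
        # REINFORCE: d log pi(action) / d logit_k = 1[k == action] - probs[k].
        for k in range(len(logits)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * r * grad_log_pi
    return softmax(logits)


probs = train()
for text, p in zip(CANDIDATES, probs):
    print(f"{p:.3f}  {text}")
```

Swapping the reward function (for example, rewarding joyful replies or replies containing a specific word) changes the intention without changing the training loop, which is the sense in which the intention is controllable.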
Cross-document event coreference resolution is a foundational task for NLP applications involving multi-text processing. However, existing corpora for this task are scarce and relatively small, and they annotate only modestly sized clusters of documents belonging to the same topic. To complement these resources and enhance future research, Intel presents Wikipedia Event Coreference (WEC), an efficient methodology for gathering a large-scale dataset for cross-document event coreference from Wikipedia, where coreference links are not restricted within predefined topics. The paper applies this methodology to the English Wikipedia and extracts a large-scale WEC-Eng dataset. The model presented in the paper is suitably efficient and outperforms previously published state-of-the-art results for the task.
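For readers unfamiliar with the task, the sketch below shows what a cross-document coreference cluster looks like in data terms. It assumes, for illustration only, that each mention carries a link to a page describing the event and that mentions sharing a link target are treated as coreferent; the article itself does not spell out the mechanism, and the mentions, document ids, and function names here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical mentions: (document id, mention text, linked event page).
MENTIONS = [
    ("doc_a", "the earthquake", "2010_Haiti_earthquake"),
    ("doc_b", "the 2010 quake", "2010_Haiti_earthquake"),
    ("doc_c", "the final", "2014_FIFA_World_Cup_Final"),
    ("doc_d", "the disaster", "2010_Haiti_earthquake"),
    ("doc_e", "the match in Rio", "2014_FIFA_World_Cup_Final"),
]


def build_clusters(mentions):
    """Group mentions from any document by the event page they link to."""
    clusters = defaultdict(list)
    for doc_id, text, target in mentions:
        clusters[target].append((doc_id, text))
    return dict(clusters)


def coreferent_pairs(clusters):
    """List mention pairs that share a cluster but come from different documents."""
    pairs = []
    for members in clusters.values():
        for a, b in combinations(members, 2):
            if a[0] != b[0]:
                pairs.append((a, b))
    return pairs


clusters = build_clusters(MENTIONS)
pairs = coreferent_pairs(clusters)
print(len(clusters), "clusters,", len(pairs), "cross-document pairs")
```

Because the grouping key is the event rather than a document topic, mentions from unrelated documents can end up in the same cluster, matching the article's point that links are not restricted to predefined topics.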