Revolutionizing Personal Assistants Through Understanding Actionable Requests in Human-to-Human Interactions

MaryT_Intel · 12-20-2019

Intelligent Personal Assistant Apps

Intelligent Personal Assistant Applications (IPAAs) are growing in use and becoming essential parts of many people’s lives. IPAAs are designed to help humans with day-to-day tasks, queries and actions such as initiating a phone call or setting a task reminder.

Most popular personal assistant applications today are designed to interact with humans by carrying out commands or answering queries made by a user. The user may convey those commands or queries using natural language speech or text. This type of interaction is referred to as Human-to-Machine (H2M) interaction. A significant step towards integrating IPAAs further into people’s lives is enriching them with the ability to understand interpersonal conversation. A new type of IPAA aims to help fulfill user requests that are conveyed in Human-to-Human (H2H) interactions. These requests often originate from other people interacting with the user (e.g., a spouse, team member or friend) and are conveyed through textual communications such as SMS or IM.

A typical example of a human-to-human request is to pick someone up from a specific location at a specific time: “Don’t forget to pick up Noah from school at 4 PM.” In this case, the IPAA’s task is to detect the semantic elements of the request, such as who to pick up, from which location and at what time, and then to create a reminder that prompts the user to act. Another example is a request to make a phone call when a given condition is met: “Call me when you leave work, please.” Here the IPAA’s task is to detect who to call and when, and to create a corresponding reminder.
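To make the target of this detection step concrete, the two example messages can be written out as structured records. The following is only an illustrative sketch: the `ActionRequest` class, its field names and the reminder format are assumptions made for this example and are not part of midu or NLP Architect. The field values are filled in by hand rather than extracted by a model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionRequest:
    """Hypothetical container for the semantic elements of one request."""
    action: str                      # request predicate, e.g. "pick up"
    target: Optional[str] = None     # who or what the action applies to
    location: Optional[str] = None   # where the action takes place
    time: Optional[str] = None       # when, or under which condition

    def to_reminder(self) -> str:
        """Render the filled-in elements as a one-line reminder text."""
        text = self.action.capitalize()
        if self.target:
            text += f" {self.target}"
        details = [d for d in (self.location, self.time) if d]
        if details:
            text += " (" + ", ".join(details) + ")"
        return text


# Hand-filled elements for the two example messages in the text.
pickup = ActionRequest(action="pick up", target="Noah", location="school", time="4 PM")
call = ActionRequest(action="call", target="me", time="when you leave work")

print(pickup.to_reminder())  # Pick up Noah (school, 4 PM)
print(call.to_reminder())    # Call me (when you leave work)
```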

The Intel AI Lab team, in cooperation with Intel Labs, has developed a model for detecting the semantic elements of human-to-human requests. This model is based on the “intent_extraction” model published as part of Intel’s NLP Architect open source library. The output of this model is handled by the midu application, a personal time management reference app that was developed by Intel’s wearable software team but is no longer commercially available. Midu receives these elements from the human-to-human request detection through a dedicated API and further resolves them to actual places, events, activities, etc. These in turn are used in the application: they are added to the user’s timeline and triggered as contextual reminders, in accordance with their semantic “meaning” as extracted and resolved through this process. Figures 1 and 2 show the H2H request comprehension process, expressed in the user experience as a timeline entry and reminder.

Figure 1: midu application used to analyze human-to-human textual messages, detect and understand the request and its semantic elements, and create a timeline reminder.


Figure 2 - Analyzing an outgoing message containing a bring request. Semantic resolution and contextual triggering are performed by Intel’s midu technology.

The Challenges of Understanding Human-to-Human Interactions

The fundamental challenge of IPAAs is resolving semantic elements. Although no off-the-shelf system is 100% accurate, this challenge is widely addressed by H2M systems, and H2H systems share it. Our work builds on H2M industry knowledge and practices in resolving semantic elements. However, understanding H2H requests raises three additional challenges:

- The first challenge is to convert the informal language of text messages to formal language. Text messages may include acronyms, abbreviations and misspellings, as in the following informal text message: “Plz pick John up B4 U arrive”
- The second challenge is filtering out bot messages. In some text messaging systems, bots send automatic messages containing advertisements or reminders. Those messages may be falsely detected as requests from other humans, and need to be filtered out.
- The third challenge is detecting whether the message is indeed a request to perform an action. Since the vast majority of H2H messages are not requests to perform actions, the potential for falsely detecting action requests is high. Note that existing H2M systems currently bypass this challenge by requiring the user to say a “wakeup word” before the command or request. The “wakeup word” is usually the system’s name, as in: “OK Google, please call John.” Future H2M systems may want to drop the “wakeup word” requirement, in which case they will face the same challenge as H2H systems in detecting whether a message is indeed a request or command.

An effective approach to overcoming the third challenge is to break it down into sub-challenges. Table 1 lists these sub-challenges along with textual examples:

Sub-challenge	Example	Expected outcome
Past tense	“I just picked John up from school”	No need for action.
Question	“will you send the material?”	No need for action; the message only calls for a yes/no answer.
Negation	“please don’t send the material yet”	The request is to not perform an action.
Condition	“please pick John up if it rains”	The pickup request depends on the ability to evaluate the given condition, for example by retrieving the weather forecast for a given location at a given time using a weather API.
Semantic	“don’t forget to bring your thoughts”	“Thoughts” are not tangible, so there is no need for action.

Table 1 – H2H request-detection challenges
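The first four rows of Table 1 can be approximated with simple pattern rules, which helps show why each sub-challenge needs its own check. The sketch below is a deliberately naive illustration and not the validation logic of the system described in this post: the regular expressions, the small verb list and the `classify` function are all assumptions made for this example. The semantic row (“bring your thoughts”) cannot be caught by such rules, so that message falls through as a candidate request.

```python
import re

# Hypothetical keyword patterns; a production system would use trained models.
QUESTION_START = re.compile(r"^(?:will|did|do|does|is|are)\b", re.IGNORECASE)
# Tiny illustrative list of past-tense verbs, not an exhaustive lexicon.
PAST_TENSE = re.compile(r"\b(?:picked|sent|brought|called|submitted)\b", re.IGNORECASE)
# "don't forget to ..." is a positive request, so it is excluded from negation.
NEGATION = re.compile(r"\b(?:don't|do not|never)\s+(?!forget\b)", re.IGNORECASE)
CONDITION = re.compile(r"\b(?:if|unless)\b", re.IGNORECASE)


def classify(message: str) -> str:
    """Return a coarse label for one message using the rules above."""
    text = message.replace("\u2019", "'").strip()  # straighten curly apostrophes
    if text.endswith("?") or QUESTION_START.match(text):
        return "question"
    if PAST_TENSE.search(text):
        return "past_tense"
    if NEGATION.search(text):
        return "negation"
    if CONDITION.search(text):
        return "conditional"
    return "candidate_request"


examples = [
    "I just picked John up from school",
    "will you send the material?",
    "please don’t send the material yet",
    "please pick John up if it rains",
    "don’t forget to bring your thoughts",
]
for msg in examples:
    print(f"{classify(msg):18} {msg}")
```

Rules like these break quickly on real messages (for instance, a polite request phrased as a question), which is one reason to treat each sub-challenge as a separate component.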

System Architecture and Method

The developed system is designed to overcome the H2H semantic comprehension challenges. The system includes three modules, each containing one or more blocks. Figure 3 illustrates the system’s architecture.


Figure 3 – System Architecture

The following is a description of the system’s modules and their functionality:

H2H preprocessing module: The text messages that are the input to this module undergo:
- Text normalization for converting the informal language of text messages to formal language.
- Bot filtering to filter out bot messages.
The text normalization component is based on supervised Neural Machine Translation (NMT) in which the training data comprises pairs of informal text messages and their corresponding formal text messages. The inference stage inputs an informal text message and outputs the predicted formal form of this message.
Semantic elements detection module (also called slot classification): This module’s goal is to detect the main semantic elements of a request: subject, direct object, indirect object, time and location. The module extracts part-of-speech tags, word embeddings and character embeddings and feeds them to a deep bidirectional LSTM neural network classifier.
H2H validation module: This is a post-processing module designed to handle the main challenges of H2H comprehension. It verifies that the message is indeed a request to perform an action. The module includes components that validate the tense of the request and check that the request is not negated, not conditional and not a question. In addition, the module includes a component for verifying that the request is semantically valid. This component uses Word Sense Disambiguation (WSD) based on a Multi-Layer Perceptron (MLP) to detect the meanings of the extracted semantic elements.
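The text normalization step of the preprocessing module is described above as a supervised NMT model. As a much simpler stand-in that only illustrates the input and output of that step, the sketch below uses a hand-written lookup table. The `SLANG` table and the `normalize` function are assumptions made for this example; they are not the NMT model and would not generalize to unseen abbreviations or misspellings the way a learned model can.

```python
import re

# Hypothetical lookup table; the system described above learns this mapping
# from pairs of informal and formal messages instead of listing it by hand.
SLANG = {
    "plz": "please",
    "pls": "please",
    "b4": "before",
    "u": "you",
    "ur": "your",
}

# Split into runs of word characters and runs of everything else,
# so spacing and punctuation are preserved when the text is re-joined.
TOKEN = re.compile(r"\w+|\W+")


def normalize(message: str) -> str:
    """Replace known informal tokens and capitalize the first letter."""
    out = []
    for token in TOKEN.findall(message):
        replacement = SLANG.get(token.lower())
        out.append(replacement if replacement is not None else token)
    text = "".join(out)
    return text[:1].upper() + text[1:]


print(normalize("Plz pick John up B4 U arrive"))
# Please pick John up before you arrive
```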

Testing Dataset

To test the system, the Intel AI Lab team assembled a dataset of 500 human-to-human messages. Of these, 385 messages include requests to perform actions and 115 do not. The messages were written and tagged manually for the purpose of creating the dataset. The dataset can be downloaded from the NLP Architect library, an NLP library we introduced in May 2018. Please note that this is a testing dataset, so a separate training dataset must be assembled in order to train a system. Table 2 describes the dataset and its tagging.

Tag Name	Tag Description
Request type	The type of request (e.g. send/update/submit/etc.)
Message	The textual message
Message direction	Message direction: incoming or outgoing message
Valid request	Indicates whether the message contains a request to perform an action (true) or not (false)
Subject head	The head of the subject phrase of the sentence
Subject NP	The noun phrase of the subject of the sentence
Direct object head	The head of the direct object phrase of the sentence
Direct object NP	The noun phrase of the direct object of the sentence
Indirect object head	The head of the indirect object phrase of the sentence
Indirect object NP	The noun phrase of the indirect object of the sentence

Table 2 – The dataset description
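The tags in Table 2 map naturally onto a record type with one field per tag. The sketch below is an assumption about how such a record could be represented in code: the `H2HRecord` class, its field names and the two sample records are made up for illustration and are not taken from the dataset or its file format.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class H2HRecord:
    """One tagged message, with one field per tag in Table 2 (names are hypothetical)."""
    request_type: str
    message: str
    message_direction: str           # "incoming" or "outgoing"
    valid_request: bool
    subject_head: Optional[str] = None
    subject_np: Optional[str] = None
    direct_object_head: Optional[str] = None
    direct_object_np: Optional[str] = None
    indirect_object_head: Optional[str] = None
    indirect_object_np: Optional[str] = None


def split_counts(records: Iterable[H2HRecord]) -> Tuple[int, int]:
    """Return (number of valid requests, number of non-requests)."""
    records = list(records)
    valid = sum(1 for r in records if r.valid_request)
    return valid, len(records) - valid


# Illustrative records only; these are NOT entries from the real dataset.
samples = [
    H2HRecord("pick up", "Don't forget to pick up Noah from school at 4 PM.",
              "incoming", True, direct_object_head="Noah", direct_object_np="Noah"),
    H2HRecord("pick up", "I just picked John up from school",
              "incoming", False, subject_head="I", subject_np="I",
              direct_object_head="John", direct_object_np="John"),
]
print(split_counts(samples))  # (1, 1)
```

Applied to the full dataset, a count like this would be expected to reproduce the 385 and 115 split reported above.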

Experiments

The system was evaluated on the above dataset with two sets of tests. The first test measured how well the system handles the challenges described in Table 1, that is, the extent to which the model detects messages that include requests to perform actions and filters out messages that do not. Table 3 shows the request detection evaluation results.

Request Detection Evaluation
Precision	Recall	F1 score
89.7%	68.7%	77.8%

Table 3 - Request Detection Evaluation Results

We see that 89.7% of the messages that the system classified as including requests did in fact include requests to perform an action. This high precision is mainly achieved by the system’s ability to detect and filter out messages that would otherwise be false positives. Recall is lower: the system detected 68.7% of the messages that actually include requests.
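The F1 score in Table 3 is the harmonic mean of precision and recall. As a quick check, the short sketch below recomputes it from the two reported values; the `f1_score` helper is written for this example and is not part of the evaluated system.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in Table 3, in percent.
f1 = f1_score(89.7, 68.7)
print(f"{f1:.1f}%")  # 77.8%
```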

The second test aimed to measure the quality of the semantic elements detection (i.e., the slot classification task). The system is configured to detect three main semantic elements: subjects, direct objects and indirect objects. For each of those elements the system extracts the head of the element. Table 4 shows the evaluation results of semantic elements detection in requests.

Request Semantic Elements Detection Evaluation
Semantic Element	Precision[%]	Recall[%]	F1 score[%]
Subject Head	86.9	61.6	72.1
Direct Object Head	88.9	60.5	72.0
Indirect Object Head	73.8	53.4	61.9
Average	83.2	58.5	68.7

Table 4 - Request Semantic Elements Detection Evaluation Results
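The Average row in Table 4 appears to be consistent with an unweighted (macro) mean over the three semantic elements. The sketch below recomputes it from the per-element values in the table; the `macro_average` helper is an assumption made for this check, not code from the evaluation.

```python
# Per-element results from Table 4: (precision, recall, F1), in percent.
rows = {
    "Subject Head": (86.9, 61.6, 72.1),
    "Direct Object Head": (88.9, 60.5, 72.0),
    "Indirect Object Head": (73.8, 53.4, 61.9),
}


def macro_average(table):
    """Unweighted mean of each metric column, rounded to one decimal."""
    n = len(table)
    columns = zip(*table.values())  # regroup the values by metric
    return tuple(round(sum(col) / n, 1) for col in columns)


avg_precision, avg_recall, avg_f1 = macro_average(rows)
print(avg_precision, avg_recall, avg_f1)  # 83.2 58.5 68.7
```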

Future Work

In this project, the request was extracted from a single message that included the request predicate, i.e., call, bring, send, etc. For future work, we plan to include the ability to extract a request from the full context of the H2H conversation. This will enable the extraction of semantic elements that are related to the request but are mentioned in other messages during the conversation. For example, a message to pick up someone from a specific location may be followed by another later message stating the pickup request time.

Conclusions

A large step towards integrating IPAAs more fully into everyday life is enabling them to understand natural human-to-human language. In this work, we focused on understanding human-to-human requests. Understanding such requests raises various natural language processing challenges. We showed that by mapping the challenges and designing a dedicated model for each one, it is possible to achieve high precision in detecting requests. This makes it possible to incorporate human-to-human request comprehension algorithms into next-generation IPAAs that autonomously create timeline reminders.

Notices and Disclaimers

Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.

Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No product or component can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

Intel, the Intel Logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.