# Conversational Agents with NLP Architect by Intel® AI Lab for AI Applications at the Edge

Published: 07/31/2019

Last Updated: 07/31/2019

AI Applications at the Edge — so what does that truly mean? When you’re developing a product or solution that requires AI or machine learning (ML), you generally have two options on where you can deploy your ML model for inference. If your system has no restrictions on network connections to the cloud (or to your own data center), then you can deploy your AI models to some remote server and you can perform inference for your AI application remotely. However, there are a few use cases where a cloud or data center hosted solution is not practical or feasible.

Note: If you are unfamiliar with the basics of AI or ML, the Intel® AI Developer Program provides material and courses to get you up to speed.

Natural language processing (NLP) is an aspect of AI that can cover several topics, but today we are going to introduce intent extraction and show how NLP Architect uses it to process text in Natural Language. NLP Architect is a Python* framework, but don’t worry if you do not know or use Python, because we will show you how to execute NLP Architect in server mode to allow you to access it using any programming language that understands HTTP. At the end of this article, we will present a pure Java* 9 client application that will invoke the NLP Architect server and get a response within a fraction of a second.

## The Edge and Performing AI at the Edge

Why perform AI at the Edge? Let’s first take the case of autonomous driving. An autonomous vehicle relies on an array of sensors that must be used simultaneously in order to drive safely: 360-degree cameras, Light Detection and Ranging (LiDAR), RADAR, GPS, the speedometer, infrared sensors, and ultrasonic sensors, to name a few. Time is a critical factor for this sensor data, because the right data analyzed at the wrong time is effectively the wrong data. Therefore, such computation (and any inference) must be performed locally on the edge (in this case, the car), because you cannot expect the vehicle to have a constant network connection while it operates autonomously.

Now, consider the case of smart parking in the next generation of smart cities. If a city manager deploys hundreds (or even thousands) of cameras within a city to determine if a parking space is available, then the cloud or data center would have to process terabytes of video data per second in order to perform the analysis necessary to determine if (and when) a parking spot is available. Analyzing the video close to where it is captured avoids moving all of that data over the network. So, in this case, the edge is the pole where the camera is mounted, or an on-premises server (on the parking lot’s premises, that is) that has local access to the video footage.

Now think about the case where the data itself is sensitive, private, or regulated. For the past few decades in the US, medical patient data has been regulated by HIPAA, which sets requirements for the security and privacy of a patient’s data. Therefore, if you want to apply any type of AI in the medical industry, processing the data locally at the edge (that is, within the hospital room itself or within the urgent care facility) can greatly reduce the risk of violating HIPAA.

Figure 1. Depending on your use case, the IoT Edge can be almost anywhere, but by definition, it is not the cloud nor the data center.

Now, there are several companies that offer speech recognition and NLP as cloud-based services. However, what if you wanted to create a voice interface for either a desktop application or an IoT device, and you did not want to pay a third-party service indefinitely during the life of your product? What if you wanted to interact with an application via voice, but your machine has little or no connectivity because it is in a remote area? What if you needed to voice-enable an application that works with sensitive or personal data, such as a hospital with medical data or a bank with financial data? All of these are real-world scenarios where using a cloud-based speech recognition service is not practical.

## NLP Architect and Intent Extraction

NLP Architect is an open source Python library that allows developers to leverage deep learning frameworks such as TensorFlow* and DyNet* to support a wide range of NLP tasks, including intent extraction.

Now, the purpose of this article is to show developers how to leverage NLP Architect to build a conversational agent that could function completely offline—without any need to access additional resources from the cloud or the data center. Therefore, the capability of NLP Architect that is most useful for this task is intent extraction. Figure 2 shows three practical examples of intent extraction and also illustrates the concept of slot tagging.

Figure 2. The goal for intent extraction is to take unstructured, raw text and to determine the intention of what the user wants to say or do.

As you can see, there are myriad ways to ask for the weather. In some cases, you can use the keyword "weather", but it is also perfectly valid to say a phrase like, "Is it going to be hot in Miami next week?"

NLP Architect not only provides intent extraction, but it also provides slot tagging. Slot tagging is important because after you have determined the intent of the user, you might need additional information in order to complete the task or request. For instance, if the user asks, "How is the weather in Portland?", the system should not reply with, "In what city would you like to know the forecast?" Such a response indicates that the system understood the intent (Get Weather) but failed to understand that the user had already provided the location within the request. Asking for information that was already provided makes for a poor user experience. With slot tagging, after we have identified the intent (Get Weather), we also receive the additional information necessary for the task (in this case, the location of the weather forecast).
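To make the Portland example concrete, here is a minimal Java sketch of how a conversational agent might decide whether it still needs to ask a follow-up question. The `SlotCheck` class and its `REQUIRED_SLOTS` table are hypothetical names invented for this illustration and are not part of NLP Architect; the sketch assumes the intent and slots have already been extracted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SlotCheck {

    // Hypothetical table of the slots each intent needs before it can be fulfilled.
    // This mapping is an assumption for illustration, not an NLP Architect feature.
    static final Map<String, List<String>> REQUIRED_SLOTS = Map.of(
            "GetWeather", List.of("city"));

    /** Returns the required slots that the user has not supplied yet. */
    static List<String> missingSlots(String intent, Map<String, String> filledSlots) {
        List<String> missing = new ArrayList<>();
        for (String slot : REQUIRED_SLOTS.getOrDefault(intent, List.of())) {
            if (!filledSlots.containsKey(slot)) {
                missing.add(slot);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // "How is the weather in Portland?": the city slot is already filled,
        // so the agent has nothing left to ask.
        Map<String, String> slots = new LinkedHashMap<>();
        slots.put("city", "Portland");
        System.out.println("Missing: " + missingSlots("GetWeather", slots));

        // "How is the weather?": no city was given, so the agent should ask for it.
        System.out.println("Missing: " + missingSlots("GetWeather", Map.of()));
    }
}
```

An agent would only prompt the user for the slots returned by `missingSlots`, which avoids asking again for information the user has already given.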

In addition to providing developers with the capability of intent extraction with slot tagging, NLP Architect includes a pretrained model that supports the following intents for your testing and evaluation of the framework:

• Get weather
• Play music
• Book restaurant
• Search item
• Add to playlist
• Rate book
• Search movie schedule

## Prerequisites for Installing NLP Architect

The following is a list of prerequisites necessary to install NLP Architect:

1. Linux machine: I am running Ubuntu* 18.04 LTS. My recommendations are:
   • Minimum of 60 GB disk
   • Minimum of 8 GB of RAM
2. Git*
3. Anaconda* 3 (which includes Python 3.7 and Conda*)

The best way to get started with NLP Architect is to create a fresh Linux install running in a virtual machine (VM). This way you can ensure that the tools and dependencies needed for NLP Architect will not conflict with any critical software that you need on an existing development machine. As you can see from the prerequisites above, you need to have Git and Anaconda installed on your machine before continuing. If you are not familiar with installing those tools, there are several helpful guides online that can assist you.

## Setting Up Your Environment

The first step to set up your environment for NLP Architect is to update PIP*, the package manager for Python:

pip install --upgrade pip

Now, let’s verify the version of Python installed on your machine. Installing the latest version of the Anaconda platform also installs Python 3.7:

python --version
Python 3.7.3 

Since NLP Architect requires Python 3.6, we will create a virtual environment for NLP Architect that uses this earlier version of Python:

conda create -n intelnlp python=3.6

After the virtual environment is created, the next step is to activate it with the following command:

conda activate intelnlp

Now this time, when we check the Python version, we see that it is the one we need for NLP Architect (which is v 3.6.x):

python --version
Python 3.6.8 :: Anaconda, Inc 

## Installing NLP Architect

Ok, so now that we have all the prerequisites out of the way, it is time to install the most up-to-date version of NLP Architect directly from GitHub*:

git clone https://github.com/NervanaSystems/nlp-architect.git

Note: There are multiple ways to set up and configure your environment to use NLP Architect depending on your preferences for CPU, GPU, or Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) backends. In this article, we are going to cover the steps needed for a CPU backend, and get some examples running. For the complete list of installation instructions for NLP Architect, and to understand all the options available, please be sure to consult the NLP Architect Installation Guide.

Now that we have cloned the NLP Architect repository from GitHub, all we need to do is enter the nlp-architect folder on the terminal, set the environment variable for the backend, and execute the first step of the two-stage process to install the library in developer mode:

cd nlp-architect
export NLP_ARCHITECT_BE=CPU
pip install -e . 

The installation script is going to install (or verify that you already have) a long list of Python packages including TensorFlow, DyNet*, spaCy*, Falcon*, Keras*, and many others. After the first step of the installation process is complete, all you need to do is install the remaining packages necessary for developer mode:

pip install -r dev-requirements.txt

And that’s it, you have just completed the installation process for NLP Architect. Let’s execute the following command on the terminal to verify the current version:

nlp_architect -v

At the time of this writing (Q3 2019), the response was:

nlp_architect v0.4.post2

## Updating NLP Architect

It is always a good idea to ensure that you are running the latest version of NLP Architect, and the process to update your installation could not be easier. All you need to do is execute this line from within your installation folder:

git pull origin master

If any changes exist, you will see all the details in the terminal window. After the command has finished executing, there is nothing more that you need to do (reinstalling NLP Architect is not necessary after updating).

So, what have we accomplished so far? At this point we have performed all the steps necessary to configure our environment, install NLP Architect, and verify the version of the library. However, what you have done so far is the equivalent of buying a Nintendo Switch* videogame console for your children for Christmas without buying any games. In order for NLP Architect to work, you need to give it pretrained data models. The process to accomplish this is covered in the next section.

## Running the NLP Architect Embedded Server

One of the greatest benefits of NLP Architect is that, although model training and development is done in Python, you can actually use other programming languages such as Java, C#, and Swift* to interact with NLP Architect, because the library includes an embedded HTTP server to allow you to invoke the NLP Architect APIs (and your models) with simple Representational State Transfer (REST) calls. So, in order to start up the server, just execute the following command:

nlp_architect server -p 9000

This will execute the embedded web server on port 9000 (and obviously, you can modify that command to use any port that you prefer). Now, by default, your installation of NLP Architect does not include any pretrained data models. Therefore, the first time that you start the server for NLP Architect, you will be prompted to accept and download some pretrained models (this, of course will make it easier to try out some of the samples directly from a web browser).

If you have started the server before, then (depending on your hardware) the embedded server may take a few minutes to start. However, if this is your very first time starting the server after a fresh installation, it will take considerably longer because you need to download the pretrained data models, which are over 1 GB of data uncompressed. NLP Architect stores these models in a cache folder and not within the NLP Architect folder itself. This way you can update NLP Architect without worrying about destroying the folders that contain your models. After the server has started, you should see a message in the terminal, as shown below.

Figure 3. When you see the message Serving on: 9000, then you know that the NLP Architect server is working properly.

Note: If you restart your Linux machine (or exit the current terminal window), you will need to re-activate your virtual environment before starting the NLP Architect server again. The process is simple; just execute the following commands:

cd [location of nlp-architect folder]
conda activate intelnlp
nlp_architect server -p 9000

## Executing the Intent Extraction Demo in the Browser

Now that the embedded server for NLP Architect is started and working properly on your edge platform, the next step is to verify that the data models are loaded properly. Open a web browser on your Linux machine and go to http://localhost:9000 to view the NLP Architect home page. Figures 4 and 5, respectively, show the NLP Architect home page and the intent extraction Demo page.

Figure 4. This is the home page for the NLP Architect server. Click the intent extraction link to navigate to the intent extraction Demo page.

Figure 5. The goal for intent extraction is to take unstructured, raw text and determine the intention of what the user wants to say or do.

## Creating Your Own HTTP Client for NLP Architect

Of course, accessing NLP Architect through a web browser is great for testing inference of your models, but it is not the best interface for production use by a conversational agent. However, the process to create your own client to programmatically access the functionality of NLP Architect is actually quite simple. All you need to do is send a properly formatted HTTP POST request to the /inference URI of your NLP Architect server. Make sure that the request headers Response-Format and Content-Type are both set to application/json in your HTTP POST request. Finally, create a simple JSON string that indicates the model that you want (in our case, intent_extraction) and the phrase that you want NLP Architect to process. The following code shows the HTTP POST headers and body necessary to process the phrase, "I’m traveling to London next week. What’s the forecast there?"

Response-Format: application/json
Content-Type: application/json
{"model_name":"intent_extraction","docs":[{"id":1,"doc":"I'm traveling to London next week. What's the forecast there?"}]}

The HTTP POST headers and body necessary to invoke the NLP Architect server programmatically.
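The same request can be assembled with nothing but the Java standard library. The sketch below is one possible approach, not the only one: the `InferenceRequest` class and its method names are hypothetical, the header names and body layout are taken from the listing above, and the server URL is expected as a command-line argument (for example, http://localhost:9000/inference). The `escapeJson` helper matters because a phrase typed by a user may itself contain quotes or backslashes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class InferenceRequest {

    /** Escapes the characters that must not appear raw inside a JSON string. */
    static String escapeJson(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds the JSON body in the layout shown in the listing above. */
    static String buildBody(String modelName, int id, String phrase) {
        return "{\"model_name\":\"" + escapeJson(modelName) + "\",\"docs\":[{\"id\":" + id
                + ",\"doc\":\"" + escapeJson(phrase) + "\"}]}";
    }

    /** Sends the body with the two headers named in the article and returns the raw reply. */
    static String post(String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Response-Format", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String body = buildBody("intent_extraction", 1,
                "I'm traveling to London next week. What's the forecast there?");
        System.out.println(body);
        // The server is contacted only when a URL is supplied as the first argument.
        if (args.length > 0) {
            System.out.println(post(args[0], body));
        }
    }
}
```

Building the body by string concatenation keeps the sketch dependency-free; in a larger application a JSON library would usually be a safer choice.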

After you have properly invoked the NLP Architect server, you should get back a simple JSON response with your intent extracted and your slots identified. The code below shows the JSON response delivered from the server as a result of the HTTP POST request from the code above. As you can see, GetWeather is properly identified as the intent, London is the city slot, and next week is the timeRange slot.

[
    {
        "doc": {
            "annotation_set": [
                "city",
                "timeRange"
            ],
            "doc_text": "I 'm traveling to London next week . What 's the forecast there ?",
            "spans": [
                {
                    "start": 18,
                    "end": 24,
                    "type": "city"
                },
                {
                    "start": 25,
                    "end": 34,
                    "type": "timeRange"
                }
            ],
            "title": "GetWeather"
        },
        "type": "high_level",
        "id": 1
    }
]



The HTTP response body from the NLP Architect server showing the intent of the user and the slots.
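Judging from the sample response above, each span's start and end values appear to be character offsets into doc_text (the tokenized text returned by the server, not the original phrase), with the end offset exclusive. Under that assumption, the slot values can be recovered with a simple substring call. The `SpanSlots` and `Span` classes below are hypothetical helpers written for this illustration, and the span values are copied by hand from the response rather than parsed from JSON.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SpanSlots {

    /** Minimal holder for one entry of the "spans" array in the response. */
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Maps each slot type to the text it covers, assuming an exclusive end offset. */
    static Map<String, String> slotValues(String docText, Span... spans) {
        Map<String, String> slots = new LinkedHashMap<>();
        for (Span span : spans) {
            slots.put(span.type, docText.substring(span.start, span.end));
        }
        return slots;
    }

    public static void main(String[] args) {
        // doc_text and spans copied from the sample response above.
        String docText = "I 'm traveling to London next week . What 's the forecast there ?";
        Map<String, String> slots = slotValues(docText,
                new Span(18, 24, "city"),
                new Span(25, 34, "timeRange"));
        System.out.println(slots); // prints {city=London, timeRange=next week}
    }
}
```

Note that the offsets refer to doc_text as returned by the server, where punctuation and contractions are separated by spaces, so they should not be applied to the phrase as originally typed.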

## NLPArchitectClient.java, the Java* Client for NLP Architect

The following code shows the complete source code for NLPArchitectClient.java. It uses the Apache* HTTPComponents library, and it is a practical example of how NLP Architect can be called from other programming languages via its embedded server.

import org.apache.http.HttpEntity;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import java.io.IOException;

public class NLPArchitectClient {

    static String nlpArchitectUrl = "http://192.168.1.60:9000/inference"; // change to the IP address of your NLP Architect server
    static long startTime, stopTime = 0;

    /*
    PROGRAMMER NOTE: This is what the JSON body looks like in "pretty" JSON format

    {
        "model_name": "intent_extraction",
        "docs": [
            {
                "id": 1,
                "doc": "What’s the weather in San Diego, California?"
            }
        ]
    }

    */

    //
    // ...And this is what the JSON body needs to look like with the quotes (") escaped
    //
    static String jsonBody1 = "{\r\n" +
            "    \"model_name\": \"intent_extraction\",\r\n" +
            "    \"docs\": [\r\n" +
            "        {\r\n" +
            "            \"id\": 1,\r\n" +
            "            \"doc\": \"What’s the weather like today in London?\"\r\n" +
            "        }\r\n" +
            "    ]\r\n" +
            "}";

    static String jsonBody2 = "{\r\n" +
            "    \"model_name\": \"intent_extraction\",\r\n" +
            "    \"docs\": [\r\n" +
            "        {\r\n" +
            "            \"id\": 1,\r\n" +
            "            \"doc\": \"Can you give me the forecast for Chicago, Illinois?\"\r\n" +
            "        }\r\n" +
            "    ]\r\n" +
            "}";

    public static void main(String[] args) throws IOException {

        CloseableHttpClient httpclient = HttpClients.custom().build();

        StringEntity requestEntity1 = new StringEntity(jsonBody1, ContentType.APPLICATION_JSON);
        StringEntity requestEntity2 = new StringEntity(jsonBody2, ContentType.APPLICATION_JSON);

        try {

            HttpPost postRequest1 = new HttpPost(nlpArchitectUrl);
            postRequest1.setEntity(requestEntity1);

            HttpPost postRequest2 = new HttpPost(nlpArchitectUrl);
            postRequest2.setEntity(requestEntity2);

            System.out.println("Invoking the Intel NLP Architect Server: \n" + postRequest1.getRequestLine() + "\n");

            // Create a custom response handler that returns the body for 2xx responses
            ResponseHandler<String> responseHandler = response -> {
                int status = response.getStatusLine().getStatusCode();
                if (status >= 200 && status < 300) {
                    HttpEntity entity = response.getEntity();
                    return entity != null ? EntityUtils.toString(entity) : null;
                } else {
                    throw new ClientProtocolException("Unexpected response status: " + status);
                }
            };

            // Time the first request
            startTime = System.currentTimeMillis();
            String responseBody = httpclient.execute(postRequest1, responseHandler);
            stopTime = System.currentTimeMillis();

            System.out.println(responseBody);

            long processingTime = (stopTime - startTime);
            System.out.println("Processing Time for NLP Statement #1: " + processingTime + " ms \n");

            // Time the second request, sent through the same client
            startTime = System.currentTimeMillis();
            responseBody = httpclient.execute(postRequest2, responseHandler);
            stopTime = System.currentTimeMillis();

            System.out.println(responseBody);

            processingTime = (stopTime - startTime);
            System.out.println("Processing Time NLP Statement #2: " + processingTime + " ms \n");
        } finally {
            httpclient.close();
        }
    }
}


NLPArchitectClient.java is a Java 9 client application for the NLP Architect server.

As you examine the class in the code above, you can see that all the work necessary to invoke the NLP Architect server is done in the main() method. Here, we create a CloseableHttpClient object and two HttpPost objects. Before the main() method is declared, we define two Strings in JSON format, each containing a different phrase for which we want NLP Architect to determine the intent and slots. The output below shows the results of running NLPArchitectClient.java against our NLP Architect server.

Invoking the Intel NLP Architect Server:
POST http://192.168.1.149:9000/inference HTTP/1.1

[{"doc": {"annotation_set": ["city", "timeRange"], "doc_text": "What ’s the weather like today in London ?", "spans": [{"start": 25, "end": 30, "type": "timeRange"}, {"start": 34, "end": 40, "type": "city"}], "title": "GetWeather"}, "type": "high_level", "id": 1}]
Processing Time for NLP Statement #1: 133 ms

[{"doc": {"annotation_set": ["state", "city"], "doc_text": "Can you give me the forecast for Chicago , Illinois ?", "spans": [{"start": 33, "end": 40, "type": "city"}, {"start": 43, "end": 51, "type": "state"}], "title": "GetWeather"}, "type": "high_level", "id": 1}]
Processing Time NLP Statement #2: 85 ms


NLPArchitectClient.java can send its request and receive a response from the NLP Architect server within a fraction of a second. The second HTTP request is noticeably faster than the first (in this case, 85 ms versus 133 ms) because the HTTP connection is not closed between invocations.

Response times of 133 ms and 85 ms, as shown in this output, are very good for sending a string in Natural Language to an edge server over a network and getting back a response with the intent of the user. This leaves developers enough time to take the intent and slots and formulate a response to the user of their conversational agent, which can exist either as a chatbot or a virtual assistant.
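As a rough illustration of that last step, the sketch below routes an extracted intent to a handler that formulates a reply from the slots. The `IntentRouter` class, the reply wording, and the fallback message are all hypothetical; only the GetWeather intent name and the city and timeRange slot names come from the server output shown earlier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class IntentRouter {

    // One handler per intent; each handler turns the slots into a reply.
    private final Map<String, Function<Map<String, String>, String>> handlers = new HashMap<>();

    void register(String intent, Function<Map<String, String>, String> handler) {
        handlers.put(intent, handler);
    }

    /** Returns the handler's reply, or a fallback when the intent is not registered. */
    String respond(String intent, Map<String, String> slots) {
        Function<Map<String, String>, String> handler = handlers.get(intent);
        return handler == null ? "Sorry, I cannot help with that yet." : handler.apply(slots);
    }

    /** Builds a router with a single illustrative handler for GetWeather. */
    static IntentRouter demoRouter() {
        IntentRouter router = new IntentRouter();
        router.register("GetWeather", slots ->
                "Fetching the forecast for " + slots.getOrDefault("city", "your area")
                        + ", " + slots.getOrDefault("timeRange", "today") + ".");
        return router;
    }

    public static void main(String[] args) {
        IntentRouter router = demoRouter();
        System.out.println(router.respond("GetWeather",
                Map.of("city", "London", "timeRange", "next week")));
        System.out.println(router.respond("BookFlight", Map.of()));
    }
}
```

Keeping the handlers in a map means that supporting another intent from the pretrained model only requires one more `register` call, without touching the routing logic.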

## Conclusion

NLP Architect by Intel AI Lab is an open source Python library that allows developers to leverage pretrained models to solve a variety of natural language processing problems. This article explained why and how AI can be applied at the Edge rather than in a data center or the cloud. We covered the process to install NLP Architect on a Linux machine and download the sample data models necessary to test the library. Finally, we showed how to run the NLP Architect embedded server, and presented a Java 9 app that sends text in Natural Language and receives the intent and slots in a JSON response.

## About the Author

Bruce Hopkins is a member of the Intel® Software Innovators Program specializing in AI, NLP, Speech Recognition, and IoT. He has been an Oracle* Java Champion since 2010, and is the author of the book, “Bluetooth* for Java” by Apress* Publishers.

He is actively researching methods and technologies combining IoT in the Retail, Industrial, and Consumer spaces with AI and Voice Interfaces.
