Build Your First Chatbot with OPEA in Minutes


Want to unlock the potential of generative AI (GenAI) for your enterprise, enhance productivity, and sharpen your competitive edge? In this post, we’ll show you how to build your own chatbot in less than an hour using the components and frameworks that are part of the Open Platform for Enterprise AI (OPEA).

Why OPEA? GenAI applications are built on models trained on massive datasets and generalize well. However, when you ask about problems specific to your business, they can produce inaccurate responses because the trained model lacks the relevant knowledge. To obtain contextually accurate answers, you have to provide that specific data to the system. For enterprises looking to customize GenAI applications, OPEA can provide valuable support.

In this blog, we’ll show you step-by-step how to build your first chatbot. And that is just the beginning. OPEA also provides samples for document summarization, data analysis to harvest new insights, and code generation solutions to accelerate software development.

What is OPEA?

OPEA is an open source project that allows you to create multi-provider, robust, and composable GenAI solutions, leveraging innovations across the ecosystem.

OPEA includes:

  • Framework: A framework of composable state-of-the-art building blocks for GenAI systems including LLMs, data stores, guardrails, and prompt engines.
  • Architectural blueprints: Retrieval-augmented GenAI component stack structure and end-to-end workflows.
  • Evaluation: A four-step assessment, which is in development, for grading GenAI systems around performance, features, trustworthiness, and enterprise-grade readiness.

OPEA is designed with the following considerations:

Efficient Infrastructure Utilization: OPEA supports a wide range of hardware, including Intel® Xeon® processors, Intel® Gaudi® AI Accelerators, Intel® Arc™ GPUs, Nvidia GPUs, and AMD ROCm, enabling you to harness your existing infrastructure, including AI accelerators or other hardware of your choosing.

Seamless Enterprise Integration: Seamlessly integrates with enterprise software, including authentication and authorization systems and firewalls, providing heterogeneous support and stability across systems and networks.

Open Innovation and Flexibility: Brings together best-of-breed innovations and is free from vendor lock-in, ensuring flexibility and adaptability.

Ubiquitous Versatile Deployment: Runs everywhere through a flexible architecture designed for cloud, data center, edge, and on-premises environments.

Scalable Ecosystem and Growth: Access to a vibrant ecosystem of partners to help build and scale your solution.

Build Your Chatbot with OPEA

OPEA GenAIExamples are designed to give developers an easy entry into GenAI, featuring microservice-based samples that simplify the processes of deploying, testing, and scaling GenAI applications. All examples are fully compatible with Docker and Kubernetes, supporting a wide range of hardware platforms, ensuring flexibility and efficiency for your GenAI adoption.

ChatQnA is a retrieval-augmented generation (RAG) chatbot, which combines a retrieval system that fetches relevant, domain-specific knowledge with GenAI that produces human-like responses. The ChatQnA dataflow is shown in Figure 1. On the left side, a file is ingested into the vector database: in steps A through D, the data is read and split into chunks, which are encoded into vectors by an embedder and stored, along with metadata, in the vector database.

On the right side, the retrieval subsystem converts the user query (also known as a prompt) into a vector using the same embedder and uses that vector to retrieve similar items from the vector database. The query and the retrieved items are then sent together to the large language model (LLM). Even if the LLM has no prior knowledge pertaining to the user's original query, it can comprehend the supplied related material and answer the question.
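To make these two phases concrete, here is a minimal sketch of the same flow as HTTP calls against a running ChatQnA deployment (set up in the sections below, with host_ip defined as in Step 1). The ports (6007 for data ingestion, 8888 for the chat gateway) and the endpoint paths are assumptions based on the default Xeon compose configuration and may differ between releases; check your compose.yaml and the ChatQnA README for the exact values.

# Ingestion: upload a file so it is split, embedded, and stored in the vector database.
# (Port and path are assumptions; verify against your release.)
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./your_document.pdf"

# Retrieval + generation: the gateway embeds the query, retrieves similar chunks,
# and sends both to the LLM.
curl -X POST "http://${host_ip}:8888/v1/chatqna" \
  -H "Content-Type: application/json" \
  -d '{"messages": "What is the revenue of Nike in 2023?"}'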

RAG chatbots can address various use cases by providing highly accurate and context-aware interactions, which can be used in customer support, internal knowledge management, finance and accounting, and technical support.

Diagram showing the components in the workflow for OPEA’s ChatQnA example.

Figure 1. ChatQnA Dataflow

Deploying ChatQnA: A Step-by-Step Guide

In this section, we’ll walk you through the hardware and software requirements for running ChatQnA on AWS, followed by the environment variables you’ll need to set it up. Next, we’ll explain how to retrieve the necessary container images, followed by the commands to actually launch the ChatQnA application using either Docker or Kubernetes. Lastly, we show how to determine whether the application is running and how to use it.

Hardware Requirements

  • CPU: 4th Gen (and newer) Intel® Xeon® Processors with Intel® Advanced Matrix Extensions (Intel® AMX)
  • Memory: 64GB minimum
  • Storage: 100GB disk space, minimum

The recommended configurations are the Amazon EC2 c7i.8xlarge and c7i.16xlarge instance types. These instances run on 4th Generation Intel® Xeon® Scalable processors with Intel® AMX and are well suited to demanding workloads.
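If you prefer the AWS CLI to the console, a launch command along these lines is a reasonable starting point. The AMI ID, key pair, and security group below are placeholders you must replace with your own values; the AMI should be an Ubuntu 22.04 LTS image for your region, and the security group should allow inbound ports 22 and 80.

# Placeholders: replace the AMI, key pair, and security group with your own values.
aws ec2 run-instances \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --instance-type c7i.16xlarge \
  --key-name your-key-pair \
  --security-group-ids sg-xxxxxxxxxxxxxxxxx \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100}}]' \
  --count 1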

Software Requirements

  • OS: Ubuntu 22.04 LTS (other operating systems can work, but this guide assumes Ubuntu 22.04 LTS)

Required Models

The default embedding, reranking, and LLM models are set in set_env.sh. You can choose different models by changing the corresponding environment variables in set_env.sh, which you will source in Step 1 below.
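For example, you can either edit set_env.sh directly or export overrides after sourcing it, as in the sketch below. The variable names (EMBEDDING_MODEL_ID, RERANK_MODEL_ID, LLM_MODEL_ID) and model IDs shown are assumptions based on recent ChatQnA releases; confirm them against the set_env.sh in your checkout before relying on them.

# Sketch: override the default models after sourcing set_env.sh.
# Variable names are assumptions; confirm them against your set_env.sh.
source set_env.sh
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"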

Deploy Using Docker on an AWS EC2 Instance

In this section we show how to deploy ChatQnA on an AWS EC2 instance.
 

Step 1. Set Up Your Environment

Download the code and set up the environment variables:

git clone https://github.com/opea-project/GenAIExamples.git

cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon

git checkout v1.2   # check out the v1.2 release of OPEA

export host_ip="your_ipv4_address"

export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"

source set_env.sh
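As a quick sanity check before continuing, confirm the key variables are set. The *_MODEL_ID variable names printed here are assumptions based on recent releases of set_env.sh; the check is harmless either way.

echo "host_ip=${host_ip}"
echo "HUGGINGFACEHUB_API_TOKEN is ${HUGGINGFACEHUB_API_TOKEN:+set}"
env | grep MODEL_ID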
 

Step 2. Start the Docker Container

Run this command to start the Docker container:

docker compose up -d

Note: docker compose up -d starts the multi-container application defined by the compose.yaml file, whose images are listed in Table 1. You can use other available YAML files, or create your own, to run different containers, for example to use vLLM versus TGI for LLM inference (see the sketch after Table 1).

This command automatically pulls the Docker images listed in Table 1 from their registries and starts the Docker containers.

Table 1. Docker images defined in compose.yaml

Image name                                       Tag
opea/dataprep                                    1.2
opea/vllm                                        1.2
ghcr.io/huggingface/text-embeddings-inference    cpu-1.5
opea/retriever                                   1.2
opea/chatqna                                     1.2
opea/chatqna-ui                                  1.2
opea/nginx                                       1.2
redis/redis-stack                                7.2.0-v9
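As mentioned in the note above, compose.yaml is only the default. Recent releases of the Xeon directory also ship alternative compose files, such as a TGI-based variant; the file name compose_tgi.yaml below is an assumption, so list the directory first to see which files your checkout actually provides.

# List the available compose files, then start one explicitly.
ls compose*.yaml
docker compose -f compose_tgi.yaml up -d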


Step 3. Check Docker Container Status

Run this command to check the Docker container status:

docker ps -a

Make sure all the Docker containers report an "Up" status.
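A more compact check that lists only container names and states, and flags anything that has exited, is:

docker ps --format 'table {{.Names}}\t{{.Status}}'
docker ps -a --filter "status=exited"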


Step 4. Check That the vLLM Service Is Ready

It takes a few minutes for the vLLM service to download the LLM model and warm up before it is ready for inference. Check the vLLM service log to ensure it is ready.

Run this command to check the log: 

docker logs vllm-service 2>&1 | grep complete

When this command returns a line such as "Application startup complete", the vLLM service is ready.
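You can also probe the vLLM OpenAI-compatible endpoint directly. The host port below (9009) is an assumption based on typical ChatQnA compose files; confirm the actual port mapping in your compose.yaml.

# Returns HTTP 200 once the model server is up (port is an assumption).
curl -s -o /dev/null -w "%{http_code}\n" http://${host_ip}:9009/health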


Step 5. Access the ChatQnA Service

Open the following URL in your browser:

http://{Public-IPv4-address}:80

Make sure to access the AWS EC2 instance through its public IPv4 address.
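If the page does not load, first confirm that the instance's security group allows inbound traffic on port 80, then check that the frontend is answering (replace the placeholder with your instance's public IPv4 address):

# A 200 response indicates the UI is being served.
curl -s -o /dev/null -w "%{http_code}\n" "http://{Public-IPv4-address}:80"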

In the following video demo, we ask ChatQnA "What was Nike's revenue in 2023?" It does not give the correct answer because the LLM has no up-to-date knowledge of Nike's revenue. After we upload the Nike 2023 Annual Report document, it gives the right answer using the RAG feature.

Brief video demo that shows ChatQnA results before and after using the RAG feature, where the RAG feature enables a more accurate response.

Access ChatQnA video demo.

Deploy Using Kubernetes on an AWS EC2 Instance

You will first need to set up a Kubernetes cluster on your AWS EC2 instance and then install Helm. Please refer to the k8s_install_kubespray guide and the Helm documentation to set up Kubernetes and Helm, respectively.


Step 1. Set Up the Environment

Please refer to Step 1 in "Deploy Using Docker on an AWS EC2 Instance" above to set up the environment.


Step 2. Start Kubernetes Services

To start your Kubernetes services type in this command:

helm install chatqna oci://ghcr.io/opea-project/charts/chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -f cpu-values.yaml
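After the install returns, you can confirm the release was created and inspect its status with standard Helm commands:

helm list
helm status chatqna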


Step 3. Check Kubernetes Status

First, check the service status to get the port number for accessing ChatQnA by entering this command:

kubectl get services

Screenshot of a Kubernetes service showing IP addresses and port numbers.

Figure 2. Kubernetes Service

Take note of the nginx node port, which is 31146 in this example (yours may differ). We will access ChatQnA using this port in an upcoming step.
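You can also read the node port directly from the service instead of scanning the table. The service name chatqna-nginx below matches Figure 2, but verify it against your own kubectl get services output:

kubectl get svc chatqna-nginx -o jsonpath='{.spec.ports[0].nodePort}'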

Next, check that the pods are in “Running” status using:

kubectl get pods

Make sure all pods are in the Ready state. If any pod is not running, check its container logs to troubleshoot; the sketch below shows the usual commands. If you're still stuck, file an issue with the details in the GenAIExamples repository.
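For a pod that is stuck, the usual first steps are to describe it and read its logs (replace the pod name with one reported by kubectl get pods):

kubectl describe pod <pod-name>
kubectl logs <pod-name>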

Screenshot of a Kubernetes service pod status.

Figure 3. Kubernetes Pod Status


Step 4. Access the ChatQnA Service

Open the following URL in your browser:

http://{Public-IPv4-address}:31146

Note that port 31146 is the node port of the Kubernetes chatqna-nginx service shown in Figure 2; use the port reported by your own cluster.
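If the node port is not reachable from your machine (for example, because the security group does not expose it), an alternative is to port-forward the nginx service locally and browse to http://localhost:8080. The service name and target port below are assumptions based on Figure 2; verify them with kubectl get services.

kubectl port-forward svc/chatqna-nginx 8080:80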

For an example ChatQnA interaction, please refer to the Access ChatQnA Service video demo shown above in Step 5 of the Docker deployment.

Next Steps

We demonstrated the detailed steps to deploy ChatQnA on the AWS cloud. You can deploy other OPEA GenAIExamples, such as VisualQnA, VideoQnA, Document Summarization, Documents Index and Retrieval, Code Generation, Translation, Text2Image, and Instruction-Tuning, using similar steps. The compute instance size and/or cluster resources you need depend on the model size and the other elements in your pipeline.

Please refer to the following links for more OPEA details.

Resources

About the Authors

Xigui Wang, Software AI Frameworks Engineer

Dr. Malini Bhandaru, OPEA Technical Steering Committee Chair, Cloud Architect, and Intel Senior Principal Engineer

Alex Sin, AI Software Solutions Engineer

Dolpher Du, Cloud Orchestration Software Engineer

Yi Yao, AI Frameworks Engineer

Ying Hu, Software AI Frameworks Engineer