Running AI workloads in a production environment presents a unique set of challenges, from managing complex infrastructure to ensuring scalability and reliability. The Linux Foundation’s Open Platform for Enterprise AI (OPEA) project aims to address these challenges by providing a flexible, open source framework that simplifies the deployment and management of AI applications. In this blog post, we’ll explore the benefits of OPEA and demonstrate a proof of concept with the ChatQnA app, a powerful AI-driven question-and-answer tool, running on Red Hat OpenShift. This demo showcases OPEA’s potential to transform AI operations in a production setting, and future iterations will make this process easier and more user-friendly.
OPEA is designed to democratize AI by making it accessible across a wide range of hardware solutions. By supporting multiple hardware architectures, including CPUs, GPUs, and specialized AI accelerators, OPEA provides organizations with the flexibility to choose the best hardware and platform for their specific needs. This choice is crucial as it allows businesses to optimize performance, cost, and energy efficiency based on their unique requirements. Additionally, its open source nature ensures that it can be integrated with various existing systems and tools, fostering innovation and reducing vendor lock-in. By offering these choices, OPEA empowers enterprises to tailor their AI infrastructure to achieve optimal results, making AI more practical and scalable in real-world applications.
ChatQnA Proof of Concept
OPEA offers a variety of reference AI workloads, and in this blog, we are showcasing a generative AI (GenAI) ChatQnA application using the Llama 3.1-8B model from Hugging Face, accelerated across four Intel® Gaudi® AI accelerators. This demo is not limited to this specific large language model (LLM); many other models can be used, and different vector databases can be integrated as well. OPEA’s flexibility allows many components to be substituted with alternatives, highlighting its open source nature and broad appeal. Additionally, the solution is not confined to running on Intel Gaudi accelerators; it could also be deployed on Intel® Xeon® CPUs with Intel® Advanced Matrix Extensions (Intel® AMX) acceleration, providing a versatile and adaptable AI deployment framework.
On the OPEA site, you can find reference ChatQnA microservices and a megaservice (a collection of those microservices) that can be deployed easily with just a few Docker commands. However, for running and scaling in a cloud native environment, platforms like Red Hat OpenShift and Red Hat OpenShift AI provide a production-supported way to ensure robust, scalable, and efficient deployment and management of applications.
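However it is deployed, the ChatQnA megaservice exposes a single gateway endpoint that a UI or any other client can call. Here is a minimal sketch of querying it from Python, assuming the OPEA v1.0 reference gateway, which by convention listens on port 8888 at the /v1/chatqna path; the hostname, port, path, and payload shape should be checked against your own deployment.

```python
import requests

# Hypothetical route to the ChatQnA gateway; replace with your OpenShift route
# or in-cluster service DNS name.
CHATQNA_URL = "http://chatqna-gateway.example.com:8888/v1/chatqna"

def ask_chatqna(question: str) -> str:
    """Send a question to the ChatQnA megaservice and return the raw response body."""
    # The reference gateway accepts a "messages" field and streams its answer;
    # for simplicity this sketch just returns the whole response body.
    payload = {"messages": question}
    response = requests.post(CHATQNA_URL, json=payload, timeout=120)
    response.raise_for_status()
    return response.text

if __name__ == "__main__":
    print(ask_chatqna("What is new in Intel Gaudi 3?"))
```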
While this article won't delve into the technical setup details for building a chatbot on OpenShift, we provide a comprehensive list of all the components one would use so you can see how they integrate in this proof of concept (POC). For those specifically interested in setting up an Intel® Gaudi® AI accelerator within OpenShift, the Intel® Technology Enabling for OpenShift GitHub repository offers detailed instructions.
Red Hat OpenShift Components
The following section lists all the operators that help automate the installation of drivers and OpenShift components necessary for this POC, along with the pod deployments of all the OPEA microservices that make the chatbot work. A sketch of installing one of these operators programmatically appears after the operator list.
Operators
- Intel Gaudi AI Accelerator: Discovers the Intel Gaudi AI accelerators in the cluster and installs a compatible driver
- Kernel Module Management: Simplifies the management of kernel modules in OpenShift and is required by the Intel Gaudi AI Accelerator operator to install drivers on each Intel Gaudi node
- Node Feature Discovery: Labels nodes with Intel Gaudi AI accelerators and other detailed hardware features of each node
- Red Hat OpenShift AI: Installs OpenShift AI, providing a scalable and secure environment for developing, training, and deploying AI models
- Red Hat OpenShift Serverless: Provides a framework for deploying, managing, and scaling AI workloads in OpenShift AI
- Red Hat OpenShift Service Mesh: Offers observability, traffic management, and scalability for AI models deployed in OpenShift AI
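Each of these operators can be installed from OperatorHub through the OpenShift console, or declaratively by creating an OLM Subscription. The sketch below uses the Kubernetes Python client to create such a Subscription; the package name, channel, and catalog source shown are placeholders, so look up the exact values for each operator in OperatorHub.

```python
from kubernetes import client, config

def subscribe_operator(package: str, channel: str, source: str,
                       namespace: str = "openshift-operators") -> None:
    """Create an OLM Subscription so the Operator Lifecycle Manager installs the operator."""
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    subscription = {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": package, "namespace": namespace},
        "spec": {
            "name": package,       # package name as listed in OperatorHub
            "channel": channel,    # update channel, e.g. "stable"
            "source": source,      # catalog source, e.g. "redhat-operators"
            "sourceNamespace": "openshift-marketplace",
        },
    }
    client.CustomObjectsApi().create_namespaced_custom_object(
        group="operators.coreos.com",
        version="v1alpha1",
        namespace=namespace,
        plural="subscriptions",
        body=subscription,
    )

# Example (placeholder package/channel/source values -- verify in OperatorHub):
# subscribe_operator("kernel-module-management", "stable", "redhat-operators")
```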
Deployments
- redis-vector-db: Part of the ChatQnA megaservice, this deployment provides the Redis vector database that stores the embeddings essential for the retrieval-augmented generation (RAG) architecture
- chatqna-rag-redis: Also part of the ChatQnA megaservice, this deployment contains Python scripts that embed user queries, search the vector database, re-rank retrieved data based on saliency, interact with the LLM, and return responses based on input data and user queries. It can also process PDF files for RAG, encoding the data into the vector database, as demonstrated with the customized Intel Gaudi 3 responses in the demo. A simplified sketch of this retrieval flow appears after this list.
- chatqna-nonrag-redis: This deployment directly interacts with the LLM to handle user queries, demonstrating a non-RAG answer. It is not part of the ChatQnA megaservice and serves to highlight the benefits of using RAG.
- ui-demo: A custom-built web GUI that sends user queries to both the chatqna-nonrag-redis and chatqna-rag-redis backends, showcasing in a user-friendly interface how an LLM responds with and without RAG. The OPEA project v1.0 also provides its own reference UI that could be used.
- minio: This deployment provides the S3-compatible object storage where the Llama 3.1-8B model is stored. S3-compatible storage is required when deploying a model with OpenShift AI, and any S3-compatible provider can be used.
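To make the chatqna-rag-redis flow more concrete, here is a minimal sketch of the retrieval path: embed the user query, run a vector similarity search against Redis, and pass the retrieved context to the model endpoint served by OpenShift AI. The index name, field names, embedding model, and endpoint URL are illustrative assumptions rather than the exact values used in the OPEA reference deployment, and the re-ranking step is omitted for brevity.

```python
import numpy as np
import redis
from redis.commands.search.query import Query
from sentence_transformers import SentenceTransformer
from huggingface_hub import InferenceClient

# Assumed names and endpoints -- adjust to match your deployment.
REDIS_URL = "redis://redis-vector-db:6379"
INDEX_NAME = "rag-redis"                        # hypothetical index name
MODEL_URL = "http://gaudi-llama3-predictor:8080"  # model endpoint exposed by OpenShift AI

embedder = SentenceTransformer("BAAI/bge-base-en-v1.5")
r = redis.from_url(REDIS_URL)
llm = InferenceClient(model=MODEL_URL)

def retrieve_context(question: str, k: int = 3) -> str:
    """Embed the question and pull the k most similar chunks from the Redis vector index."""
    vec = np.asarray(embedder.encode(question), dtype=np.float32).tobytes()
    query = (
        Query(f"*=>[KNN {k} @vector $vec AS score]")
        .sort_by("score")
        .return_fields("content", "score")
        .dialect(2)
    )
    docs = r.ft(INDEX_NAME).search(query, query_params={"vec": vec}).docs
    return "\n".join(doc.content for doc in docs)

def answer(question: str) -> str:
    """Build a RAG prompt from the retrieved context and ask the Llama 3.1-8B model."""
    context = retrieve_context(question)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm.text_generation(prompt, max_new_tokens=256)
```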
Red Hat OpenShift AI Components
Below are all the Red Hat OpenShift AI components that were configured to make the chatbot work.
Accelerator Profiles
- gaudi: This accelerator profile appears in the Accelerator drop-down menu when deploying a model, instructing OpenShift AI to request an Intel Gaudi AI accelerator. This is described in more detail in the Habana Gaudi Integration guide within the Red Hat documentation.
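For reference, an accelerator profile is itself a small custom resource that the OpenShift AI dashboard reads. The sketch below shows roughly what the gaudi profile could look like when created with the Kubernetes Python client; the API group, field names, namespace, and the habana.ai/gaudi resource identifier are assumptions that should be verified against your OpenShift AI version and the Habana Gaudi Integration guide.

```python
from kubernetes import client, config

config.load_kube_config()

# Approximate shape of an OpenShift AI AcceleratorProfile for Intel Gaudi.
# Field names and API group are assumptions -- confirm against your RHOAI release.
gaudi_profile = {
    "apiVersion": "dashboard.opendatahub.io/v1",
    "kind": "AcceleratorProfile",
    "metadata": {"name": "gaudi", "namespace": "redhat-ods-applications"},
    "spec": {
        "displayName": "gaudi",
        "enabled": True,
        # Extended resource name advertised by the Intel Gaudi device plugin.
        "identifier": "habana.ai/gaudi",
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="dashboard.opendatahub.io",
    version="v1",
    namespace="redhat-ods-applications",
    plural="acceleratorprofiles",
    body=gaudi_profile,
)
```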
Serving Runtimes
- tgi-gaudi-llama3: This serving runtime can be selected when deploying a model. It specifies how the model is internally mounted, which runtime container image to use, and sets other AI parameters for deployment.
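Once a model is deployed with this serving runtime, it exposes the standard Text Generation Inference (TGI) REST API. The sketch below sends a request to the /generate endpoint; the route hostname is a placeholder, and parameters such as max_new_tokens are only examples.

```python
import requests

# Placeholder: replace with the inference route created for the gaudi-llama3 model.
MODEL_URL = "https://gaudi-llama3-myproject.apps.example.com"

def generate(prompt: str) -> str:
    """Call the TGI /generate endpoint exposed by the serving runtime."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 256, "temperature": 0.2},
    }
    resp = requests.post(f"{MODEL_URL}/generate", json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["generated_text"]

print(generate("What are Intel Gaudi AI accelerators?"))
```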
Data Science Projects
- Data Connections: A data connection named minio-llama3 was used to connect to the Llama 3.1-8B model stored in the MinIO storage bucket (see the upload sketch after this list)
- Models: A model named gaudi-llama3 was deployed using four Intel Gaudi AI accelerators and the tgi-gaudi-llama3 serving runtime described above.
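Before the gaudi-llama3 model can be deployed, its files have to be in the bucket that the minio-llama3 data connection points to. Here is a minimal upload sketch using boto3; the endpoint, credentials, bucket, local directory, and path prefix are placeholders that should match the values entered in the data connection and model deployment dialog.

```python
import os
import boto3

# Placeholder values -- use the same endpoint, credentials, and bucket as the
# minio-llama3 data connection configured in OpenShift AI.
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.chatqna.svc:9000",
    aws_access_key_id="minio",
    aws_secret_access_key="minio-secret",
)

bucket = "models"
local_dir = "llama-3.1-8b-download"  # model files downloaded from Hugging Face
prefix = "llama-3.1-8b"              # path referenced when deploying the model

# Walk the local model directory and mirror it into the bucket under the prefix.
for root, _, files in os.walk(local_dir):
    for name in files:
        local_path = os.path.join(root, name)
        key = os.path.join(prefix, os.path.relpath(local_path, local_dir))
        print(f"Uploading {local_path} -> s3://{bucket}/{key}")
        s3.upload_file(local_path, bucket, key)
```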
All these components work together to provide users with a complete reference solution for generating customized responses that can be scaled in a cloud native environment. The potential use cases are endless: a cosmetics company, for example, could use a RAG solution to provide personalized skincare recommendations, while a financial institution might offer targeted investment advice based on client data. Check out this short video showing how everything works together.
Harnessing the Full Potential of AI with OPEA ChatQnA on OpenShift
Deploying the OPEA ChatQnA reference AI solution on OpenShift showcases the powerful synergy between advanced AI frameworks and robust container orchestration. By leveraging hardware such as Intel Xeon processors and Intel Gaudi AI accelerators, organizations can significantly enhance the performance and efficiency of their AI workloads. Each component of the solution is highly customizable, providing the flexibility to tailor deployments to specific needs while avoiding software and hardware vendor lock-in.
Although efforts are underway to further simplify the deployment process on OpenShift, it’s already possible to achieve impressive results with the current setup. This flexibility and forward-thinking approach ensure that enterprises can harness the full potential of AI in a scalable, secure, and efficient manner.
Get Started with OPEA and OpenShift Today
We encourage developers and AI enthusiasts to check out the OPEA project and explore its capabilities. By experimenting with the ChatQnA app and other reference AI workloads, you can gain hands-on experience and see firsthand how OPEA can transform your AI operations.
To get started, visit the OPEA GitHub repository for detailed documentation and setup guides. For further learning, check out the Red Hat OpenShift AI documentation and join one of the OPEA community events to connect with other developers and share your experiences. Together, we can push the boundaries of AI and create innovative solutions that drive business success.
About the Author
Eric Adams, Cloud Software Engineer, Intel
Eric Adams is a cloud software engineer at Intel, with a tenure of 23 years at the company. He holds a degree in Electrical and Computer Engineering from Tennessee Technological University. Eric has most recently been focused on Red Hat OpenShift, where he assists customers in enabling Intel accelerators and security technologies.