Workshop: Build Multimodal Systems on AI PCs
Overview
Going from voice to text and back to voice, this session uses PyTorch* to bring GPU acceleration to multimodal applications on AI PCs. Participants learn how to implement and optimize audio-processing pipelines, combining speech recognition and speech synthesis into coherent applications. The session explores transformer-based models, harnessing GPU acceleration and optimization techniques on AI PCs, and attendees gain hands-on experience creating responsive applications that process and generate audio effectively.
The session uses PyTorch to harness GPU acceleration and explore the capabilities of AI PCs, covering:
- Configuring and using GPU resources effectively for audio AI workloads (see the device-selection sketch after this list)
- Implementing speech-to-text conversion using transformer-based models on consumer hardware (see the speech-to-text sketch below)
- Building and optimizing text-to-speech systems for responsive audio generation (see the text-to-speech sketch below)
- Creating end-to-end audio processing pipelines that combine multiple AI models (see the end-to-end sketch below)
- Developing responsive multimodal applications that process and generate audio in near real time
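To make these topics concrete, the sketches below outline one way the pieces might fit together; they are illustrations under stated assumptions, not the workshop's actual materials. First, device selection: a minimal sketch assuming a recent PyTorch build, preferring an Intel GPU exposed through torch.xpu (available in PyTorch 2.4 and later), then CUDA, then the CPU. The helper name pick_device is illustrative.

```python
import torch

def pick_device() -> torch.device:
    """Prefer the AI PC's GPU, falling back to CPU.

    torch.xpu exposes Intel GPUs in recent PyTorch builds (2.4+);
    torch.cuda covers NVIDIA GPUs. Adjust for the accelerators
    actually present on your machine.
    """
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
print(f"Running audio workloads on: {device}")
```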
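For speech-to-text, a hedged sketch using the Hugging Face Transformers automatic-speech-recognition pipeline with a Whisper checkpoint. The model name and audio file are stand-ins, the `device` variable comes from the sketch above, and it assumes a Transformers version whose pipelines accept that device; the session may use different checkpoints or APIs.

```python
from transformers import pipeline

# Transformer-based speech recognition on consumer hardware.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",  # illustrative checkpoint choice
    device=device,                 # device chosen by pick_device() above
)

result = asr("sample.wav")  # path to any local audio file
print(result["text"])
```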
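For text-to-speech, a sketch built on the SpeechT5 model and HiFi-GAN vocoder from Hugging Face Transformers, again reusing `device` from the first sketch. This is one possible stack, not necessarily the session's; the zero speaker embedding keeps the example self-contained but produces a generic voice.

```python
import torch
import soundfile as sf
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts").to(device)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan").to(device)

inputs = processor(text="Hello from the AI PC workshop.", return_tensors="pt").to(device)

# SpeechT5 conditions generation on a 512-dimensional speaker embedding.
# A zero vector keeps the sketch self-contained; real code would load an
# x-vector from a speaker-embedding dataset for a natural voice.
speaker_embeddings = torch.zeros(1, 512, device=device)

speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
sf.write("out.wav", speech.cpu().numpy(), samplerate=16000)  # SpeechT5 outputs 16 kHz audio
```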
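Finally, a sketch of how the pieces might combine into an end-to-end, voice-to-voice pipeline. It reuses `asr`, `processor`, `model`, `vocoder`, `speaker_embeddings`, and `device` from the sketches above; `voice_round_trip` and the uppercasing transform are hypothetical stand-ins for whatever intermediate processing an application needs. Keeping the models resident on the GPU between calls is what makes near-real-time response plausible.

```python
import soundfile as sf

def voice_round_trip(audio_path: str, transform=str.upper) -> str:
    """Transcribe audio, apply a text transform (a stand-in for any
    intermediate model, such as an LLM), then synthesize the reply."""
    text = asr(audio_path)["text"]   # speech-to-text
    reply = transform(text)          # intermediate processing
    inputs = processor(text=reply, return_tensors="pt").to(device)
    speech = model.generate_speech(
        inputs["input_ids"], speaker_embeddings, vocoder=vocoder
    )
    sf.write("reply.wav", speech.cpu().numpy(), samplerate=16000)
    return reply

print(voice_round_trip("sample.wav"))
```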
The session is designed for developers and AI practitioners at the expert level. Attendees should meet the following prerequisites:
- Python* programming: Intermediate-level Python skills, including familiarity with functions, classes, and error handling
- PyTorch basics: Familiarity with PyTorch and basic neural network concepts
- Deep learning fundamentals: Understanding of basic concepts, such as training, inference, and model architecture
- Development environment: Experience with Jupyter* Notebook and package management (pip and conda*)