Workshop: Build Multimodal Systems on AI PCs
Overview
Going from voice to text and back to voice, this session uses PyTorch* to bring GPU acceleration to multimodal applications on AI PCs. Participants learn how to implement and optimize audio-processing pipelines, combining speech recognition and speech synthesis into coherent applications. The session explores transformer-based models, harnessing GPU acceleration and optimization techniques on AI PCs, and attendees gain hands-on experience creating responsive applications that process and generate audio effectively.
The session uses PyTorch to harness GPU acceleration and explore the capabilities of AI PCs, covering:
- Configuring and using GPU resources effectively for audio AI workloads (see the device-selection sketch after this list)
- Implementing speech-to-text conversion using transformer-based models on consumer hardware (see the speech-to-text sketch below)
- Building and optimizing text-to-speech systems for responsive audio generation (see the text-to-speech sketch below)
- Creating end-to-end audio processing pipelines that combine multiple AI models (see the end-to-end sketch below)
- Developing responsive multimodal applications that process and generate audio in near real time
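To make these topics concrete, the sketches below outline one way the pieces might fit together; they are illustrations under stated assumptions, not the workshop's actual materials. First, device selection: a minimal sketch assuming a recent PyTorch build, preferring an Intel GPU exposed through torch.xpu (available in PyTorch 2.4 and later), then CUDA, then the CPU. The helper name pick_device is illustrative.

```python
import torch

def pick_device() -> torch.device:
    """Prefer the AI PC's GPU, falling back to CPU.

    torch.xpu exposes Intel GPUs in recent PyTorch builds (2.4+);
    torch.cuda covers NVIDIA GPUs. Adjust for the accelerators
    actually present on your machine.
    """
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
print(f"Running audio workloads on: {device}")
```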
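For speech-to-text, a hedged sketch using the Hugging Face Transformers automatic-speech-recognition pipeline with a Whisper checkpoint. The model name and audio file are stand-ins, the `device` variable comes from the sketch above, and it assumes a Transformers version whose pipelines accept that device; the session may use different checkpoints or APIs.

```python
from transformers import pipeline

# Transformer-based speech recognition on consumer hardware.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",  # illustrative checkpoint choice
    device=device,                 # device chosen by pick_device() above
)

result = asr("sample.wav")  # path to any local audio file
print(result["text"])
```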
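For text-to-speech, a sketch built on the SpeechT5 model and HiFi-GAN vocoder from Hugging Face Transformers, again reusing `device` from the first sketch. This is one possible stack, not necessarily the session's; the zero speaker embedding keeps the example self-contained but produces a generic voice.

```python
import torch
import soundfile as sf
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts").to(device)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan").to(device)

inputs = processor(text="Hello from the AI PC workshop.", return_tensors="pt").to(device)

# SpeechT5 conditions generation on a 512-dimensional speaker embedding.
# A zero vector keeps the sketch self-contained; real code would load an
# x-vector from a speaker-embedding dataset for a natural voice.
speaker_embeddings = torch.zeros(1, 512, device=device)

speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
sf.write("out.wav", speech.cpu().numpy(), samplerate=16000)  # SpeechT5 outputs 16 kHz audio
```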
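Finally, a sketch of how the pieces might combine into an end-to-end, voice-to-voice pipeline. It reuses `asr`, `processor`, `model`, `vocoder`, `speaker_embeddings`, and `device` from the sketches above; `voice_round_trip` and the uppercasing transform are hypothetical stand-ins for whatever intermediate processing an application needs. Keeping the models resident on the GPU between calls is what makes near-real-time response plausible.

```python
import soundfile as sf

def voice_round_trip(audio_path: str, transform=str.upper) -> str:
    """Transcribe audio, apply a text transform (a stand-in for any
    intermediate model, such as an LLM), then synthesize the reply."""
    text = asr(audio_path)["text"]   # speech-to-text
    reply = transform(text)          # intermediate processing
    inputs = processor(text=reply, return_tensors="pt").to(device)
    speech = model.generate_speech(
        inputs["input_ids"], speaker_embeddings, vocoder=vocoder
    )
    sf.write("reply.wav", speech.cpu().numpy(), samplerate=16000)
    return reply

print(voice_round_trip("sample.wav"))
```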
The session is designed for developers and AI practitioners at the expert level. Attendees should meet the following prerequisites:
- Python* programming: Intermediate-level Python skills, including familiarity with functions, classes, and error handling
- PyTorch basics: Familiarity with PyTorch and basic neural network concepts
- Deep learning fundamentals: Understanding of basic concepts, such as training, inference, and model architecture
- Development environment: Experience with Jupyter* Notebook and package management (pip and conda*)