Business Results

  • Accelerated training allows models to be retrained more frequently for better accuracy, while faster inference enables real-time predictions. Together, these speed up the entire deep learning pipeline.

Background

Challenges associated with data privacy, limited data availability, data labeling, ineffective data governance, high cost, and the need for a high volume of data are driving the use of synthetic data to fulfill the high demand for AI solutions across industries.

Synthetic voice has wide applications in virtual assistants, education, healthcare, multimedia, and entertainment. Text-to-Speech (TTS), one method of generating synthetic voice, creates human speech artificially. A main benefit of voice synthesis is making information accessible to a wider audience. For example, people with visual impairments or reading disabilities can use this technology to have written content read aloud, helping them access a wider range of information and communicate more easily with others.

Solution

In collaboration with Accenture*, Intel developed this Voice Data Generation AI reference kit. Paired with Intel® software, the kit can help customers develop synthetically generated voice data. The reference kit implementation provides a performance-optimized guide for synthetic voice generation use cases that can easily be scaled to similar use cases across domains.

End-to-End Flow Using Intel® AI Software Products

Text data is translated into speech. A transfer learning approach is applied to advanced PyTorch*-based pretrained Tacotron and WaveRNN (vocoder) models; this model combination is known to be a promising method for synthesizing voice from input text. The LJ Speech dataset, preprocessed with NumPy, is used to further train the pretrained models. From the input text, the model generates speech that mimics the voice in the LJ Speech dataset used to train the AI model.
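As an illustration of the NumPy preprocessing step, the sketch below peak-normalizes a waveform and slices it into overlapping frames, a typical first step before spectrogram extraction for TTS training. This is an assumption-laden sketch: the kit's exact preprocessing steps and frame parameters may differ.

```python
import numpy as np

def preprocess_clip(waveform, frame_len=1024, hop=256):
    """Peak-normalize a mono waveform and slice it into overlapping frames.
    Illustrative only; the kit's actual NumPy preprocessing may differ."""
    peak = np.max(np.abs(waveform))
    if peak > 0:
        waveform = waveform / peak          # scale samples into [-1, 1]
    n_frames = 1 + (len(waveform) - frame_len) // hop
    return np.stack([waveform[i * hop : i * hop + frame_len]
                     for i in range(n_frames)])

# Example: one second of a 440 Hz tone at 22.05 kHz (LJ Speech clips are 22.05 kHz mono)
sr = 22050
t = np.arange(sr) / sr
clip = 0.5 * np.sin(2 * np.pi * 440.0 * t)
frames = preprocess_clip(clip)
print(frames.shape)  # (83, 1024)
```

Each frame would then typically be windowed and transformed into a mel spectrogram, which is what Tacotron predicts and WaveRNN converts back into audio.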

Quantizing and compressing the model from floating point (FP32) to integer (int8), while maintaining accuracy comparable to the floating-point model, demonstrates efficient use of underlying resources when the model is deployed on edge devices with lower processing and memory capabilities.
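To make the FP32-to-int8 idea concrete, here is a minimal NumPy sketch of symmetric per-tensor weight quantization. It is conceptual only: in the kit, Intel Neural Compressor selects quantization parameters and performs accuracy-aware tuning automatically.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor FP32 -> int8 quantization of a weight tensor.
    Conceptual sketch; Intel Neural Compressor handles this in the kit."""
    scale = float(np.max(np.abs(w))) / 127.0 or 1.0  # one int8 step in FP32 units
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(256, 256)).astype(np.float32)  # stand-in weights
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q.nbytes / w.nbytes)  # 0.25: int8 needs a quarter of the memory of FP32
print(float(np.abs(w - w_hat).max()) <= scale / 2 + 1e-6)  # True: error within half a step
```

The 4x memory reduction and bounded per-weight error are what allow the quantized model to run on resource-constrained edge devices with little accuracy loss.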

This reference kit includes:

  • Training data
  • An open source, trained model
  • Libraries
  • User guides
  • Intel® AI software products

At a Glance

  • Industry: Cross-industry
  • Task: Model training, voice data generation by translating the text sentence into speech, and model quantization
  • Dataset: 13,100 audio files from the LJ Speech dataset, used for AI model training. Inference input can be provided in two ways: a single typed text sentence, or a .csv file containing multiple text sentences.
  • Type of Learning: Transfer learning
  • Models: PyTorch-based pretrained Tacotron and WaveRNN (vocoder) models
  • Output: Audio files
  • Intel AI Software Products:
    • Intel® Extension for PyTorch*
    • Intel® Neural Compressor
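The two inference input modes listed above (a single typed sentence, or a .csv file of sentences) can be sketched as a small loader. The one-sentence-per-row CSV layout is an assumption; the kit's actual input handling may differ.

```python
import csv
import io

def load_sentences(source):
    """Return the list of text sentences to synthesize. Accepts either a
    single typed sentence (str) or an open .csv file with one sentence per
    row. Sketch only; the kit's CSV layout is assumed, not confirmed."""
    if isinstance(source, str):
        return [source]
    return [row[0] for row in csv.reader(source) if row]

# Single-sentence mode
print(load_sentences("Synthetic voices make written content audible."))

# Batch mode: a .csv file containing multiple sentences
batch = io.StringIO("Hello world.\nText-to-speech reads this aloud.\n")
print(load_sentences(batch))
```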

Technology

Optimized with Intel AI Software Products for Better Performance

The voice data generation model was optimized with Intel Extension for PyTorch. Intel Neural Compressor was used to quantize the FP32 model to an int8 model.

Intel Extension for PyTorch and Intel Neural Compressor allow you to reuse your model development code with minimal code changes for training and inferencing.

Performance benchmark tests were run on Microsoft Azure* Standard_D8_v5 using 3rd generation Intel® Xeon® processors to optimize the solution.

Benefits

Machine learning developers need to train models on a substantial number of datasets. Accelerated training may allow them to train more frequently and achieve better accuracy. Faster inference may enable them to run predictions in real time as well as perform offline batch processing.

With Intel® oneAPI tools, little to no code change is required to attain the performance boost.

Download Kit

Stay Up to Date on AI Workload Optimizations

Sign up to receive hand-curated technical articles, tutorials, developer tools, training opportunities, and more to help you accelerate and optimize your end-to-end AI and data science workflows.
