A Tutorial Series for Software Developers, Data Scientists, and Data Center Managers
In this article, we discuss:
- Formalizing an idea and shaping it into a real project by applying various project analysis techniques and using relevant project management tools
- Using CRISP-DM methodology (Cross Industry Standard Process for Data Mining)
- Outlining the standard tasks for any AI project
Some of the activities are team-focused. If you are working solo, you can skip these sections or complete the corresponding tasks using the “multiple hats” framework.
Our sample project for this series is an app that extracts emotions from uploaded images using an image processing (emotion recognition) algorithm, generates music that represents the extracted emotions, and then creates a movie combining the images and music.
Shaping your idea into a project begins with defining the scope of the project—the what and how of building your app—to eliminate potential ambiguity during development. For example:
- How many images do we allow to be uploaded?
- What does it mean to extract emotions?
- How do we represent an emotion in a computer?
- What kind of emotions are supported?
- How do we assign an emotion to a set of images?
- How do we train a music generation model?
- How do we connect the emotion recognition component to the music generation component?
- How do we seed the music generation process?
- How do we guarantee that a computer-generated melody is consistent with the emotions?
- How do we produce the final movie and ensure that the transitions between frames are smooth?
You need to address each of these questions before you can move to the implementation phase. More importantly, you need a methodology that you can use to help you to come up with your own questions about your app without missing anything important.
Project Analysis Methods
Consider these three simple, yet effective, techniques for analyzing a project and generating tasks.
- Hierarchical decomposition. Based on the divide-and-conquer principle, this technique involves iteratively decomposing tasks into subtasks until an individual subtask takes no longer than two hours¹, or until the time required for further decomposition becomes comparable to the time required to complete the task itself.
At the top of the hierarchy is the block denoting the entire system. The next level contains the blocks associated with the logically self-contained system components that cumulatively deliver the desired user experience, and so on, recursively. Since you start from the entire system and decompose it level-by-level into a set of mutually exclusive components, this ensures no task is overlooked.
- What-if analysis. With this method, you repeatedly ask, “What if?” to find subtle nuances.
- User journey simulation. Used by many user experience designers and product managers, this technique generates a set of tasks and scenarios that your app must support. Imagine yourself using the app and simulate typical sessions that your users could generate. If you can mentally move from start to finish without hitting a snag, you don't need to add any more tasks—this is your stopping criterion. However, initially you'll likely encounter problems at every step of the user journey. When this happens, just note the problem and create a task to address it.
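As a concrete (if simplified) illustration, a hierarchical decomposition can be represented as a tree of nested tasks; the task names and tree shape below are illustrative, not the project's actual plan:

```python
# A minimal sketch of hierarchical decomposition: tasks are nested dicts,
# and leaves are atomic subtasks (the ones estimated at two hours or less).
project = {
    "movie-making app": {
        "front end": {
            "upload images UI": {},
            "share/download UI": {},
        },
        "back end": {
            "emotion recognition": {
                "find labeled data set": {},
                "train model": {},
            },
            "music generation": {
                "build base-song database": {},
                "train next-note model": {},
            },
        },
    }
}

def leaf_tasks(tree):
    """Collect the atomic subtasks (leaves of the decomposition hierarchy)."""
    leaves = []
    for name, subtree in tree.items():
        if subtree:
            leaves.extend(leaf_tasks(subtree))
        else:
            leaves.append(name)
    return leaves

print(leaf_tasks(project))  # the flat task list to schedule and track
```

Because each level partitions its parent into mutually exclusive components, walking the leaves yields the complete, non-overlapping task list.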
Analyze the Movie-Making App
Using the hierarchical decomposition method for the sample project, we determine that, like any user-facing app, the movie-making app can be broken into:
- The front end—a straightforward user interface component
- The back end—where the interesting AI stuff happens, which in turn can be divided into the:
  - Emotion recognition component
  - Music generation component
Within the front and back end, there are three key components: the user interface (blue), the emotion recognition (orange), and the music generation (green), as shown in Figure 1.
Figure 1. System diagram for the movie-making app.
Front-End User Interface
The user interacts with this component using buttons to upload images, begin the movie-generation process, and share or download the result. While the user interface is an important component, our focus for this tutorial is on the intelligent information processing on the back end using new Intel® technologies.
Step 1: Extract emotions
Step 2: Generate movie
Step 3: Share
Back-End Emotion Recognition (Image Processing)
This component is responsible for assigning emotions to images. As with any intelligent component that learns from data, there are two processes:
- How the component trains or learns—the training stage
- How to apply the component—the application and testing stage
Each of these processes is associated with a set of standard AI tasks.
During the training stage, you provide the data with the answers and train the machine to learn the “rule” for assigning correct answers to inputs coming from the same population or distribution. Tasks for the training stage include:
- Define inputs and outputs for each data processing step.
- Find or create a data set for machine learning that matches the defined input and output interfaces.
- Train a machine learning model.
- Evaluate the machine learning model.
During the application and testing stage, the learned or trained model is used to predict the answers for previously unseen objects from the same population or distribution. The tasks include:
- Define inputs and outputs for each data-processing step.
- Deploy a machine learning model.
More broadly, the tasks and steps typical for almost any AI project can be framed within the CRISP-DM methodology. This methodology represents a data mining or AI project as a cycle with six phases, as depicted in Figure 2. Within each phase, there are tasks and subtasks, such as data labeling, model evaluation, and feature engineering. The cycle reflects the fact that any real intelligent system can always be improved.
Figure 2. The relationship between the different phases of CRISP-DM.
Moving from theory back to practice, let's formalize the inputs and outputs for training and testing an emotion recognition model.
We have six different emotions: anxiety, sadness, happiness, tranquility, determination, and awe.
- Each emotion is represented in a computer as a code from 1 to 6 following a content coding approach
- Each image is represented as a 3D-tensor (3D-array) where the three dimensions are height, width, and color channel (for example, RGB)
For the training stage:
- Input: A 4D tensor (a set of labeled 3D images, that is, an image and an emotion class)
- Output: A trained machine learning model
For the application and testing stage:
- Input: A 3D tensor (an unlabeled image) and a trained model
- Output: A multinomial probability distribution over codes/classes
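These interfaces can be sketched with NumPy; the 128×128 image size and the raw score values below are arbitrary assumptions for illustration, not the project's actual configuration:

```python
import numpy as np

EMOTIONS = {1: "anxiety", 2: "sadness", 3: "happiness",
            4: "tranquility", 5: "determination", 6: "awe"}

H, W, C = 128, 128, 3  # assumed image size; any fixed size works

# Training input: a 4D tensor (a batch of 3D images) plus a label per image.
batch = np.zeros((10, H, W, C), dtype=np.uint8)    # 10 labeled images
labels = np.array([3, 1, 6, 2, 5, 4, 3, 3, 1, 6])  # emotion codes 1..6

# Application output: a multinomial distribution over the six classes.
# Here we fake the model's raw scores and apply a softmax by hand.
scores = np.array([0.1, 0.2, 2.5, 0.3, 0.1, 0.4])  # illustrative logits
probs = np.exp(scores) / np.exp(scores).sum()

assert batch.ndim == 4 and batch[0].ndim == 3  # 4D batch of 3D images
assert abs(probs.sum() - 1.0) < 1e-9           # valid probability distribution
print(EMOTIONS[int(np.argmax(probs)) + 1])     # most likely emotion
```

The key point is the shape contract: training consumes a 4D tensor with labels, while application consumes a single 3D tensor and emits a length-6 probability vector.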
Next, consider how this formalization fits into a bigger picture and apply system analysis techniques to resolve inconsistencies:
- A user will upload several images. Therefore, we must decide how to combine emotions predicted for each individual image into an emotion for the entire movie.
- The images will be accompanied by music. Therefore, we need to figure out how a given emotion can generate a melody.
- The content of the images must be in sync with the music, and transitions between the images must be smooth for the user to feel comfortable.
To begin, we will focus on music generation for one image. We’ll return to multiple images after we cover the music-generation component and figure out a way to integrate it with the emotion-recognition component.
Back-End Music Generation
This component produces a song when given an emotion. To connect the music generation component with the image processing component, we link an emotion code to an audio signal using the following steps:
- Fetch a random famous song from a prebuilt database of famous songs.
- Adjust the melody's tempo, scale, and rhythm to fit the emotion using a simple script.
- Seed the music generation process based on machine learning with the “emotion-modulated” base song.
The music generation process essentially completes the emotion-modulated base (seed) song: having been trained on a large corpus of songs, it repeatedly guesses the most natural-sounding next musical note.
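The emotion-modulation script (the second step above) could look like this minimal sketch, assuming songs are lists of (MIDI pitch, duration) pairs; the modulation parameters are invented for illustration and are not the values used in the actual project:

```python
# Hypothetical emotion -> modulation parameters: (tempo multiplier,
# pitch shift in semitones, target scale). Values are illustrative only.
MODULATION = {
    1: (1.10, -3, "minor"),  # anxiety: faster, darker
    2: (0.80, -4, "minor"),  # sadness: slower, lower
    3: (1.15, +2, "major"),  # happiness: brighter, faster
    4: (0.85,  0, "major"),  # tranquility: slower
    5: (1.05, +1, "major"),  # determination
    6: (0.90, +3, "major"),  # awe
}

def modulate(song, emotion_code):
    """Adjust a base song (list of (midi_pitch, duration) pairs) to an emotion."""
    tempo_mult, shift, _scale = MODULATION[emotion_code]
    # Transpose each note and shorten/lengthen it by the tempo multiplier.
    return [(pitch + shift, duration / tempo_mult) for pitch, duration in song]

base = [(60, 1.0), (62, 1.0), (64, 2.0)]  # a tiny illustrative melody
print(modulate(base, 3))                  # happiness: transposed up, faster
```

A real implementation would operate on symbolic scores (for example, via the music21 toolkit used later in the series) rather than raw pitch pairs, but the interface—song in, emotion-modulated song out—is the same.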
As with the emotion recognition component, we must define the inputs and outputs for training and testing the music generation model.
For the training stage:
- Input: A collection of songs
- Output: A trained machine learning model for music generation (for example, given a note, predict the next note)
For the application and testing stage:
- Input: A trained machine learning model for music generation AND a base song (a series of notes)
- Output: A sequence of notes completing the base song
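To make these interfaces concrete, here is a toy sketch that substitutes a simple bigram (Markov-style) predictor for the real neural model; the corpus and note values are invented for illustration:

```python
from collections import Counter, defaultdict

def train_next_note_model(songs):
    """Count note bigrams: a toy stand-in for a neural next-note model."""
    counts = defaultdict(Counter)
    for song in songs:
        for prev, nxt in zip(song, song[1:]):
            counts[prev][nxt] += 1
    return counts

def complete(model, seed, length):
    """Extend the emotion-modulated seed by repeatedly predicting the next note."""
    notes = list(seed)
    for _ in range(length):
        followers = model.get(notes[-1])
        if not followers:
            break  # no continuation learned for this note
        notes.append(followers.most_common(1)[0][0])
    return notes

corpus = [[60, 62, 64, 62, 60], [60, 62, 64, 65, 64]]  # tiny training set
model = train_next_note_model(corpus)
print(complete(model, seed=[60, 62], length=3))
```

The training call maps a collection of songs to a model, and the completion call maps (model, seed song) to an extended note sequence—exactly the I/O contract defined above, independent of whether the predictor is a bigram table or BachBot.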
Given the fixed APIs for the music generation component, we must implement the emotion-modulation script and complete all the AI tasks related to music generation, such as finding a data set for training a music generation model or finding base songs. These details will be covered later in this series. For our sample project we will use the following:
- Base songs come from the publicly available media collection (songs before 1922)
- The music generation process is trained on Bach’s Chorales—music sheets symbolically representing Bach’s works, which were prepared as part of the BachBot project
- The raw files come from the music21 project
Connect the Components
Three design variants for combining images and music are shown below, along with the advantages and disadvantages of each.
Variant A: One base song modulated for the dominant emotion for all images.
- Pros: Smooth transitions between images
- Cons: A user might feel uncomfortable if the music stays the same when a happy image is replaced by a sad image
Variant B: One base song modulated for each image emotion independently.
- Pros: Smooth transitions between images and clear matching between an emotion depicted on an image and music
- Cons: None
Variant C: Different base songs modulated for each image emotion independently.
- Pros: Clear matching between an emotion depicted on an image and music
- Cons: Choppy transitions between images since the music might change abruptly upon the emotion switch
For example, suppose we have three images with the emotions happiness, tranquility, and awe, use “Jingle Bells” as the base song, and use BachBot for music generation. Each image is processed independently through the emotion recognition API; the single base song is then modulated for each predicted emotion, and each emotion-modulated version (“Jingle Bells”-happiness, “Jingle Bells”-tranquility, “Jingle Bells”-awe) seeds the generation of the song for the corresponding image, yielding three generated songs.
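This pipeline (Variant B) can be sketched as follows; `recognize_emotion`, `modulate`, and `generate` are hypothetical stand-ins for the real component APIs:

```python
def recognize_emotion(image):
    """Stand-in for the emotion recognition API (returns a code 1..6)."""
    return {"beach": 4, "party": 3, "mountain": 6}.get(image, 3)

def modulate(base_song, emotion):
    """Stand-in for the emotion-modulation script."""
    return f"{base_song}-{emotion}"

def generate(seed_song):
    """Stand-in for the music generation model seeded with a song."""
    return f"generated({seed_song})"

def make_soundtrack(images, base_song):
    """Variant B: one base song, modulated independently per image emotion."""
    return [generate(modulate(base_song, recognize_emotion(img)))
            for img in images]

print(make_soundtrack(["party", "beach", "mountain"], "Jingle Bells"))
```

Switching variants only changes `make_soundtrack`: Variant A would call `recognize_emotion` once on the dominant emotion, while Variant C would draw a different base song per image.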
General Project Management Guidelines
To perform the project decomposition, follow these steps:
- Define a high-level structure (up to three levels of the decomposition).
- Talk to each individual contributor (usually done by the system architect) to:
  - Learn about the capabilities and requirements of the corresponding components.
  - Formally define the APIs.
- Refine the project decomposition hierarchy. This way, subtle integration details are identified early in the project, reducing work and rework during the integration phase.
The following are general tips that will help you formalize your project.
- Make sure that one person (typically the system architect) conducts the decomposition and that all the team members actively participate by providing input and critiquing the specifications. This helps to maintain conceptual integrity, a crucial property of a well-engineered system².
- You may need to simultaneously consider several components to find the right abstraction (inputs and outputs) for each component. In this case, list all possible input and output formats for each system, and then find the configuration that is feasible for both components.
- Because standard AI tasks repeat from project to project, save time by outlining them as a reusable subtree in the decomposition hierarchy once you have defined the AI task.
- Start with a simple case (for example, one image, one emotion, or one song) and expand from there rather than trying to come up with the component integration model for the final app from scratch.
Subsequent articles will cover all the tasks for our sample project in detail.
Below is a list of typical tasks for AI projects. Use it as a template for your project.
- Formulate a business problem by defining inputs and outputs, such as the objects and labels or target variables.
- Understand the data.
  - Sample data
  - Perform exploratory data analysis
- Clean the data.
  - Remove duplicates, outliers, and so on
  - Normalize feature values
- Establish the evaluation methodology testbed.
- Develop a machine learning model.
  - Prepare a data set for machine learning.
    - Collect raw data
    - Search for a relevant data set
    - Set up storage infrastructure
  - If the labels are unavailable, annotate the raw data.
    - Design annotation guidelines
    - Run the annotation process
    - Check annotation quality
  - Train a machine learning model.
    - Select the framework
      - Do a comparative analysis of existing frameworks
      - Install and configure the most appropriate framework
    - Select and set up the infrastructure for machine learning
      - Do a comparative analysis of private and public clouds and execution technologies
      - Do the capacity planning to satisfy your machine learning objectives
    - Select an algorithm
    - Prototype an algorithm
    - Improve your model by tuning hyper-parameters and adding domain-specific insights
- Evaluate the model.
- Deploy the model.
  - Define the service-level agreements (SLAs) for a machine learning API
  - Wrap a machine learning model into an API (a container or re-implementation in a better performing programming language)
  - Load test the API to see whether it meets the required SLAs
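The "wrap a model into an API" step can be sketched with Python's standard library; `predict` here is a hypothetical stand-in for a real trained model, and the port and response format are arbitrary choices for illustration:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(image_bytes):
    """Hypothetical stand-in for a trained emotion recognition model."""
    return {"anxiety": 0.05, "sadness": 0.05, "happiness": 0.6,
            "tranquility": 0.1, "determination": 0.1, "awe": 0.1}

class EmotionAPI(BaseHTTPRequestHandler):
    """Serve the model behind a minimal HTTP POST endpoint."""
    def do_POST(self):
        # Read the uploaded image bytes and return the class distribution.
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        payload = json.dumps(predict(body)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

# To serve (blocking call):
#   HTTPServer(("", 8000), EmotionAPI).serve_forever()
# Load testing against the defined SLAs can then be done with any
# standard HTTP benchmarking tool.
```

A production deployment would typically sit behind a proper serving stack (the series later covers TensorFlow Serving), but the contract is the same: image bytes in, probability distribution out.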
In this article, we discussed three popular system analysis techniques that were applied to the movie-making project. By using the hierarchical decomposition process, we identified the app's three key components: the user interface, emotion recognition, and music generation. We provided detailed analysis of emotion-recognition and music-generation AI components, including definition of the inputs and outputs, the training and testing processes, and how to integrate the components. Finally, we shared project-planning tips along with a set of standard AI tasks relevant for any AI project based on the CRISP-DM methodology.
2. Frederick Brooks, The Mythical Man-Month: Essays on Software Engineering (Addison-Wesley 1975)
Create Applications with Powerful AI Capabilities
The Anatomy of an AI Team
Select a Deep Learning Framework
Select an AI Computing Infrastructure
Augment AI with Human Intelligence Using Amazon Mechanical Turk*
Crowdsourcing Word Selection for Image Search
Data Annotation Techniques
Set Up a Portable Experimental Environment for Deep Learning with Docker*
Image Dataset Search
Image Data Collection
Image Data Exploration
Image Data Preprocessing and Augmentation
Overview of Convolutional Neural Networks for Image Classification
Modern Deep Neural Network Architectures for Image Classification
Emotion Recognition from Images: Baseline Model
Emotion Recognition from Images Model Tuning and Hyperparameters
Music Dataset Search
Music Data Collection and Exploration
Emotion-Based Music Transformation
Deep Learning for Music Generation: Choosing a Model and Preprocessing
Deep Learning for Music Generation: Implementing the Model
TensorFlow Serving for AI API and Web App Deployment