Hands-On AI Part 24: TensorFlow* Serving for AI API and Web App Deployment

Published: 12/19/2017  

Last Updated: 12/19/2017

A Tutorial Series for Software Developers, Data Scientists, and Data Center Managers

Welcome to the final article in the hands-on AI tutorial series. We have been building a movie-generation demo app with powerful AI capabilities, and now we are about to finish it. You can skim the first overview article in the series as a refresher. To recall the key components of the app, see the article on project planning.

This final article is devoted to deployment issues and provides step-by-step instructions on how to deploy:

  • Web API services
    • Emotion recognition (image processing)
    • Music generation
  • A web app server with a slick user interface (UI)

Each set of instructions can be performed independently. All the sources are available on GitHub* and Dropbox* (for emotion recognition and image processing) and Dropbox (pretrained models for music generation).

Once you have finished, you will have a fully functional app. For illustration purposes, the start page of the app is shown below.

App Overview

The app architecture consists of four parts (see the diagram below):

  • Web app server
  • Web client (UI)
    • Photo uploading view
    • Slideshow view, which will be sharable
  • Remote image processing service
  • Remote music generation service

The web app is based on the lightweight Flask* framework. The majority of the front-end functionality is implemented using JavaScript* (Dropzone.js* for drag-and-drop file upload), and the music (re)play is based on MIDI.js*.

The AI API for emotion recognition is served using a combination of Flask and TensorFlow* Serving on Microsoft Azure*, and the AI API for computer music generation is also a containerized application on Microsoft Azure. We created two independent containers for the image and music parts, following the Docker philosophy of one process per container.

Clone the project git repo as follows (for emotion recognition and music generation, some files are hosted on Dropbox due to the size of the corresponding archives, for example, pre-trained models; we provide the links to them inline in the corresponding sections):

git clone

We provide complete deployment instructions for the AI components of our app, because the tutorial series focuses on that topic. We briefly cover the web app deployment process, including showing how to start a web app locally (http://localhost:5000) but not in the cloud. The details of web app deployment are straightforward and are already covered elsewhere, including in these tutorials: Deploying a Flask Application to AWS Elastic Beanstalk and How to Deploy a Flask Application on an Ubuntu* VPS.

Remote Image Processing API

In the previous articles on image processing, you learned how to (pre)process an image dataset and how to define and train a CNN model using Keras* and TensorFlow*. In this article, you learn how to take a trained Keras model and deploy it in a Microsoft Azure cloud as a simple web service with REST API using TensorFlow Serving and Flask.

All the materials that are used for the emotion recognition deployment process are here.

Cloud instance

The first step of the deployment process is to choose the platform on which we want to deploy. The two main options are cloud instance and in-house machine. The pros and cons of each are discussed in Overview of Computing Infrastructure. Here we use a cloud-based approach and deploy the model in a Microsoft* Azure cloud. Microsoft provides new users with a USD 200 free trial subscription, which is enough for a couple of weeks of 24/7 work of a small-size CPU instance.

The following step-by-step instructions show how to launch an appropriate instance in the cloud.

  1. Go to the Azure home page, and then create an account (skip this step if you already have an account). During the registration process you’ll be asked to provide a credit card number, but you won’t be charged for the free trial subscription.
  2. After creating your account, go to the Azure portal, which is the main site for this cloud service. Here you can manage all the resources including virtual machines (cloud instances). Let’s create a virtual machine by clicking Virtual machines on the left panel. We want to create and add a new virtual machine that will run Ubuntu Server. The lightweight Linux* Server distribution is a good choice to use for deploying with TensorFlow Serving. In particular, we want to run Ubuntu Server 16.04 LTS. LTS stands for Long-Term Support, which means that this version is stable and will be supported for 5 years by Canonical.

  3. To start the process of machine configuration, click Create.
  4. On the Create virtual machine screen, make sure Subscription is set to Free Trial. For the other settings, we recommend you use the ones shown in the following screenshot.

  5. Next, you need to choose the size of the machine, which is the hardware that is set to run your instance. Choose the small CPU instance DS12_V2 with four Intel® Xeon® processors, 28 GB RAM, and 56 GB SSD. This setting should be enough for our small-size deployment. In some cases, you may need to change the location to see the desired machine available. For example, East US worked for one tutorial test reader, while West US worked for another.

    The Settings and Summary sections are not applicable for us at this time and can be skipped.

    After configuring the machine, the deployment process starts, which might take several minutes. When the process completes, you should see the new instance running when you go to Azure portal.

    By default, no ports are open on the machine except port 22 for the SSH connection to the instance. Since we are building a web service, we need to open a few more ports so that the machine is reachable from the Internet.

  6. To edit the network settings, click Emo-nsg (the Network Security Group) in the All resources panel.
  7. In the Emo-nsg Settings tab, click Inbound security rules, and then add two rules.

    The first one is for the Jupyter notebook* running at port 8888. The second one is for our Flask web service running at port 9000.

  8. The virtual machine is ready. Click the Running Instance icon (see screenshot shown earlier in this section) to see its running state and IP address to connect.

Setting up the Docker* environment

We just launched the cloud machine running pristine Ubuntu Server 16.04 LTS. Installing all the dependencies there from scratch would take a considerable amount of time. Fortunately, the Docker* container technology is available, which allows you to wrap all the dependencies into one image that can be deployed quickly on any machine. See our article on Docker for more details.

To benefit from container technology, we first need to install the Docker engine. Before that, download the archive with the materials to your laptop.

  1. Copy the archive with the materials to the cloud instance over SSH using the scp command.
  2. Connect to the cloud virtual machine via SSH. It should have the archive, which we just copied in the home folder.
  3. Unarchive it. All the scripts and the code that are needed for the deployment are now in the home directory. If you have a .zip archive instead, use the unzip command: to install unzip, run sudo apt install unzip, and then run unzip <name_of_archive.zip>.

  4. Install the Docker engine using the install_docker.sh script (run: sudo ./install_docker.sh).

    Under the hood it just repeats the commands from the official tutorial.

  5. Build the Docker image called “emotions” from the Dockerfile, which is a kind of manifest for the system that will be inside the Docker container. The trailing dot in the command sets the build context to the current directory.

    (run: sudo docker build -f Dockerfile -t emotions .)

    The same can be done with the build_image.sh script (run: sudo ./build_image.sh). The building process might take a while (about 5‒10 minutes). After that you should be able to see the newly built image.

  6. Launch the Docker container from the just-built emotions image. Here we run it with several options. We want to map the instance home folder /home/johndoe into the container /root/shared folder, which effectively works like a shared folder. We also want to forward all the requests addressed to the instance ports 8888 (Jupyter) and 9000 (Flask) into the container to the same ports. This allows all the servers and services to run inside the Docker container while remaining accessible from the Internet.

    Using the -d option (which means detach) runs the container in the background.

    sudo docker run -d -v /home/johndoe:/root/shared -p 8888:8888 -p 9000:9000 emotions

Prepare the Keras model

We have our Docker container running on the cloud CPU instance. We also saved the Keras model for emotion recognition from images earlier (see the basic and advanced articles on CNNs). The next step is to convert the Keras model into a format that is appropriate for TensorFlow Serving.

  1. Go inside the Docker container using the exec command, the container ID or Names (in my case it’s 88178b94f61c or fervent_lamarr, which can be seen from the sudo docker ps -a command), and the name of the program to run (/bin/bash in this case, which is the usual shell).

    sudo docker exec -it 88178b94f61c /bin/bash

  2. Go to the deployment folder, which contains the scripts and tools for deployment.
  3. Run the serve_model.py Python* script, which converts the Keras model into a suitable TF Serving format. First we convert the baseline model and assign it to version 1.

    python serve_model.py --model-path ../models/baseline.model --model-version 1

    Next we convert the advanced model and set its version to 2. TensorFlow Serving handles the versioning automatically.

    python serve_model.py --model-path ../models/pretrained_full.model --model-version 2

    The core part of this script loads the Keras model, builds information about the input and output tensors, prepares the signature for the prediction function, and then compiles these things into a meta-graph, which is saved and can be fed into TensorFlow Serving.
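The versioning mentioned above relies on a directory convention: TensorFlow Serving looks for numbered subdirectories under the model base path and, by default, serves the highest version number. A minimal sketch of that lookup follows, assuming the converted models end up in folders such as ../models/1 and ../models/2. The helper name latest_model_version is hypothetical and is not part of TensorFlow Serving or of the tutorial scripts.

```python
import os


def latest_model_version(model_base_path):
    """Return the highest numeric version directory, or None if there is none."""
    versions = [
        int(name)
        for name in os.listdir(model_base_path)
        # Only purely numeric directory names count as model versions.
        if name.isdigit() and os.path.isdir(os.path.join(model_base_path, name))
    ]
    return max(versions) if versions else None
```

Calling latest_model_version('../models') before starting the server is a quick way to confirm which version you should expect to be served.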

TensorFlow Serving server

Now we’re ready to launch the TensorFlow Serving server. Serving is a set of tools that allows you to easily deploy TensorFlow models into production.

  1. Start the TensorFlow Serving server.

    tensorflow_model_server --port=9001 --enable_batching=true --model_name=emotions --model_base_path=../models &> emotions.log &

    The core parameters to specify are the port on which TensorFlow Serving runs, the model name, and the model path. We also want to run the process in the background and store the logs in the separate emotions.log file. The same thing can be done with the serving_server.sh script.

  2. Now the TensorFlow Serving server is running. Let’s test it using a simple client without a web interface. It takes the image from the specified path and sends it to the running TensorFlow Serving server. This functionality is implemented in the serving_client.py Python script.

    python serving_client.py --image-path ../Yoga3.jpg

    It works!

Flask server

The final step is to build a web service on top of TensorFlow* Serving. Here we use Flask as a back-end and build a simple API using the REST protocol.

  1. Run the flask_server.py Python script. It launches the Flask server, which transforms incoming POST requests into properly formed requests to TensorFlow Serving. We run this script in the background and store the logs in the flask.log file.

    python flask_server.py --port 9000 &> flask.log &

    The main idea of the code is to define a so-called “route,” which sends all POST requests addressed to the predict page of the Flask server to the predefined predict function.

  2. Let’s test our Flask server now. First, from inside the Docker container.

    curl '' -X POST -F "data=@../Yoga3.jpg"

    Then from outside the Docker container but from the same cloud instance.

    curl '' -X POST -F "data=@./Yoga3.jpg"

    And finally from an outside network (our laptop). It works.

    Finally, we have a web service that works over a REST API and can be accessed easily through a usual POST request with the special fields. You can take a look at the API description here.
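The curl calls above send the image as a multipart/form-data field named data. The same request can be built from Python with only the standard library. This is a minimal sketch, not the tutorial's client: the helper names build_multipart and build_predict_request are hypothetical, and the URL you pass in (for example http://<instance-ip>:9000/predict) is an assumption based on the port and the predict route described above.

```python
import urllib.request
import uuid


def build_multipart(field_name, filename, payload, content_type="image/jpeg"):
    """Encode one file as multipart/form-data; returns (body, content_type_header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def build_predict_request(url, image_bytes, filename="Yoga3.jpg"):
    """Build (but do not send) a POST request carrying the image in the 'data' field."""
    body, header = build_multipart("data", filename, image_bytes)
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
```

To actually send it, read the image bytes from disk, build the request, and pass it to urllib.request.urlopen. The shape of the response depends on the API description referenced above.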

Remote Music Generation API


Create one more VM, using steps 3‒7 in the Cloud instance section of “Remote Image Processing API”, and then connect to this VM via SSH. All the following steps in this section happen within this VM unless otherwise stated.

Make sure that you have Python 3 set up, and follow the instructions. Installing additional packages requires the pip3 utility, which has been bundled with Python 3 since version 3.4. The music21 package is necessary for music transformations. To install it, run the following in the console:

pip3 install music21

and pip3 will do all the work.

Installing and setting the emotion transformation part

You can find our ideas about emotion-based transformations in music here.

  1. Music-related files are placed in the music subfolder of the cloned repo. If you cloned the repo on your local machine, copy this folder to your target machine using scp -r music user@host_ip:/home/user. No copying is needed if you ran git clone on the machine on which you are planning to deploy the music part.

The music folder contains the following subfolders:

  • base_melodies. Contains the source base melodies
  • base_modulation. Contains the necessary Python scripts for melody emotion modulation
    • emotransform.py. Performs melody transformation
    • web-server.py. Wraps the transformation script in a RESTful API and provides a basic HTTP server
  • transform_examples. Contains examples of already transformed melodies

To run the web server in your system environment, adjustments in the web-server.py file are required: 

  1. Change the HOST variable in the header of the file to the IP address of the machine on which you deploy the musical part of the application. Example (be sure to include the single quotation marks): HOST=''

    You can set it to ‘’, but only do this if you intend to run the whole system (all parts of the application) on a single machine. The HOST value will be used as part of the URL for the file transfer procedure, so it must be visible and correct for your network.

    Use your OS tools or the ifconfig utility to retrieve the IP address of your machine.

    You can leave the PORT variable as it is, but if the server fails to start and you see an error like [Address already in use] in the console, try setting it to a different value.



    NOTE: The IP address and PORT must be in sync with the main app, since they send requests and get responses for the IP, PORT pair.
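If you run into the [Address already in use] error mentioned above, you can check a candidate port before editing web-server.py. The sketch below is only an illustration using the standard socket module; the function name port_is_free is hypothetical and is not part of the tutorial code.

```python
import socket


def port_is_free(host, port):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "Address already in use" or an address not on this machine.
            return False
    return True
```

Pass the same values you intend to use for HOST and PORT. The check is only a snapshot, so another process can still claim the port before the server starts.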

Installing and setting BachBot*

  1. We assume that you already have Docker installed (if not, refer to the “Remote Image Processing API” section). Pull the image:

    docker pull fliang/bachbot:aibtb

  2. Start a new Docker container from the image.

    docker run -d -v /home/johndoe:/root/shared --name bachbot -it fliang/bachbot:cornell

    You should see something like this:

  3. Check that Torch in the Docker container works well on your system. Type the following commands in the console, which will invoke the Torch interactive shell:

    sudo docker ps
    sudo docker exec -it YOUR_CONTAINERID /bin/bash

    and then:


    If it runs without problems, exit the shell with the exit command and go to the next step. Otherwise, please refer to the Troubleshooting section.

  4. To avoid the training process, you can download a pretrained model to your local machine, and then copy it to the VM with the following command:

    scp pretrained_music_model.tar.gz johndoe@ip:~/

    Place the content of this archive into the /root/bachbot/ folder of the BachBot Docker container. Alternatively, use the Docker shared folder as you did for the image processing part:

    cp ../shared/pretrained_music_model.tar.gz pretrained_music_model.tar.gz

    Finally, unarchive it with

    tar -xvf pretrained_music_model.tar.gz

    This directory should contain six files:

    • checkpoint_5100.json 
    • checkpoint_5100.t7 
    • checkpoint_5200.json 
    • checkpoint_5200.t7 
    • checkpoint_5300.json
    • checkpoint_5300.t7
  5. Exit from the Docker container with the exit command.
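Before leaving the container, it can be worth confirming that all six checkpoint files listed in step 4 were unpacked. A minimal sketch, assuming you run it against the directory where you extracted the archive; the helper name missing_checkpoints is hypothetical.

```python
import os

# The six files listed in step 4: a .json and a .t7 file per checkpoint.
EXPECTED_CHECKPOINTS = [
    f"checkpoint_{step}.{ext}"
    for step in (5100, 5200, 5300)
    for ext in ("json", "t7")
]


def missing_checkpoints(directory):
    """Return the expected checkpoint files that are absent from directory."""
    return [
        name
        for name in EXPECTED_CHECKPOINTS
        if not os.path.isfile(os.path.join(directory, name))
    ]
```

An empty list means that the archive was unpacked completely.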

For more details related to BachBot, please follow the official GitHub* repo.

Starting the music generation web server

To start the web server for the musical part, type in the console:

cd [working_directory]/intel-ai-developer-journey/music/base_modulation/
sudo python3 web-server.py

Where [working_directory] is the name of your working directory from the Installing and setting the emotion transformation part section.

You should see output like this:

That’s it! The music generation service is up and running. Finally, we have a web service that can be accessed with a usual POST request. You can take a look at the Systems APIs listed below.

Web App

Web server

The tutorial Flask app needs methods for each view and also one method for uploading images. Since the index page will provide only the upload form, let's implement the slideshow generation logic right in the show method. The show method calls two remote APIs to extract emotions and generate MIDI files with music (in the code below, the API calls are stubbed with placeholders for clarity of presentation).

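import os
from flask import render_template

# Assumption: basedir is the folder that contains the upload/ directory.
basedir = os.path.abspath(os.path.dirname(__file__))

def show_page(session):
    # Generate music only for slides that do not have a MIDI file yet
    for x in range(1, 6):
        music_path = os.path.join(basedir, 'upload/' + session + '/' + str(x) + '.mid')
        if not os.path.exists(music_path):
            emotion = get_emotion(os.path.join(basedir, 'upload/' + session + '/' + str(x) + '.jpg'))
            generate_music(emotion, session, x)
    return render_template('show.html', session_name=session)

def generate_music(emotion, session, num):
    # remote API call (stubbed)
    pass

def get_emotion(file_path):
    # remote API call (stubbed); always returns a fixed emotion
    return 'happy'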

You can get the full source code of the web app here in the /slideshow folder.

Web client

On the client side, you need to implement a slideshow with playing MIDI. Slideshow implementations can be found on CodePen. Here is one working version.

Web app deployment

To launch the server using python2, type:

cd intel-slideshow-music/slideshow
pip install -r requirements.txt
pip install requests
export FLASK_APP=app
flask run

Then in your browser, open http://localhost:5000/slideshow-music. You should see a web app running locally and accessing remote API services (you will have to change the IP addresses of the remote AI APIs for emotion recognition and music generation as defined in the sections above).

Congratulations! All of the parts of the Slideshow Music project are now complete, and we successfully put them together. The app should look just like this example of a live version.


In this article, we covered the deployment and integration aspects of the AI app development process. We used a lightweight Flask framework, Midi.js, and Dropzone.js for the front-end and web app server. For the back-end we:

  • Reviewed the process of deploying a Keras model in the cloud. We used the Microsoft Azure cloud, Docker, the TensorFlow Serving library, and a Flask web server. We ended up with a web service that works over a REST API and can be accessed through a usual POST request with the special fields.
  • Launched the emotion-modulation part with BachBot’s harmonization model based on Torch, Docker, and a simple Python web server.

All of you have photos evoking emotions that can be shared by means of music. Enjoy your deep learning app and don’t forget to stop your virtual machines before the free trial subscription runs out!

Play the movie made of uploaded images and with a computer-generated song in the background.   

System APIs

This section describes the APIs that were implemented for the demo. The system itself is made of three components: emotion recognition (images), music generation, and user interface. In turn, the music generation component contains two subcomponents: the adjustment of the base song toward the emotion and the computer-assisted music generation.

Emotion Recognition

emotion_recognition_model train(images, emotions)

Given an annotated collection of images (for each image, we have the emotion that is present in the image; only one emotion per image is used in this demo), train an emotion recognition model.

emotion predict(image, emotion_recognition_model)

Given a trained emotion recognition model and a new image, assign probabilities to each of the emotion classes and select the most probable emotion. For example, for an image of a beach, the model could very likely predict the following distribution:

“Anxiety” : 0.01,
“Sadness” :  0.01,
“Awe” : 0.2 ,
“Determination” : 0.05, 
“Joy” : 0.3 , 
“Tranquility” : 0.4,

And the most probable emotion is

“Tranquility” : 0.4

“Tranquility” is then the output of the image processing API.
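Selecting the most probable emotion amounts to taking the argmax over the predicted distribution. A minimal sketch using the beach example above; the function name most_probable_emotion is hypothetical and stands in for the last step of predict.

```python
def most_probable_emotion(distribution):
    """Return the (emotion, probability) pair with the highest probability."""
    emotion = max(distribution, key=distribution.get)
    return emotion, distribution[emotion]


beach = {
    "Anxiety": 0.01,
    "Sadness": 0.01,
    "Awe": 0.2,
    "Determination": 0.05,
    "Joy": 0.3,
    "Tranquility": 0.4,
}
print(most_probable_emotion(beach))  # ('Tranquility', 0.4)
```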

Music Generation

base_song_modulated modulate(base_song, emotion)

Given a .MIDI file with a base song (e.g. “Happy Birthday to You…”) and an emotion, this method adjusts the scale, tonality, and tempo of the base song to fit the emotion. We call this process an emotion-based modulation. For example, if the emotion is “sad”, the music will be in a minor key, quieter, and slower than when the emotion is “joy” or “determination”.
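To make the idea of modulation concrete, here is a small sketch that works on plain MIDI note numbers rather than on .MIDI files. It is not the emotransform.py implementation: it only illustrates one standard transformation, turning a major melody into its parallel natural minor by lowering the 3rd, 6th, and 7th scale degrees by a semitone, plus a simple tempo scaling. The function names and the 0.7 factor in the docstring are assumptions chosen for illustration.

```python
# Semitone offsets from the tonic that are lowered when a major scale
# becomes its parallel natural minor: the 3rd, 6th, and 7th degrees.
LOWERED_IN_MINOR = {4, 9, 11}


def to_parallel_minor(pitches, tonic=60):
    """Lower the major 3rd, 6th, and 7th of each MIDI pitch by one semitone."""
    return [
        pitch - 1 if (pitch - tonic) % 12 in LOWERED_IN_MINOR else pitch
        for pitch in pitches
    ]


def scale_tempo(bpm, factor):
    """Return the tempo scaled by factor (for example 0.7 for a slower feel)."""
    return round(bpm * factor)
```

For a C major scale starting at middle C (MIDI note 60), to_parallel_minor turns E, A, and B into E flat, A flat, and B flat and leaves the other notes unchanged.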

music_generation_model train(songs)

Given a collection of songs in .MIDI format, train a sequence model that can predict a .MIDI note for the prefix of .MIDI notes. The model captures transition probabilities between .MIDI notes.
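The phrase “transition probabilities between notes” can be illustrated with a first-order Markov chain. This is a deliberately simplified stand-in and not the model used in the tutorial, which is a neural sequence model conditioned on a longer prefix. The sketch assumes each song is already available as a list of MIDI note numbers; the function name train_transitions is hypothetical.

```python
from collections import Counter, defaultdict


def train_transitions(songs):
    """Estimate P(next_note | current_note) from lists of MIDI note numbers."""
    counts = defaultdict(Counter)
    for notes in songs:
        # Count every adjacent (current, following) pair in the song.
        for current, following in zip(notes, notes[1:]):
            counts[current][following] += 1
    return {
        note: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for note, followers in counts.items()
    }
```

With the two toy songs [60, 62, 60, 64] and [60, 62], note 60 is followed by 62 twice and by 64 once, so the estimated probabilities are 2/3 and 1/3.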

computer_generated_song generate_song(modulated_base_song, music_generation_model)

Given an emotion-modulated base song, which serves as a seed for the computerized music generation process, and a model trained to generate music, we produce a sequence of new .MIDI notes. For example, we can seed the generative process with a “Happy Birthday to You…” song modulated with the “Joy” emotion and complete it using a trained generative model toward the complete song.

User Interface

bool upload_image(image)

Uploads an image and returns a status flag upon completion (True for success, False for failure). Used as part of the submission form.

int select_base_song(base_songs)

Select a song from a list of base songs to be modulated based on the emotions from uploaded images. Return the index of the selected base song. Used as part of the submission form.

true play(movie)

Appendix: Troubleshooting

If you experienced problems with the Torch setup in the Docker image for music generation (for example, the learning procedure didn’t start or Th failed to run any computation), update the Torch binaries within the Docker image. One way to do this is to follow the initial setup process, and then use some additional commands at the end:

cd /root/torch
bash install-deps   # this may take a significant amount of time
luarocks install hdf5
luarocks install luautf8

Then, run the Th interactive shell and make sure that the following line works without any problem:


and then:


After that you can return to the main storyline.

