A Tutorial Series for Software Developers, Data Scientists, and Data Center Managers
Here is a common scenario that shows why you should use a portable experimental environment: You start working on a new AI project on your laptop and configure your experimental environment. Faced with scalability issues because your laptop is not powerful enough, you decide to migrate to the cloud. But your manager asks you to use a different cloud that is cheaper and supports faster computing units for training deep learning models. Then, your team adds a second data scientist and asks you to help configure the experimental environment on a new local dev machine. You repeat the configuration steps on that machine, install all the dependencies, clone the code from a repository, and then run the program. Nothing works, and you get a series of subtle warnings about the system architecture, outdated dependencies, and so on. Together with a slightly confused new team member, you resolve the issues on their machine. They commit the code, you pull it, and now the code breaks on your machine because everyone is using different versions of the framework. Sound familiar?
Each of the events in the above scenario is associated with the configuration of an experimental environment on a new machine. That process takes time and is not fun. Moreover, with time, the dependencies diverge.
In this article, we introduce the Docker* platform and describe how to use it to prepare an experimental environment for deep learning that can be easily ported to a new machine. We cover running deep learning processes within a Docker container. We also show how to train a simple neural network on the MNIST data set, which is a “Hello World” test for deep learning, to make sure that everything works smoothly.
Docker supports numerous platforms. For our project, we focus on a Docker installation on macOS* (dev laptop) and on Ubuntu* (cloud workstation). There are subtle differences between the macOS and Linux* versions, but for our purposes these differences aren’t critical. We assume that you don’t have either Docker for macOS or Docker Toolbox installed yet. If you do, delete them first or skip to the next section.
- On Ubuntu, sometimes the repository configuration step returns an error that says the key-server is not found. This can be resolved by simply retrying the same command in a terminal window.
- On macOS, Docker is configured in such a way that you can type Docker commands without the sudo prefix and it starts with the system. On Linux, you have to enable it manually. Follow steps 1 and 2 to enable both features.
The following commands are the same for both macOS and Linux.
To make sure that you have correctly installed Docker, open a terminal and type:
docker run hello-world
This downloads the hello-world image from Docker Hub and runs it locally in a container. Since it is a dummy image, it only prints a short greeting message and exits.
Docker has several basic workflows:
- Manage images and containers
  - Download an existing image from Docker Hub
  - List all images and containers
  - Start a container (based on the images available locally)
  - Remove an image or a container
- Build your own image
Let’s look at these one by one and introduce some additional concepts, where necessary.
Open the terminal, and then check that you have access to the Docker commands by running the "Hello, World!" container just as we did during the installation:
docker run hello-world
When you ran the command, Docker Engine:
- Checked to see if you had the "Hello, World!" software image
- Downloaded the image from Docker Hub, if the image was not available locally
- Loaded the image into the container and ran it
- Exited upon success
If you want to download an existing image but don’t want to run it, type:
docker pull tensorflow/tensorflow:latest
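Here, we tell Docker to pull an image published by the TensorFlow team. The repository is tensorflow/tensorflow (the tensorflow image in the tensorflow account on Docker Hub), and the tag, which identifies the image version, is latest.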
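To make the naming scheme concrete, here is a minimal Python sketch that splits an image reference into its repository and tag. The helper name parse_image_ref is hypothetical, and the sketch is deliberately simplified: it ignores digests (the @sha256:... form) and only mirrors the documented default of falling back to the latest tag.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    repository: str  # e.g. "tensorflow/tensorflow"
    tag: str         # e.g. "latest"


def parse_image_ref(ref: str) -> ImageRef:
    """Split an image reference into repository and tag (simplified sketch)."""
    # Only look for a tag in the last path component, so that a registry
    # host with a port (for example "localhost:5000/img") is not mistaken
    # for a tag separator.
    head, sep, last = ref.rpartition("/")
    name, colon, tag = last.partition(":")
    repository = head + sep + name
    # Docker falls back to the "latest" tag when none is given.
    return ImageRef(repository, tag if colon else "latest")


print(parse_image_ref("tensorflow/tensorflow:latest"))
print(parse_image_ref("nginx"))
```

In other words, docker pull tensorflow/tensorflow and docker pull tensorflow/tensorflow:latest request the same image.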
Next, let’s list all available images (docker images) and all containers (docker ps -a). The listing was cleaned up for the sake of this tutorial:
For now, there are only three images and no running containers. You can remove an image by name by typing the following command (an image can be removed only if no container is based on it):
docker rmi nginx
To start a container based on an image locally:
docker run -it -p 8888:8888 tensorflow/tensorflow
The tensorflow/tensorflow image will start a Jupyter Notebook* on localhost:8888.
Note: Jupyter is packed by the TensorFlow team into the tensorflow/tensorflow Docker image so that you don’t have to do it yourself.
We used the following parameters:
- -it is for interactive mode: -i keeps standard input open and -t allocates a pseudo terminal.
- -p is for port mapping between the host operating system and the container. This is how applications inside the container interact with the real world. The first port is the host operating system port (Mac* laptop in our case). The second port is the port of the application within the container. The default port for Jupyter Notebook is 8888.
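The flag layout can be sketched as a small Python helper that assembles the docker run argument list without executing it. The function name docker_run_args and the host port 8889 in the example are illustrative assumptions; only the -it and -p flags come from the command above.

```python
from typing import Dict, List


def docker_run_args(image: str, ports: Dict[int, int], interactive: bool = True) -> List[str]:
    """Assemble the argument list for `docker run` (does not execute it)."""
    args = ["docker", "run"]
    if interactive:
        # -i keeps STDIN open, -t allocates a pseudo terminal.
        args.append("-it")
    for host_port, container_port in ports.items():
        # The format is HOST_PORT:CONTAINER_PORT.
        args += ["-p", f"{host_port}:{container_port}"]
    args.append(image)
    return args


# Jupyter listens on 8888 inside the container; here it is published on
# host port 8889, for example because 8888 is already taken on the laptop.
print(" ".join(docker_run_args("tensorflow/tensorflow", {8889: 8888})))
```

With this mapping you would open localhost:8889 in the browser, while Jupyter inside the container still believes it is serving port 8888.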
Open a new terminal and list all containers again. The list now includes the TensorFlow container:
Docker Image for Deep Learning
In this section, we will build our own image for deep learning, thus completing the list of key Docker workflows.
You can build your own Docker image in two ways:
- From a Dockerfile
- By committing an existing Docker container at the end of an interactive session.
We prefer building from a Dockerfile because this method is more transparent and reproducible. In a single file, you declaratively define the packages to install and the command to run when the image starts; you then build the image and optionally tag it with a name. The second approach might feel more familiar: you log in to the container through an interactive shell, install packages as you normally would on your local machine, exit the container, and then commit it. However, you are left without an exact record of what the image contains. The image does the job, but you will most likely have a hard time rebuilding it if you lose it. Therefore, we will focus on the Dockerfile approach.
Defining a Dockerfile
The Dockerfile is based on the official tensorflow/tensorflow Dockerfile:
Step 1: Create a new folder with a file named Dockerfile inside it.
mkdir ai-docker
cd ai-docker
touch Dockerfile
Step 2: Every image in the Docker ecosystem starts from some base image. Typically, it is some version of a Linux distribution (in our case, Ubuntu 16.04). Open the Dockerfile with your favorite text editor (we like Atom*, which is free, and Sublime*) and add this:
FROM ubuntu:16.04
Step 3: Define the maintainer:
MAINTAINER <insert maintainer name>
Step 4: Install some dependencies required for TensorFlow with the RUN command:
RUN apt-get update --fix-missing && apt-get install -y --no-install-recommends \
        build-essential \
        bzip2 \
        curl \
        g++ \
        git \
        libfreetype6-dev \
        libpng12-dev \
        libzmq3-dev \
        pkg-config \
        python \
        python-dev \
        rsync \
        software-properties-common \
        unzip \
        wget \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
Docker executes the commands in the order listed. The && operator runs the next command only if the previous one exited successfully, so the RUN step fails as soon as any part of the installation fails.
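The short-circuit behavior of && can be imitated in a few lines of Python. This is only an illustration of the semantics, not how Docker or the shell implements it; run_chain and the two stand-in commands ok and fail are hypothetical names.

```python
import subprocess
import sys
from typing import List, Sequence


def run_chain(commands: Sequence[List[str]]) -> int:
    """Run commands in order and return how many succeeded before the first failure."""
    done = 0
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            break  # like `&&`, skip everything after a failure
        done += 1
    return done


# Stand-ins for real installation commands: one exits with 0, one with 1.
ok = [sys.executable, "-c", "pass"]
fail = [sys.executable, "-c", "raise SystemExit(1)"]

# The third command never runs because the second one fails.
print(run_chain([ok, fail, ok]))
```

If the commands were joined with ; instead of &&, the cleanup commands would run even after a failed installation and the broken layer could go unnoticed.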
Step 5: Rather than using Python*, which comes with Ubuntu, we’ll install Anaconda* and all the packages it contains to simplify our Docker configuration for deep learning:
RUN wget --quiet https://repo.continuum.io/archive/Anaconda2-4.2.0-Linux-x86_64.sh -O ~/anaconda.sh && \
    /bin/bash ~/anaconda.sh -b -p /opt/conda && \
    rm ~/anaconda.sh && \
    echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh
ENV PATH /opt/conda/bin:$PATH
Using RUN, we downloaded the Anaconda distribution with wget (which was installed inside this image at Step 4). Then we ran the installation bash script and removed the installation file.
Finally, since we always want Anaconda to be on PATH, we created a startup script that prepends the conda folder to PATH, and we used the ENV command to set the same environment variable for the image itself.
Step 6: Now, let’s install the key elements of the Docker image: TensorFlow and Keras*:
RUN pip install --upgrade pip
RUN pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.11.0rc2-cp27-none-linux_x86_64.whl && \
    pip install Keras==2.0.4
Step 7: To enable Keras to visualize the architecture of the network right inside the Jupyter Notebook, let’s install some extra libraries:
# Refresh the package index first: Step 4 deleted /var/lib/apt/lists
RUN apt-get update && \
    apt-get install -y graphviz && \
    pip install pydot==1.1.0
Step 8: Copy sample notebooks into the Docker image. First, create a folder named notebooks inside the ai-docker folder and place the sample Jupyter notebooks from the TensorFlow repository there.
You can use the DownGit tool to download a .zip archive with the subfolder from GitHub, and then extract the archive into the ai-docker folder. Then, use the following Dockerfile command:
COPY notebooks /notebooks
COPY copies a folder from the local machine to the Docker image. The source path is relative to the build context, that is, the folder that contains the Dockerfile. In our case that folder is ai-docker, so the notebooks are read from ai-docker/notebooks on the local machine and end up in /notebooks inside the image.
Step 9: Expose the TensorBoard and Jupyter ports to the outer world and set a working directory to notebooks:
# TensorBoard
EXPOSE 6006
# IPython
EXPOSE 8888
WORKDIR "/notebooks"
Step 10: Define the command that gets called when the image is started:
CMD bash -c "jupyter notebook --port=8888 --ip=*"
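We start Jupyter Notebook on port 8888 and let it accept connections on all network interfaces (--ip=*). Because the working directory is /notebooks, that is the folder Jupyter serves. That ends the definition of the Docker image using the Dockerfile.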
Building an Image
Now that we’ve defined a Dockerfile and supporting files (sample notebooks), let’s build a persistent portable Docker image based on it. Run the following command from the ai-docker folder:
docker build -t ai-docker-box .
Here, we name our new Docker image ai-docker-box. The “.” is important because it tells Docker to use the current directory as the build context, which is where it looks for the Dockerfile and the notebooks folder. When the build finishes, Docker prints a message that the image has been built successfully.
If you made a typo somewhere in the Dockerfile, fix it and run the docker build command again. Docker reuses the cached results of the steps that precede your change and resumes the build from the first instruction you modified. That’s another great feature of Docker, made possible by its layered design.
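The effect of the layered design on rebuilds can be illustrated with a toy model in Python. This is a simplified sketch and not Docker's actual cache implementation (for example, real builds also compare the checksums of files used by COPY); the functions layer_ids and build and the two instruction lists are hypothetical. Each step gets an ID derived from its own text and from the ID of the step before it, so changing one instruction invalidates that step and every step after it.

```python
import hashlib
from typing import Dict, List


def layer_ids(instructions: List[str]) -> List[str]:
    """Derive one ID per instruction from the instruction and its parent ID."""
    ids, parent = [], ""
    for line in instructions:
        parent = hashlib.sha256((parent + "\n" + line).encode()).hexdigest()[:12]
        ids.append(parent)
    return ids


def build(instructions: List[str], cache: Dict[str, str]) -> List[str]:
    """Return 'cached' or 'built' for each step, updating the cache."""
    status = []
    for layer_id, line in zip(layer_ids(instructions), instructions):
        if layer_id in cache:
            status.append("cached")
        else:
            cache[layer_id] = line  # stand-in for actually executing the step
            status.append("built")
    return status


cache: Dict[str, str] = {}
v1 = ["FROM ubuntu:16.04", "RUN apt-get update", "EXPOSE 8888"]
# Second version: a new instruction is inserted before the last one.
v2 = ["FROM ubuntu:16.04", "RUN apt-get update", "EXPOSE 6006", "EXPOSE 8888"]

print(build(v1, cache))  # first build: nothing is cached yet
print(build(v2, cache))  # rebuild: only the steps from the change onward run
```

Note that the final EXPOSE 8888 step is rebuilt in the second run even though its text did not change, because the step it sits on top of is new. This is why slow, stable steps belong near the top of a Dockerfile and frequently edited ones near the bottom.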
Publishing an Image on Docker Hub
- Create an account on https://hub.docker.com/.
- Log in to Docker Hub from the terminal by running docker login.
- Tag and push your image to Docker Hub as follows (find the image ID using docker images; be patient while the image uploads):
docker tag 95a462d38fc0 YOUR_DOCKER_NAME/ai-docker-box:latest
docker push YOUR_DOCKER_NAME/ai-docker-box:latest
- Go back to your Docker Hub account and check that a new image is available.
- Remove this image locally, and then pull it and verify that it works just as before.
docker images
docker rmi YOUR_DOCKER_NAME/ai-docker-box
docker pull YOUR_DOCKER_NAME/ai-docker-box
docker run -it -p 8888:8888 YOUR_DOCKER_NAME/ai-docker-box
Open your browser on localhost:8888. You’re ready to use this Docker image.
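If the page does not load, a quick way to tell whether anything is listening on the published port is a plain TCP connection test from the host. The sketch below uses only the Python standard library; port_is_open is a hypothetical helper name, and a successful connection only shows that the port is reachable, not that Jupyter itself is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


print(port_is_open("localhost", 8888))
```

A result of False usually means that the container is not running or that the port was not published with -p when the container was started.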
Testing the Docker Image on MNIST
We will use the Docker image in a toy project, in which we train a neural network for digits classification from images. The code comes from the official TensorFlow MNIST tutorial (one of the notebooks that we downloaded earlier). Our goal is not to demonstrate the capabilities of TensorFlow or beat the MNIST benchmark with a fancy neural network architecture but to make sure that our Dockerized experimental environment works.
Start a container from the Docker image that we have just built, navigate to Tutorial 3 (mnist_from_scratch), and then run all cells in this Jupyter Notebook. The third-to-last cell performs 916 training steps. (On a MacBook Pro* without a Touch Bar, with an Intel® Core™ i7-6660U processor and 16 GB RAM, it took about 5 minutes.)
The last cell uses the trained model for scoring some hold-out data, and in our case achieves about a two percent error rate.
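Before or after running the notebook, it can also help to confirm from a notebook cell that the packages installed in the Dockerfile are importable. The following is a minimal sketch under that assumption; installed_versions is a hypothetical helper, and the code avoids Python 3-only syntax because the image is built on Anaconda2.

```python
import importlib


def installed_versions(modules):
    """Map each module name to its version string, or None if it cannot be imported."""
    found = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Any failure during import is treated as "not usable".
            found[name] = None
            continue
        # Not every package defines __version__; fall back to a placeholder.
        found[name] = getattr(module, "__version__", "unknown")
    return found


# The packages installed in Steps 6 and 7 of the Dockerfile.
for name, version in sorted(installed_versions(["tensorflow", "keras", "pydot"]).items()):
    print("{}: {}".format(name, version or "NOT INSTALLED"))
```

Comparing this output between two machines is a quick way to check that both containers really run the same versions of the framework.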
In this article, we introduced Docker, built a Docker image for deep learning, and tested it on a toy image classification problem. Rather than repeating the same configuration steps on every new machine, you can simply pull the prepared Docker image and run the corresponding container. The environments are identical.