Hands-On AI Part 6: Select an AI Computing Infrastructure

Published: 09/18/2017  

Last Updated: 09/18/2017

A Tutorial Series for Software Developers, Data Scientists, and Data Center Managers

In the previous article, we discussed deep learning frameworks and selected TensorFlow* because it works with Keras, has a flourishing developer community, has strategic support from Google, has a version optimized for Intel® processors, and is simple to deploy.

In this article, we continue our discussion of the infrastructure aspects of the project and focus on the computing resources from Intel that can be used to train and execute deep learning models. Depending on your goals and resources, such as budget, time, and talent, different options might be appropriate. For example, if you have a large data set or a tight timeline, a multi-CPU cluster might be the right choice. However, if you are an independent developer experimenting with various deep learning frameworks or techniques, a single workstation with a multicore CPU might be sufficient. We also provide a comparative overview of existing computing resources for deep learning from Intel.

An Overview of the Circuit Design Paradigms

There are three primary circuit design paradigms or architectures:

  • CPU
  • Field-programmable gate array (FPGA)
  • GPU

We’ll discuss the first two items in detail. GPUs are outside the scope of this series of articles and won’t be covered.

The following are the characteristic features of CPU and FPGA circuit designs:

  • CPUs were designed to support the widest variety of workloads. Modern CPUs are highly parallel devices. For example, an Intel® Xeon Phi™ processor has up to 72 cores and supports up to 288 threads (four per core). The Intel Xeon Phi processor also has a vector processing unit that provides fine-grained data parallelism, operating on 512 bits at a time, that is, 16 single-precision floats or 16 32-bit integers.
  • FPGA circuits can be optimized for a specific computational pattern (such as deep learning). For example, in May 2017, Google* announced [1] its tensor processing units (TPUs) and Microsoft* announced [2] its Azure* Machine Learning Cloud using Intel® FPGAs optimized for AI.
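
The data-parallelism figure in the CPU bullet above is simple arithmetic: a 512-bit vector register divided into 32-bit elements gives 16 lanes. The following minimal sketch illustrates this; the function names are our own and hypothetical, not part of any Intel API.

```python
import math


def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one vector register."""
    if register_bits % element_bits != 0:
        raise ValueError("element size must divide the register width")
    return register_bits // element_bits


def vector_ops_needed(num_elements: int, lanes: int) -> int:
    """Vector instructions needed to touch every element once."""
    return math.ceil(num_elements / lanes)


lanes_fp32 = simd_lanes(512, 32)             # 16 single-precision floats
ops = vector_ops_needed(1000, lanes_fp32)    # 63 vector operations instead of 1000 scalar ones
print(lanes_fp32, ops)
```

This counts operations only; it says nothing about real-world throughput, which also depends on memory bandwidth and the instruction mix.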

The computational effectiveness and efficiency of different circuit designs vary from task to task. CPUs, the oldest of these designs, were not originally built for the huge numbers of vector and matrix multiplications that deep learning requires. However, recent innovations from Intel, such as Intel® Many Integrated Core Architecture (Intel® MIC Architecture) and the vector processing unit module, move CPUs to the forefront of parallel technologies.

At the same time, the most recent benchmarks from Intel [3] and Google [4] demonstrate that custom FPGAs achieve state-of-the-art performance for some deep learning tasks by taking into account the specifics of the computational task (matrix sparsity, low precision, and so on).

FPGAs allow you to achieve power efficiency and speed when a computational pattern is fixed, which is ideal for the inference stage in a deep learning project. CPUs based on Intel MIC Architecture use massive parallelism to achieve efficiency and speed for arbitrary computational patterns built on vector or matrix multiplication, which is well suited to neural network training.

Figure 1. Two different steps of data analysis workflow. The hardware option you choose depends on the workflow step.

Selecting a Computing Infrastructure

Following is a review of computing infrastructures and the one chosen for the sample project.

Introducing Intel’s Computing Infrastructure

From a practical point of view, when selecting a computing infrastructure, the available options come from one of the following two models:

  • In-house (on-premises) hardware
    • Single CPU
    • Multiple CPUs
  • Cloud
    • Multiple CPUs
    • FPGAs

Each element of this model has a set of secondary options:

  • CPU
    • Laptop (for example, Intel® Core™ i7-7660U processor, 2.5 GHz)
    • Workstation or server (for example, Intel® Xeon® processor or Intel® Xeon Phi™ processor)
  • Cloud
    • Google Cloud*
    • Microsoft Azure Cloud*
    • Amazon Web Services*
    • Intel® Nervana™ cloud
    • IBM Watson Analytics Cloud*

Questions and Criteria for Comparison

The computing infrastructure selection criteria are organized as a decision tree that raises a set of questions at each split point. Starting from the root, the questions are:

  • (Main) Should we use our own hardware or pay for compute hours in the cloud?
  • Which CPU should we use?
  • Do we need multiple CPUs? (for example, to train a model faster or to handle a large data set)
  • Which cloud should we use? (compare power and price, quality of documentation, SDK, and productivity)

The final choice is a path from the root of this tree to a leaf. To find this path, use the following evaluation criteria and questions (some of which come from the previous article about deep learning framework selection):

  • How long do you plan to work on deep learning in general, and on this project in particular?
  • What is the size of your data set?
  • Do you plan to train a model from scratch or use a pretrained model?
  • What deep learning framework do you plan to use?
  • On-premises
    • Price
    • Performance and price
    • Do you have expertise with DevOps (cluster administration)?
  • Cloud
    • Cloud computing hour price? (Performance and price)
    • Quality of documentation?
    • Availability of community-contributed tutorials?
    • Are there any plans for large-scale deep learning project deployments?
  • How difficult is it to migrate to a new computing platform?
  • Do you plan to deploy a user-facing production application?
  • How quickly do you need initial results?
  • Do you plan to run several experiments in parallel?
  • Do you plan to tweak your model or are you comfortable using only a predictive API, such as one that classifies images using a default model provided by the cloud provider?

Additionally, for a research institution or academic team, the following question might be relevant: How do we justify the cost and budget for cloud computing resources in a grant proposal?
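
The decision tree described above can be sketched as a small function. This is only an illustration under stated assumptions: the field names, the leaf labels, and the default thresholds (a break-even of about five months and a 100 GB data set limit, both taken from the rules of thumb later in this article) are our own simplifications, and a real decision would weigh many more of the questions listed above.

```python
from dataclasses import dataclass


@dataclass
class ProjectProfile:
    # All fields are hypothetical names chosen for this sketch.
    months_of_use: float              # expected months of near-continuous compute use
    dataset_gb: float                 # size of the training data set
    uses_pretrained_model: bool       # fine-tuning rather than training from scratch
    has_devops_expertise: bool        # someone on the team can administer a cluster
    needs_parallel_experiments: bool  # several experiments must run at the same time


def recommend_infrastructure(profile: ProjectProfile,
                             breakeven_months: float = 5.0,
                             large_dataset_gb: float = 100.0) -> str:
    """Walk a simplified decision tree from the root to a leaf."""
    needs_multiple_cpus = (
        profile.dataset_gb > large_dataset_gb or profile.needs_parallel_experiments
    )
    long_term = profile.months_of_use > breakeven_months

    if needs_multiple_cpus:
        # A private cluster pays off only over the long term and needs administrators.
        if long_term and profile.has_devops_expertise:
            return "on-premises: multiple CPUs"
        return "cloud: multiple CPUs"

    # One machine is enough: modest needs or long-term use favor owned hardware.
    if profile.uses_pretrained_model or long_term:
        return "on-premises: single CPU"
    return "cloud: rented machine"


# A small, short project with pretrained models (the 1 GB figure is made up).
sample = ProjectProfile(months_of_use=3, dataset_gb=1, uses_pretrained_model=True,
                        has_devops_expertise=False, needs_parallel_experiments=False)
print(recommend_infrastructure(sample))
```

Encoding the questions this way makes the trade-offs explicit and easy to revisit when a requirement, such as the data set size, changes.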


Budgeting (in-house hardware versus cloud CPU)

In-house hardware is a long-term investment and has the following parameters:

  • Requires up-front payment
  • No per-hour compute cost
  • Requires time to scale (buying and adding new compute resources)
  • Incurs administration costs
  • Becomes outdated within a few years due to technological progress

Cloud computing:

  • No upfront cost
  • Variable compute cost (pay-as-you-go)
  • Practically unlimited scalability within minutes or hours
  • (Approximately) zero administration costs
  • Constantly evolves as the hardware infrastructure is updated by the cloud provider, staying competitive on the market

To explain which option is reasonable and when, we built a simple financial model spreadsheet.

Figure 2. On-premises versus cloud CPU.

According to the graph in Figure 2, from a purely financial point of view (not accounting for data set size, scalability, and so on), cloud is the cheaper option if you plan to use it 24x7 for less than four to five months. Beyond that point, it makes sense to invest in your own compute infrastructure.
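
The break-even comparison can be reproduced with a few lines of arithmetic. The sketch below is not the spreadsheet behind Figure 2: the cost structure is our simplification, and the prices are made-up inputs chosen only so that the result falls in the four-to-five-month range discussed above.

```python
HOURS_PER_MONTH = 24 * 30  # 24x7 usage, approximated as 720 hours per month


def on_premises_cost(months: float, upfront: float, monthly_admin: float) -> float:
    """Cumulative cost of owned hardware: one-time purchase plus administration."""
    return upfront + monthly_admin * months


def cloud_cost(months: float, hourly_rate: float) -> float:
    """Cumulative pay-as-you-go cost of a cloud machine that runs 24x7."""
    return hourly_rate * HOURS_PER_MONTH * months


def breakeven_months(upfront: float, monthly_admin: float, hourly_rate: float):
    """Months after which owned hardware becomes cheaper than cloud, or None."""
    monthly_cloud = hourly_rate * HOURS_PER_MONTH
    if monthly_cloud <= monthly_admin:
        return None  # cloud never costs more per month, so there is no break-even
    return upfront / (monthly_cloud - monthly_admin)


# Hypothetical prices: a 4,500 workstation, 80 per month of administration,
# and a cloud machine at 1.50 per hour.
months = breakeven_months(upfront=4500, monthly_admin=80, hourly_rate=1.50)
print(months)  # 4.5 with these made-up inputs
```

Substituting your own hardware quote and your cloud provider's hourly price shows where the break-even point falls for your project.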


If you plan to use pretrained models, as we did in our sample project, your hardware requirements are modest. In our sample project, we decided to stick with one dedicated workstation with an Intel Xeon Phi processor. However, you can even try to retrain and fine-tune a model on your laptop.

If the size of your data set is greater than 100 GB, you will probably need to use a cloud with multiple CPUs. Typically, data scientists rent a powerful machine in a cloud, train a deep learning model, export the model, and then stop the machine. This approach is cost-effective because you pay only for the hours the machine is running.

Engineering Productivity

The easiest way to add AI to your project is to upload data to the cloud, train the provided default model on your data in a few clicks, and deploy it as an API, also in a few clicks. Choosing the cloud option is good for projects in the early stages or for people with minimal coding skills. Financial aspects aside, cloud is a better option because it enables fast experimentation, scalability, and straightforward deployment, and it minimizes administration efforts.

More likely, you will want to tweak your model or experiment with neural network architecture, hyperparameters, and so on. You’ll also want to run multiple experiments in parallel. Cloud is a good option for this too since you can start multiple machines with the same configuration and hence experiment faster. Unless you own a cluster, which is costly, you will have to wait for one experiment to finish before you can start a new one on a single machine.
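
To see how much time parallel machines can save, you can estimate the wall-clock time of a batch of experiments. The sketch below assigns each experiment to the least-loaded machine, longest experiment first; the function name and the durations are hypothetical, and real experiments rarely have run times known this precisely in advance.

```python
import heapq


def wall_clock_hours(experiment_hours, machines: int) -> float:
    """Estimate total elapsed time when experiments are spread over identical machines."""
    if machines < 1:
        raise ValueError("at least one machine is required")
    loads = [0.0] * machines  # hours already assigned to each machine (a min-heap)
    heapq.heapify(loads)
    for hours in sorted(experiment_hours, reverse=True):
        lightest = heapq.heappop(loads)          # machine that frees up first
        heapq.heappush(loads, lightest + hours)  # give it the next-longest experiment
    return max(loads)


experiments = [8, 6, 6, 4]  # made-up training times in hours

print(wall_clock_hours(experiments, machines=1))  # 24.0, experiments run one after another
print(wall_clock_hours(experiments, machines=2))  # 12.0
print(wall_clock_hours(experiments, machines=4))  # 8.0, limited by the longest experiment
```

In all three cases the total is 24 machine-hours, so if each cloud machine is stopped as soon as its work is done, the bill stays roughly the same while the results arrive sooner.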

However, fast experimentation comes at a cost: you have to administer a new environment for each new machine and parallel experiment. Luckily, you can abstract your deep learning experimentation environment and reduce the administration effort with Docker*, a container technology that packages all the dependencies in a layered, portable image. We recommend that you always work with Docker, since it lets you easily switch to a new compute infrastructure or share your work with others.
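
As an illustration of the Docker workflow, the sketch below builds a docker run command that executes a training script inside a container, with the project directory mounted into it. The image name tensorflow/tensorflow, the script name train.py, and the /workspace mount point are assumptions made for this example; the helper function is hypothetical and only assembles the command without running it.

```python
import shlex
from pathlib import Path


def docker_run_command(image: str, project_dir: str, script: str) -> list:
    """Build a `docker run` command that executes `script` inside `image`."""
    host_dir = Path(project_dir).resolve()  # Docker bind mounts need an absolute path
    return [
        "docker", "run",
        "--rm",                          # remove the container when it exits
        "-v", f"{host_dir}:/workspace",  # mount the project directory into the container
        "-w", "/workspace",              # use the mounted directory as working directory
        image,
        "python", script,
    ]


command = docker_run_command("tensorflow/tensorflow", ".", "train.py")
print(" ".join(shlex.quote(part) for part in command))
```

Because the command is the same on a laptop, a workstation, or a cloud machine, moving an experiment to new infrastructure only requires Docker to be installed there. To actually launch the container, the list can be passed to subprocess.run(command, check=True).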

If you plan to deploy your model and expose it as an API, you might prefer a cloud, and certain cloud providers in particular. For example, Google announced TPUs optimized for TensorFlow and native support of deep learning models trained with TensorFlow (TPUs aren’t available yet in Google Cloud). Microsoft supports fast deployment of models trained with the Microsoft Cognitive Toolkit*. Cloud deployments are especially appropriate for teams without data engineers or DevOps since data scientists can execute the project from start to finish on their own. For medium-size or large-size deployments, standard cloud options might not be a suitable solution. You would need DevOps engineers to build a distributed high-load system optimized for access by many users.

Our Choice

To decide what to use for our sample project, we focused on the following elements:

  • We have a small data set.
  • We plan to use TensorFlow.
  • We plan to use pretrained models.
  • We don’t have core DevOps people on the project team and prefer simple deployments.
  • We have a short project duration (1 to 3 months).
  • We would like to run parallel experiments, but time is not critical.
  • We plan to tweak deep learning models ourselves rather than use a fixed default model.
  • We don’t need the first results immediately and can wait a few days while the model trains.
  • We plan to “publish” a live demo application that is not designed for high traffic.

Based on these requirements and the evaluation described above, cloud would be the default choice for us, and choosing a provider would require a price comparison, which we did not carry out in this project. However, because our timeline is not tight (we can work on a single machine, run experiments sequentially, and wait while a model is training), we requested a single powerful workstation with the Intel® Xeon Phi™ processor for training. For deployment, we plan to use TensorFlow Serving [5] in Google Cloud as it provides a fully managed service.
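
To give a sense of what calling a deployed model looks like, the sketch below prepares a prediction request in the format used by the REST API that recent versions of TensorFlow Serving expose. The host, the model name, the input values, and port 8501 (the port conventionally used for the REST endpoint) are assumptions for this example; nothing here reflects the actual configuration of our demo, and the helper only builds the request without sending it.

```python
import json


def predict_request(host: str, model_name: str, instances, port: int = 8501):
    """Return the URL and JSON body for a TensorFlow Serving REST predict call."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances})  # one entry per input example
    return url, body


# Hypothetical model name and a single two-feature input example.
url, body = predict_request("localhost", "demo_model", [[1.0, 2.0]])
print(url)
print(body)
```

The body can then be sent as an HTTP POST with any client; the server replies with a JSON document containing a predictions field.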


When selecting the right computing infrastructure, first decide whether you want to work in a cloud or use your own hardware. In most cases, cloud is a better option. Among all existing clouds, we think that Google Cloud is the best one if you are working with TensorFlow. It makes sense to invest in your own computing infrastructure if you plan to work on deep learning projects for more than six months and can accurately plan the required computational needs.


Additional Reading

  1. Google Announcement
  2. Microsoft Announcement
  3. Benchmarks from Intel
  4. Benchmarks from Google
  5. TensorFlow Serving


