AI Practitioners Guide for Beginners—Part 4: AI in the Cloud
Learn about deploying the TensorFlow* framework on the Intel® Xeon® Scalable platform via the cloud.
Welcome back to the AI Practitioners Guide for Beginners video series. I'm Beenish Zia, and in this episode, I give you an overview of deploying the TensorFlow* framework on the Intel® Xeon® Scalable platform via the cloud.
The cloud platforms used in the guide are two public cloud service providers, or CSPs, and are for demonstration purposes only; you can choose any CSP that supports the Intel Xeon Scalable platform. The guide uses Amazon Web Services*, or AWS*, for single-node deployment, and Google Cloud Platform*, or GCP, for multinode deployment. However, you can use a CSP of your choice.
Let's start with single-node deployment. If you've never used AWS, you will need to create an account before you can sign in to the AWS Management Console. Then, select an EC2 instance and configure it.
In the configuration step, you will select the base operating system you want to use for deep learning and the instance type. The guide uses a C5 instance to get CPU-optimized hardware and software.
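As a sketch, the same launch can also be scripted with the AWS CLI instead of the console; the AMI ID, key pair, and security group below are placeholders, not values from the guide.

```shell
# Hypothetical example: launch a CPU-optimized C5 instance with the AWS CLI.
# Replace the placeholder AMI ID, key pair, and security group with your own.
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type c5.2xlarge \
    --key-name my-key-pair \
    --security-group-ids sg-0123456789abcdef0 \
    --count 1
```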
After launching your instance and completing all necessary steps, you will connect to your instance. Once connected, you can run the TensorFlow framework using the Jupyter* Notebook or directly on the command terminal. For either method, you can run the Intel® Optimization for TensorFlow* Docker* images on your terminal or in the notebook. For steps on how to do this, watch the previous episode in this series.
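For reference, those Docker commands look roughly like this; this assumes the public intel/intel-optimized-tensorflow image on Docker Hub, and the available tags may change over time.

```shell
# Pull the Intel Optimization for TensorFlow Docker image (tag may vary).
docker pull intel/intel-optimized-tensorflow
# Run the framework in an interactive shell inside the container:
docker run -it intel/intel-optimized-tensorflow /bin/bash
# Or publish port 8888 to work through a Jupyter Notebook served by the image:
docker run -it -p 8888:8888 intel/intel-optimized-tensorflow
```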
Once you have the framework running, you can get the TensorFlow benchmarks from GitHub* and run one or more of them. For example, to run the TensorFlow convolutional neural network (CNN) benchmark, you can use a command like this.
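One possible form of that command, assuming the tf_cnn_benchmarks script from the TensorFlow benchmarks repository; the model and batch size here are illustrative, not values from the guide.

```shell
# Clone the TensorFlow benchmarks and run the CNN benchmark on CPU.
# Check out a branch that matches your installed TensorFlow version.
git clone https://github.com/tensorflow/benchmarks.git
cd benchmarks/scripts/tf_cnn_benchmarks
python tf_cnn_benchmarks.py --device=cpu --model=resnet50 \
    --batch_size=32 --data_format=NHWC
```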
A point to note here is that your benchmarks need to be compatible with the version of TensorFlow you're running. For multinode deployment, I've used GCP for demonstration. If you've never used GCP before, you will have to sign in to your Google* account, create a GCP project, enable billing for your project, and enable the Cloud Machine Learning Engine and Compute Engine APIs.
Once that is done, you will need to set up authentication, then install and initialize the Cloud SDK. Next, you will need to set up the environment, which includes opening your GCP console and activating the cloud shell. Then you will verify the Google Cloud SDK components, followed by downloading the code for the example run.
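The setup steps above can be sketched on the command line as follows; the project ID is a placeholder.

```shell
# Authenticate and initialize the Cloud SDK (opens a browser for sign-in).
gcloud init
# Set up application-default credentials for client libraries.
gcloud auth application-default login
# Verify which Google Cloud SDK components are installed.
gcloud components list
# Point the SDK at your project (placeholder project ID).
gcloud config set project my-ml-project
```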
Google hosts a public cloud storage bucket where you can get relevant training data. Once you have your training data, install the dependencies. To run distributed training on GCP, you will need to set up your cloud storage bucket, including the bucket name and region, and then upload your data files to it.
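A minimal sketch of that bucket setup, with a placeholder bucket name, region, and data directory:

```shell
# Create a regional Cloud Storage bucket and upload local training data.
BUCKET_NAME="my-ml-bucket"   # placeholder; bucket names are globally unique
REGION="us-central1"         # placeholder region
gsutil mb -l $REGION gs://$BUCKET_NAME
gsutil cp -r data/ gs://$BUCKET_NAME/data/
```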
Now that the whole platform is ready, you can start running your distributed training in the cloud. To do that, you will have to assign a job name and an output path for your results, and set the scale-tier parameter to STANDARD_1 to use an all-CPU configuration.
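A job submission along these lines might look as follows; the job name, trainer module, runtime version, and paths are placeholders, not values from the guide.

```shell
# Submit a distributed training job on the all-CPU STANDARD_1 scale tier.
JOB_NAME="my_training_job_$(date +%Y%m%d_%H%M%S)"
OUTPUT_PATH="gs://my-ml-bucket/$JOB_NAME"   # placeholder output bucket
gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $OUTPUT_PATH \
    --runtime-version 1.8 \
    --module-name trainer.task \
    --package-path trainer/ \
    --region us-central1 \
    --scale-tier STANDARD_1
```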
Once the job has been submitted, you can monitor its progress in the GCP console. After training is completed, you can run inference in a similar manner.
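Besides the console, progress can also be checked from the command line; the job name here is a placeholder.

```shell
# Show the status of a submitted job (placeholder job name).
gcloud ml-engine jobs describe my_training_job
# Stream the job's training logs to the terminal.
gcloud ml-engine jobs stream-logs my_training_job
```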
The last step is to clean up your cloud storage to avoid incurring additional GCP charges. Please check out the guide in the links for complete details on deploying TensorFlow via the cloud. Thanks for watching, and keep developing.