Optimize an R-FCN FP32 Inference Package with TensorFlow* for Kubernetes*

Published: 12/09/2020



Download Command

wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_3_0/rfcn-fp32-inference-k8s.tar.gz

Description

The COCO validation dataset is used in these R-FCN quick start scripts. The inference quick start scripts use raw images, and the accuracy quick start scripts require the dataset to be converted into the TF records format. See the COCO dataset for instructions on downloading and preprocessing the COCO validation dataset.
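As a reference sketch, the validation images and annotations can be fetched directly from the standard cocodataset.org URLs; see the COCO dataset instructions above for the authoritative download and preprocessing steps:

# Download and extract the COCO validation 2017 images and annotations
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip val2017.zip
unzip annotations_trainval2017.zip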

Quick Start Scripts

Script name     Description
fp32_inference  Runs inference on a directory of raw images for 500 steps and outputs performance metrics.
fp32_accuracy   Processes the TF records to run inference and check accuracy on the results.

Kubernetes*

Download and untar the R-FCN FP32 inference package.

wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_3_0/rfcn-fp32-inference-k8s.tar.gz
tar -xvf rfcn-fp32-inference-k8s.tar.gz

The Kubernetes* package for R-FCN FP32 inference includes serving and pipeline Kubernetes deployments. Within the serving and pipeline deployments are common use cases covering the storage and security variations found across different Kubernetes installations. Reference the following directory tree within the model package, where the serving and pipeline directories are below the mlops directory:

quickstart
└── mlops
    ├── pipeline
    │       ├── user-allocated-pvc
    │       └── user-mounted-nfs
    └── serving
            ├── user-allocated-pvc
            └── user-mounted-nfs

The pipeline example can be used to preprocess the COCO dataset to produce a TF records file and then run an R-FCN FP32 accuracy test using an Argo Workflow. Deployment of Argo needs to be done by DevOps.

The serving example uses a pod to run inference to get performance metrics (using raw images from the COCO dataset) or to test accuracy (when you already have the TF records file on NFS).

The deployments use Kustomize to configure parameters. The parameters can be set by running Kustomize commands prior to deploying the job to Kubernetes.
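For example, to see which setters are available in a use-case directory before changing anything, you can run the following (list-setters ships with the Kustomize v3.x cfg subcommand; verify the behavior against your installed version):

kustomize cfg list-setters <use-case>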

Prerequisites

The rfcn-fp32-inference-k8s.tar.gz package uses Kustomize v3.8.7 to configure parameters within the deployment.yaml. Download Kustomize v3.8.7, extract it, and move the binary to a directory within your PATH. You can verify that you've installed the correct version of Kustomize by running kustomize version. On OS X* you would see:

{Version:kustomize/v3.8.7 GitCommit:ad092cc7a91c07fdf63a2e4b7f13fa588a39af4f BuildDate:2020-11-11T23:19:38Z GoOs:darwin GoArch:amd64}
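A minimal installation sketch for Linux follows; the download URL uses the kubernetes-sigs/kustomize GitHub release naming convention, so adjust the platform suffix (for example, darwin_amd64 on OS X*) as needed:

# Download the Kustomize v3.8.7 release, extract it, and put it on the PATH
wget https://github.com/kubernetes-sigs/kustomize/releases/download/kustomize%2Fv3.8.7/kustomize_v3.8.7_linux_amd64.tar.gz
tar -xzf kustomize_v3.8.7_linux_amd64.tar.gz
mv kustomize /usr/local/bin/
kustomize version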
Serving Inference

Inference is run by submitting a pod yaml file to the k8s api-server; this creates the pod, and the specified quick start script then runs in the pod's container.

Make sure you are inside the serving directory:

cd rfcn-fp32-inference-k8s/quickstart/mlops/serving

The parameters that can be changed within the serving deployment are shown in the table below:

NAME         VALUE      DESCRIPTION
DATASET_DIR  /datasets  input dataset directory
FS_ID        0          owner ID of the mounted volumes
GROUP_ID     0          process group ID
GROUP_NAME   root       process group name
NFS_PATH     /nfs       NFS path
NFS_SERVER   0.0.0.0    NFS server address
PVC_NAME     workdisk   PVC name
PVC_PATH     /pvc       PVC path
OUTPUT_DIR   output     output directory basename
USER_ID      0          process owner ID
USER_NAME    root       process owner name

Note that when running inference, DATASET_DIR should point to the directory of raw COCO images (val2017), and when running accuracy testing, DATASET_DIR should point to the TF records directory.
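For example, using the same kustomize cfg set pattern shown below, DATASET_DIR can be pointed at your dataset location:

kustomize cfg set . DATASET_DIR <path to raw images or TF records> -R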

For the user-mounted NFS use case, change NFS_PATH and NFS_SERVER.

For the user-allocated PVC use case, change PVC_NAME and PVC_PATH.

For example, to change the NFS_SERVER address, run:

kustomize cfg set . NFS_SERVER <ip address> -R

To change the PVC_NAME, run:

kustomize cfg set . PVC_NAME <PVC Name> -R

In both use cases, change the following values so the pod deploys with the user's identity.

kustomize cfg set . FS_ID <Group ID> -R
kustomize cfg set . GROUP_ID <Group ID> -R
kustomize cfg set . GROUP_NAME <Group Name> -R
kustomize cfg set . USER_ID <User ID> -R
kustomize cfg set . USER_NAME <User Name> -R
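As a usage example, these values can be filled in from the standard id command so the pod runs with your current identity:

kustomize cfg set . FS_ID $(id -g) -R
kustomize cfg set . GROUP_ID $(id -g) -R
kustomize cfg set . GROUP_NAME $(id -gn) -R
kustomize cfg set . USER_ID $(id -u) -R
kustomize cfg set . USER_NAME $(id -un) -R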

Change the default namespace of all the resources by running the Kustomize command:

pushd <use-case>
kustomize edit set namespace <User's namespace>
popd

This will place all resources within the specified namespace. Note: this namespace should be created prior to deployment.
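If the namespace does not already exist, create it first:

kubectl create namespace <User's namespace>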

You can also change your default kubectl context by running:

kubectl config set-context --current --namespace=<User's namespace>

After you change parameter values, deploy the use case by running:

kustomize build <use-case> > <use-case>.yaml
kubectl apply -f <use-case>.yaml
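For example, to deploy the user-mounted NFS use case (directory name taken from the tree above):

kustomize build user-mounted-nfs > user-mounted-nfs.yaml
kubectl apply -f user-mounted-nfs.yaml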
Serving Inference Output

To view the log output of the R-FCN FP32 inference job, view the logs of the deployed pod. Find this pod by filtering the list of pods for the name 'inference':

kubectl get pods -oname|grep inference|cut -c5-

This can be combined with the kubectl logs subcommand to tail the output of the inference job:

kubectl logs -f $(kubectl get pods -oname|grep inference|cut -c5-)
Serving Inference Clean Up

Remove the pod and related resources by running:

kubectl delete -f <use-case>.yaml
Pipeline

The pipeline job uses an Argo Workflow to first convert the raw COCO images to the TF records format and then run R-FCN FP32 inference with an accuracy test using the TF records file.

The COCO validation 2017 dataset and annotations need to be downloaded to a directory on NFS. These will be used to create the TF records file.
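A minimal sketch, assuming the NFS export is mounted locally at /mnt/nfs (a hypothetical mount point; substitute your own) and that the validation images and annotations have already been downloaded and extracted as described above:

# Place the extracted val2017 images and annotations on the NFS share
mkdir -p /mnt/nfs/coco
cp -r val2017 annotations /mnt/nfs/coco/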

Make sure you are inside the pipeline directory:

cd rfcn-fp32-inference-k8s/quickstart/mlops/pipeline

The parameters that can be changed within the pipeline are shown in the table below:

NAME         VALUE      DESCRIPTION
DATASET_DIR  /datasets  input dataset directory
FS_ID        0          owner ID of the mounted volumes
GROUP_ID     0          process group ID
GROUP_NAME   root       process group name
NFS_PATH     /nfs       NFS path
NFS_SERVER   0.0.0.0    NFS server address
PVC_NAME     workdisk   PVC name
PVC_PATH     /pvc       PVC path
OUTPUT_DIR   output     output directory basename
USER_ID      0          process owner ID
USER_NAME    root       process owner name

For the user-mounted NFS use case, change NFS_PATH and NFS_SERVER.

For the user-allocated PVC use case, change PVC_NAME and PVC_PATH.

For example, to change the NFS_SERVER address, run:

kustomize cfg set . NFS_SERVER <ip address> -R

To change the PVC_NAME, run:

kustomize cfg set . PVC_NAME <PVC Name> -R

In both use cases, change the following values so the pod deploys with the user's identity.

kustomize cfg set . FS_ID <Group ID> -R
kustomize cfg set . GROUP_ID <Group ID> -R
kustomize cfg set . GROUP_NAME <Group Name> -R
kustomize cfg set . USER_ID <User ID> -R
kustomize cfg set . USER_NAME <User Name> -R

Change the default namespace of all the resources by running the Kustomize command:

pushd <use-case>
kustomize edit set namespace <User's namespace>
popd

This will place all resources within the specified namespace. Note: this namespace should be created prior to deployment.

You can also change your default kubectl context by running:

kubectl config set-context --current --namespace=<User's namespace>

After you change parameter values, deploy the use case by running:

kustomize build <use-case> > <use-case>.yaml
kubectl apply -f <use-case>.yaml

Once the job has been submitted, the status and logs can be viewed using the Argo user interface or from the command line using kubectl or the Argo CLI. The commands below show how to use kubectl to see the workflow, pods, and log files:

kubectl get wf
kubectl get pods
kubectl logs <pod name> main
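If the Argo CLI is installed, the rough equivalents using its standard subcommands are:

argo list
argo get <workflow name>
argo logs <workflow name>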
Clean Up

Remove the workflow and related resources using the following command:

kubectl delete -f <use-case>.yaml

Troubleshooting

  • Pod doesn't start and its status is ErrImagePull.
    Docker Hub recently implemented rate limits.
    See this note about rate limits and work-arounds; a sketch of one work-around appears after this list.

  • Argo Workflow steps do not execute.
    The error from argo get is 'Failed to save outputs: Failed to establish pod watch: timed out waiting for the condition.'
    This is due to the workflow running as non-root.
    DevOps will need to change the workflow-executor to k8sapi, as described in workflow-executors; see the sketch after this list.

  • The MPI Operator can't create workers. The error is '/bin/sh: /etc/hosts: Permission denied'. This is due to a bug in the Message Passing Interface (MPI) Operator's 'latest' container image when the workers run as non-root. See this issue.
    Use the container images mpioperator/mpi-operator:v0.2.3 and mpioperator/kubectl-delivery:v0.2.3.
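The sketches below illustrate the first two work-arounds; the resource and namespace names are assumptions, so adapt them to your cluster.

To pull images with an authenticated Docker Hub account (one common way around the rate limits), create an image pull secret; the secret name docker-hub-creds here is hypothetical:

kubectl create secret docker-registry docker-hub-creds \
    --docker-server=https://index.docker.io/v1/ \
    --docker-username=<Docker Hub user> \
    --docker-password=<Docker Hub token> \
    -n <User's namespace>

To switch the workflow executor to k8sapi, DevOps can patch the workflow controller's ConfigMap; this assumes Argo was installed with the default workflow-controller-configmap name in the argo namespace:

kubectl patch configmap workflow-controller-configmap -n argo \
    --type merge -p '{"data":{"containerRuntimeExecutor":"k8sapi"}}'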


Documentation and Sources

Get Started
Main GitHub*
Readme
Release Notes
Get Started Guide

Code Sources
Report Issue


License Agreement

LEGAL NOTICE: By accessing, downloading or using this software and any required dependent software (the “Software Package”), you agree to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party software included with the Software Package. Please refer to the license file for additional details.


Related Containers and Solutions

R-FCN FP32 Inference TensorFlow* Container
R-FCN FP32 Inference TensorFlow* Model Package


Product and Performance Information

1. Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.