Game Dev with Unity* ML-Agents and Intel® Optimized Python* (Part Two)

Published: 07/06/2018  

Last Updated: 07/06/2018


In the final part of this two-part series on machine learning with Unity* ML-Agents, we will dig deeper into the architecture and create an ML-Agent from scratch. Before training, we will inspect the files that require parameters for machine learning to proceed. Finally, we will train the agent using Intel® optimized Python* and show how the completed system works.

Architecture of Unity* ML-Agents

Figure 1 shows the architecture of Unity ML-Agents:

Figure 1. Unity* ML-Agents architecture.

At first glance, it might seem that the external communicator and Intel-optimized Python can only be used by the external brain, but this is not the case. The external brain can be accessed by other training modes, too.

Every scene will have two entities:

  1. An “Academy,” using an “Academy Script” that will be added later.
  2. “Brains,” which are the logic inside Unity ML-Agents where the main connection lies. Agents share the same brain; each agent has an agent script on it which links back to the brain. The brain itself has a brain script on it. It may or may not have a decision script.

Changes in V3 with Respect to V2

Unity ML-Agents has seen several changes, many based on community feedback. Some of the changes are described below:

  • The ML-Agents reward system changed to “AddReward()” or “SetReward().”
  • When an Agent has completed its task or otherwise finished its episode, we now call the “Done()” method.
  • The concept of state has been changed to observations, so “CollectStates()” has been replaced by “CollectObservations().”
  • When we collect Observations, we have to call “AddVectorObs()” with floats, integers, lists, and an array of floats, vectors, and quaternions. (Quaternions represent the orientation of every object in Unity.) The names of the inputs in the Internal Brain have been changed accordingly.
  • We must replace State with “Vector_Observation” and observation with “Visual_Observation.”

The table below summarizes the key changes in V3:

Old (V2)       New (V3)
State          Vector Observation
Observation    Visual Observation
(New)          Text Observation
Action         Vector Action
(New)          Text Action

Table 1. Changes in Unity* ML-Agents from v2 to v3.

Let’s Start with an Example

Use the following steps to start creating your own example of machine learning using Unity ML-Agents and Intel-optimized Python:

  1. Open up the Unity ML cloned project. Everything we do will be kept inside the Examples folder.

    The cloned project is opened in Unity.

  2. Create a new subfolder named “MyBall” within the Examples folder. We will keep all of our resources within this folder.

    MyBall project

    The Examples folder is where we are keeping all the content and the resources.

  3. Create a new scene using the suggested name “MyBall(scene).”

    MyBall scene

    Next, we will create a new scene.

To start setting up machine learning inside the scene, we will have to create 3D objects, using the following steps:

  1. Create a 3D object cube.
  2. Add a Rigidbody component to the cube and make it kinematic.
  3. Change the color of the cube. For adding colors to our object, we need to create a new material and name it “Blue.” We will change the color content to blue. (We can also change the color of the background.)
  4. Create a 3D object sphere and add a rigid body to it.
    We will now organize the scene and add an event system from the UI.
  5. Right-click on “Hierarchy” then select “Event System.”

To follow the procedure for Unity ML-Agents, we need to separately create an Academy object and a brain object, and then associate the scripts properly. We will create an Academy object, then have a child object created from Academy named “Brain.” Within the brain, we will add the brain script; but when we do, we will notice an error in the inspector window, which we can quickly resolve.

Adding Functionality to the Academy and the Brain Object

When we add functionality to the Academy and Brain objects by attaching a C# script, we remove the error condition. The script follows a basic flow with some override methods. Now that we have created the Academy object, we can create a C# script named “MyBallAcademy” and attach it to the Academy in the hierarchy.

Before editing, the script looks like this:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class MyBallAcademy : MonoBehaviour {

	// Use this for initialization
	void Start () {
	}

	// Update is called once per frame
	void Update () {
	}
}

We will not inherit from MonoBehaviour directly, as we are not deriving any characteristics from it. After we change the script, everything will be derived from Academy, and we no longer need “void Start()” and “void Update().”

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class MyBallAcademy : Academy {

	// Called when the Academy resets the environment
	public override void AcademyReset()
	{
	}

	// Called at every step of the environment
	public override void AcademyStep()
	{
	}
}

We have inherited from Academy and declared two empty override methods, “AcademyReset()” and “AcademyStep().” We keep these method names and signatures as they are, because they form the structure of any script that derives from Academy. With both methods in place, we have a generic script that can be used within the scene.

With the changes made to the script, we have a basic, bare-bones structure for linking Academy and the brain.

Basic Setup for the Scene

In this scene we will be creating a cube, which we will refer to as the “platform.” Within that platform, we will place a sphere, which will act like a ball. With movements, we can adjust the ball in order to prevent it from falling off the platform. If the ball falls off, the scene will reset, and we will restart the balancing act.

We now have our platform and the ball, but to demonstrate machine learning, we need to configure a brain to control the action. Once the system is under the control of the brain, it will drive the platform and then fire off an agent script. Our next job is to write the agent script.

Programming and Scene Setup Logic

We will now create an agent script and name it MyBallAgent. It will inherit from Agent. Once we add the MyBallAgent script to the scene, the Inspector immediately shows which inherited values we need to fill in. We will drag and drop the Brain object onto the required field.

First, we will drag and drop the MyBallAgent script created to the cube as shown below.

MyBallAgent script

Then we drag and drop the child we created for Academy as brain to the Brain option, which showed none (shown below).

Brain option

In the Agent code itself, we will write all the controlling parameters we intend to use. We will declare a public GameObject named “ball,” which we will assign to the ball object from the Inspector.

public GameObject ball;

Now the flow of the agent is controlled by the Unity ML-Agents plugin. (We will not need Unity’s default update method.)


Overriding Common Methods

We need to override common methods because the type of environment we created might require changes and more training. For that we need to change the values of the parameters and override the common values present.

First, we have to decide where to collect the transform values and other data for the game object. In version 0.3, this is done with “AddVectorObs()” calls, and the collected values are known as “vector observations.”

For the platform rotation, the relative ball position, and the ball’s rigid body velocity, we make eight AddVectorObs() calls, one per vector observation.

The method is called CollectObservations.

  AddVectorObs((ball.transform.position.x - gameObject.transform.position.x));
  AddVectorObs((ball.transform.position.y - gameObject.transform.position.y));
  AddVectorObs((ball.transform.position.z - gameObject.transform.position.z));

The complete method is shown below.

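public override void CollectObservations()
{
  // Platform rotation (z and x components)
  AddVectorObs(gameObject.transform.rotation.z);
  AddVectorObs(gameObject.transform.rotation.x);
  // Ball position relative to the platform
  AddVectorObs((ball.transform.position.x - gameObject.transform.position.x));
  AddVectorObs((ball.transform.position.y - gameObject.transform.position.y));
  AddVectorObs((ball.transform.position.z - gameObject.transform.position.z));
  // Ball velocity in x, y, and z
  AddVectorObs(ball.GetComponent<Rigidbody>().velocity.x);
  AddVectorObs(ball.GetComponent<Rigidbody>().velocity.y);
  AddVectorObs(ball.GetComponent<Rigidbody>().velocity.z);
  SetTextObs("Testing " + gameObject.GetInstanceID());
}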


Here is what the above code does:

  1. We get the x and z rotation; the game object will rotate in two directions.
  2. We get the difference between the ball’s x position and the game object’s x position.
  3. We get where the ball is respective to the platform.
  4. We get the ball’s velocity in x,y and z directions.
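As a rough illustration of the eight-value observation vector described above, the following Python sketch assembles the same quantities outside Unity. The function name `collect_observations` and the tuple-based inputs are assumptions made for illustration; they are not part of the ML-Agents API.

```python
from typing import List, Sequence


def collect_observations(
    platform_rotation: Sequence[float],  # (x, y, z) components of the platform rotation
    platform_position: Sequence[float],  # (x, y, z) world position of the platform
    ball_position: Sequence[float],      # (x, y, z) world position of the ball
    ball_velocity: Sequence[float],      # (x, y, z) velocity of the ball
) -> List[float]:
    """Build the eight-element observation vector (hypothetical helper)."""
    obs = [
        platform_rotation[2],  # z component of the platform rotation
        platform_rotation[0],  # x component of the platform rotation
    ]
    # Ball position relative to the platform, one value per axis.
    obs += [b - p for b, p in zip(ball_position, platform_position)]
    # Ball velocity along x, y, and z.
    obs += list(ball_velocity)
    return obs


observations = collect_observations(
    platform_rotation=(0.1, 0.0, -0.05),
    platform_position=(0.0, 0.0, 0.0),
    ball_position=(0.5, 4.0, -0.25),
    ball_velocity=(0.0, -1.0, 0.0),
)
print(len(observations))
```

The length of the returned list is what the Vector Observation space size in the brain settings has to match.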

When the Game Resets, What Method Will We Override?

The override method that we will be using for when the game resets is AgentReset(), which initiates when the ball is dropped onto the platform. Here are some of the key instructions:

  1. Reset everything back to zero:

    gameObject.transform.rotation = new Quaternion(0f, 0f, 0f, 0f);

  2. Change the velocity of the ball back to 0:

    ball.GetComponent<Rigidbody>().velocity = new Vector3(0f, 0f, 0f);

  3. Set the position of the ball back to StartPos:

    ball.transform.position = ballStartPos;

  4. Declare a “Vector3” field to store the ball’s start position:

    Vector3 ballStartPos;

  5. Record the starting position inside “void Start()” with the following assignment:

    ballStartPos = ball.transform.position;

We have now defined the starting environment when we hold the ball for the very first time, and when the system resets.
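The reset sequence can be mimicked outside Unity to make the order of operations explicit. The sketch below is only an analogy: the `BallScene` class and its fields are hypothetical stand-ins for the platform and ball, with plain lists in place of Unity vectors.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BallScene:
    """Hypothetical stand-in for the platform and ball state."""
    ball_position: List[float]
    ball_velocity: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    platform_rotation: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])

    def __post_init__(self) -> None:
        # Counterpart of caching ballStartPos in Start().
        self.ball_start_pos = list(self.ball_position)

    def reset(self) -> None:
        # Counterpart of AgentReset(): level the platform, stop the ball,
        # and move the ball back to where it started.
        self.platform_rotation = [0.0, 0.0, 0.0]
        self.ball_velocity = [0.0, 0.0, 0.0]
        self.ball_position = list(self.ball_start_pos)


scene = BallScene(ball_position=[0.0, 4.0, 0.0])
scene.ball_position = [3.5, -2.5, 0.0]      # the ball has fallen off
scene.ball_velocity = [1.0, -9.0, 0.0]
scene.platform_rotation = [0.2, 0.0, -0.1]
scene.reset()
print(scene.ball_position)
```

Copying the start position once, before anything moves, is what lets every later reset return to the same state.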

Controlling the Platform

Once we switch the brain type to “Player,” we must map keys on the keyboard so that we can control the platform by hand. The AgentAction() method is where actions are converted into movement: each key press should tilt the platform in the scene and, in turn, move the ball. As we map the keys, we need to check that each one moves the platform in the intended direction. The entire updated code for MyBallAgent is shown below:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class MyBallAgent : Agent {

    public GameObject ball;
    Vector3 ballStartPos;

    // Remember where the ball starts so AgentReset() can restore it.
    void Start()
    {
        ballStartPos = ball.transform.position;
    }

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        if (brain.brainParameters.vectorActionSpaceType == SpaceType.continuous)
        {
            // Rotate about z, but only while the tilt stays within the limit.
            float action_z = 2f * Mathf.Clamp(vectorAction[0], -1f, 1f);
            if ((gameObject.transform.rotation.z < 0.25f && action_z > 0f) ||
                (gameObject.transform.rotation.z > -0.25f && action_z < 0f))
            {
                gameObject.transform.Rotate(new Vector3(0, 0, 1), action_z);
            }

            // Same check for the x axis.
            float action_x = 2f * Mathf.Clamp(vectorAction[1], -1f, 1f);
            if ((gameObject.transform.rotation.x < 0.25f && action_x > 0f) ||
                (gameObject.transform.rotation.x > -0.25f && action_x < 0f))
            {
                gameObject.transform.Rotate(new Vector3(1, 0, 0), action_x);
            }
        }

        // The ball has dropped below the platform or rolled past its edge.
        if ((ball.transform.position.y - gameObject.transform.position.y) < -2f ||
            Mathf.Abs(ball.transform.position.x - gameObject.transform.position.x) > 3f ||
            Mathf.Abs(ball.transform.position.z - gameObject.transform.position.z) > 3f)
        {
            // Assumption: end the episode with a penalty here and give a small
            // reward on every other step; the reward values are illustrative.
            Done();
            SetReward(-1f);
        }
        else
        {
            SetReward(0.1f);
        }
    }

    public override void CollectObservations()
    {
        AddVectorObs(gameObject.transform.rotation.z);
        AddVectorObs(gameObject.transform.rotation.x);
        AddVectorObs((ball.transform.position.x - gameObject.transform.position.x));
        AddVectorObs((ball.transform.position.y - gameObject.transform.position.y));
        AddVectorObs((ball.transform.position.z - gameObject.transform.position.z));
        AddVectorObs(ball.GetComponent<Rigidbody>().velocity.x);
        AddVectorObs(ball.GetComponent<Rigidbody>().velocity.y);
        AddVectorObs(ball.GetComponent<Rigidbody>().velocity.z);
        SetTextObs("Testing " + gameObject.GetInstanceID());
    }

    public override void AgentReset()
    {
        gameObject.transform.rotation = new Quaternion(0f, 0f, 0f, 0f);
        ball.GetComponent<Rigidbody>().velocity = new Vector3(0f, 0f, 0f);
        ball.transform.position = ballStartPos;
    }
}
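The arithmetic inside AgentAction() is easy to check in isolation. The Python sketch below restates the clamping, the tilt limit, and the fall-off test using the same constants as the listing; the helper names (`clamp`, `scaled_action`, `may_rotate`, `ball_fell`) are hypothetical and exist only for this illustration.

```python
from typing import Sequence


def clamp(value: float, low: float, high: float) -> float:
    """Limit value to the closed interval [low, high], like Mathf.Clamp."""
    return max(low, min(high, value))


def scaled_action(raw: float) -> float:
    """Clamp a raw action to [-1, 1] and scale it by 2, as in the listing."""
    return 2.0 * clamp(raw, -1.0, 1.0)


def may_rotate(rotation_component: float, action: float, limit: float = 0.25) -> bool:
    """Allow a rotation only while the platform stays inside the tilt limit."""
    return (rotation_component < limit and action > 0.0) or (
        rotation_component > -limit and action < 0.0
    )


def ball_fell(ball: Sequence[float], platform: Sequence[float]) -> bool:
    """True when the ball is well below or too far to the side of the platform."""
    dx = ball[0] - platform[0]
    dy = ball[1] - platform[1]
    dz = ball[2] - platform[2]
    return dy < -2.0 or abs(dx) > 3.0 or abs(dz) > 3.0


print(scaled_action(5.0))                            # raw value is clamped before scaling
print(may_rotate(0.3, 1.0))                          # already past the positive limit
print(ball_fell((0.0, 1.0, 0.0), (0.0, 0.0, 0.0)))   # ball resting above the platform
```

Because the tilt check looks at the sign of the action, a platform that has reached the limit can still rotate back toward level.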


Simulation Using Keyboard Inputs

For a simulation using keyboard inputs with the brain type set as “Player,” we will need to configure the brain script. Because there are eight AddVectorObs, the parameter for Vector Observation space size would be eight, and space type is “continuous.” Make the changes in the Inspector window, shown below:

Configuring the brain script
Figure 2. Configuring the brain script in the Inspector window.

Now we can add continuous player actions to control keyboard inputs. There are four keys to map, so there are four continuous player elements: up-arrow, down-arrow, right-arrow, and left-arrow. The parameter values are the following:

Element 0
Key -> Up Arrow

Element 1
Key -> Down Arrow

Element 2
Key -> Right Arrow

Element 3
Key -> Left Arrow

The keyboard mapping is shown in the figure below:

Keyboard mapping
Figure 3. Keyboard mapping for elements 0-3.

Now we will click on “Play” to test the scene under player settings and try to keep the ball on the platform using the up, down, left, and right arrow keys.

For training the model using Intel-optimized TensorFlow*, we need to keep the brain type set to “external” for the build.

The ball at the center
Figure 4. Play starts with the ball at the center of the platform.

As we have done before, we need to create the build for the project.

Selecting the scenes
Figure 5. Selecting the scenes and creating the project.

We have added the scene; now we will create the build and name it.

Naming and saving the scene
Figure 6. Naming and saving the scene.

Now that the executable has been created, we must train it using our Intel-optimized Python module. However, before training can start, there are some things to know about the “learn.py” file and the “trainer_config.yaml” file. The “learn.py” file contains the details for running the training, while the key hyperparameters are declared in the config file. The main job of “learn.py” is to initialize general parameters such as run_id and fast_simulation, and to point the trainer at the “trainer_config.yaml” file. We don’t have to make changes to “learn.py”; its format is shown below:

# # Unity ML Agents
# ## ML-Agent Learning

import logging

import os
from docopt import docopt

from unitytrainers.trainer_controller import TrainerController

if __name__ == '__main__':
    logger = logging.getLogger("unityagents")
    _USAGE = '''
    Usage:
      learn (<env>) [options]
      learn --help

    Options:
      --curriculum=<file>        Curriculum json file for environment [default: None].
      --keep-checkpoints=<n>     How many model checkpoints to keep [default: 5].
      --lesson=<n>               Start learning from this lesson [default: 0].
      --load                     Whether to load the model or randomly initialize [default: False].
      --run-id=<path>            The sub-directory name for model and summary statistics [default: ppo].
      --save-freq=<n>            Frequency at which to save model [default: 50000].
      --seed=<n>                 Random seed used for training [default: -1].
      --slow                     Whether to run the game at training speed [default: False].
      --train                    Whether to train model, or only run inference [default: False].
      --worker-id=<n>            Number to add to communication port (5005). Used for multi-environment [default: 0].
      --docker-target-name=<dt>       Docker Volume to store curriculum, executable and model files [default: Empty].
    '''

    options = docopt(_USAGE)
    logger.info(options)
    # Docker Parameters
    if options['--docker-target-name'] == 'Empty':
        docker_target_name = ''
    else:
        docker_target_name = options['--docker-target-name']

    # General parameters
    run_id = options['--run-id']
    seed = int(options['--seed'])
    load_model = options['--load']
    train_model = options['--train']
    save_freq = int(options['--save-freq'])
    env_path = options['<env>']
    keep_checkpoints = int(options['--keep-checkpoints'])
    worker_id = int(options['--worker-id'])
    curriculum_file = str(options['--curriculum'])
    if curriculum_file == "None":
        curriculum_file = None
    lesson = int(options['--lesson'])
    fast_simulation = not bool(options['--slow'])

    # Constants
    # Assumption that this yaml is present in same dir as this file
    base_path = os.path.dirname(__file__)
    TRAINER_CONFIG_PATH = os.path.abspath(os.path.join(base_path, "trainer_config.yaml"))

    tc = TrainerController(env_path, run_id, save_freq, curriculum_file, fast_simulation, load_model, train_model,
                           worker_id, keep_checkpoints, lesson, seed, docker_target_name, TRAINER_CONFIG_PATH)
    tc.start_learning()
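To make the “general parameters” step concrete, here is a minimal sketch of the same string-to-value conversions applied to a plain dictionary, so that it runs without docopt or Unity. The dictionary mirrors the option values that the script logs at startup; the `general_parameters` helper is hypothetical and is not part of the training script.

```python
from typing import Any, Dict, Optional

# Option values as they arrive from the command line: strings and booleans.
options: Dict[str, Any] = {
    '--curriculum': 'None',
    '--keep-checkpoints': '5',
    '--lesson': '0',
    '--load': False,
    '--run-id': 'mball2',
    '--save-freq': '50000',
    '--seed': '-1',
    '--slow': False,
    '--train': True,
    '--worker-id': '0',
    '<env>': 'mball2.exe',
}


def general_parameters(opts: Dict[str, Any]) -> Dict[str, Any]:
    """Convert raw option values into typed parameters (hypothetical helper)."""
    curriculum: Optional[str] = str(opts['--curriculum'])
    if curriculum == 'None':
        curriculum = None
    return {
        'run_id': opts['--run-id'],
        'seed': int(opts['--seed']),
        'load_model': opts['--load'],
        'train_model': opts['--train'],
        'save_freq': int(opts['--save-freq']),
        'env_path': opts['<env>'],
        'keep_checkpoints': int(opts['--keep-checkpoints']),
        'worker_id': int(opts['--worker-id']),
        'curriculum_file': curriculum,
        'lesson': int(opts['--lesson']),
        # Passing --slow turns fast simulation off.
        'fast_simulation': not bool(opts['--slow']),
    }


params = general_parameters(options)
print(params['save_freq'], params['fast_simulation'], params['curriculum_file'])
```

Note that numeric options arrive as strings, which is why the script wraps them in int() before handing them to the trainer.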

The “trainer_config.yaml” file contains the more important training parameters, and some defaults are already declared. The important ones are:

max_steps: 5.0e4. This is the number of simulation steps the training runs for. For this scene it is 50,000, written as 5.0e4 (that is, 5 × 10^4), which is the default value. We can raise the value to train the model for longer. This is distinct from num_epoch, which sets how many passes are made over the collected experience during each model update; in general, one epoch is one full pass over the training set.

learning_rate: 3.0e-4. This is the α value, or learning rate.
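The scientific notation used in the config file can be checked with a few lines of Python. This is only a worked example of the arithmetic; the variable names are chosen for illustration, and summary_freq is taken from the default values in the config file.

```python
max_steps = float("5.0e4")       # scientific notation as written in the YAML file
summary_freq = 1000              # steps between summary lines (default value)
learning_rate = float("3.0e-4")

total_steps = int(max_steps)
num_summaries = total_steps // summary_freq

print(total_steps)      # 50000
print(num_summaries)    # 50
print(learning_rate)    # 0.0003
```

With these defaults, a full run reports its mean reward 50 times, once every 1,000 steps.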

We can also override values. For example, to train a scene for longer, we can increase max_steps, which can improve the machine-learning results. The file itself contains examples in which the default values have been overridden.

A small snippet of the “trainer_config.yaml” file is shown below:

    trainer: ppo
    batch_size: 1024
    beta: 5.0e-3
    buffer_size: 10240
    epsilon: 0.2
    gamma: 0.99
    hidden_units: 128
    lambd: 0.95
    learning_rate: 3.0e-4
    max_steps: 5.0e4
    memory_size: 256
    normalize: false
    num_epoch: 3
    num_layers: 2
    time_horizon: 64
    sequence_length: 64
    summary_freq: 1000
    use_recurrent: false

    normalize: false
    batch_size: 1024
    beta: 5.0e-3
    buffer_size: 10240

    max_steps: 5.0e4
    batch_size: 128
    buffer_size: 2048
    beta: 1.0e-2
    hidden_units: 256
    summary_freq: 2000
    time_horizon: 64
    num_layers: 2
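One way to picture how overrides interact with the defaults is a key-by-key merge. The sketch below assumes that any value given in an override group replaces the default of the same name and that everything else is inherited; the values are copied from the first and last groups of the snippet, while the dictionary names and the `resolve_config` helper are hypothetical.

```python
from typing import Any, Dict

# Default hyperparameters (first group in the snippet).
default_config: Dict[str, Any] = {
    "trainer": "ppo",
    "batch_size": 1024,
    "beta": 5.0e-3,
    "buffer_size": 10240,
    "epsilon": 0.2,
    "gamma": 0.99,
    "hidden_units": 128,
    "lambd": 0.95,
    "learning_rate": 3.0e-4,
    "max_steps": 5.0e4,
    "num_epoch": 3,
    "num_layers": 2,
    "summary_freq": 1000,
    "time_horizon": 64,
}

# Override values (last group in the snippet).
override_config: Dict[str, Any] = {
    "max_steps": 5.0e4,
    "batch_size": 128,
    "buffer_size": 2048,
    "beta": 1.0e-2,
    "hidden_units": 256,
    "summary_freq": 2000,
    "time_horizon": 64,
    "num_layers": 2,
}


def resolve_config(default: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Start from the defaults, then let each override replace its key."""
    merged = dict(default)
    merged.update(override)
    return merged


merged = resolve_config(default_config, override_config)
print(merged["batch_size"], merged["hidden_units"], merged["gamma"])
```

Under this assumption, only the keys that differ from the defaults need to be listed in an override group.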

Now we can start the training process. The following is the command we will use:

python learn.py mball2.exe --run-id=mball2 --train

As the process runs, the following details are populated:

(idp) C:\Users\abhic\Desktop\ml-agents\python>python learn.py mball2.exe --run-id=mball2 --train
INFO:unityagents:{'--curriculum': 'None',
 '--docker-target-name': 'Empty',
 '--help': False,
 '--keep-checkpoints': '5',
 '--lesson': '0',
 '--load': False,
 '--run-id': 'mball2',
 '--save-freq': '50000',
 '--seed': '-1',
 '--slow': False,
 '--train': True,
 '--worker-id': '0',
 '<env>': 'mball2.exe'}
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :

Unity brain name: Brain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: ,
2018-06-04 05:28:49.992671: I k:\tf_jenkins_freddy\] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
C:\<path>\conda\envs\idp\lib\site-packages\tensorflow\python\ops\ UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
INFO:unityagents:Hyperparameters for the PPO Trainer of brain Brain:
        batch_size:     1024
        beta:   0.005
        buffer_size:    10240
        epsilon:        0.2
        gamma:  0.99
        hidden_units:   128
        lambd:  0.95
        learning_rate:  0.0003
        max_steps:      5.0e4
        normalize:      False
        num_epoch:      3
        num_layers:     2
        time_horizon:   64
        sequence_length:        64
        summary_freq:   1000
        use_recurrent:  False
        summary_path:   ./summaries/mball2
        memory_size:    256
INFO:unityagents: Brain: Step: 1000. Mean Reward: 6.975. Std of Reward: 1.993.
INFO:unityagents: Brain: Step: 2000. Mean Reward: 9.367. Std of Reward: 3.598.
INFO:unityagents: Brain: Step: 3000. Mean Reward: 7.258. Std of Reward: 2.252.
INFO:unityagents: Brain: Step: 4000. Mean Reward: 7.333. Std of Reward: 3.324.
INFO:unityagents: Brain: Step: 5000. Mean Reward: 10.700. Std of Reward: 4.618.
INFO:unityagents: Brain: Step: 6000. Mean Reward: 7.183. Std of Reward: 1.750.
INFO:unityagents: Brain: Step: 7000. Mean Reward: 7.038. Std of Reward: 2.464.
INFO:unityagents: Brain: Step: 8000. Mean Reward: 6.400. Std of Reward: 1.561.
INFO:unityagents: Brain: Step: 9000. Mean Reward: 7.664. Std of Reward: 3.189.
INFO:unityagents: Brain: Step: 10000. Mean Reward: 7.333. Std of Reward: 2.236.
INFO:unityagents: Brain: Step: 11000. Mean Reward: 9.622. Std of Reward: 4.135.
INFO:unityagents: Brain: Step: 12000. Mean Reward: 10.938. Std of Reward: 1.323.
INFO:unityagents: Brain: Step: 13000. Mean Reward: 10.578. Std of Reward: 2.623.
INFO:unityagents: Brain: Step: 14000. Mean Reward: 11.986. Std of Reward: 2.559.
INFO:unityagents: Brain: Step: 15000. Mean Reward: 10.411. Std of Reward: 2.383.
INFO:unityagents: Brain: Step: 16000. Mean Reward: 10.925. Std of Reward: 2.178.
INFO:unityagents: Brain: Step: 17000. Mean Reward: 10.633. Std of Reward: 1.173.
INFO:unityagents: Brain: Step: 18000. Mean Reward: 11.957. Std of Reward: 3.645.
INFO:unityagents: Brain: Step: 19000. Mean Reward: 10.511. Std of Reward: 2.343.
INFO:unityagents: Brain: Step: 20000. Mean Reward: 10.975. Std of Reward: 2.469.
INFO:unityagents: Brain: Step: 21000. Mean Reward: 12.025. Std of Reward: 6.786.
INFO:unityagents: Brain: Step: 22000. Mean Reward: 10.538. Std of Reward: 1.935.
INFO:unityagents: Brain: Step: 23000. Mean Reward: 10.311. Std of Reward: 1.044.
INFO:unityagents: Brain: Step: 24000. Mean Reward: 9.844. Std of Reward: 1.023.
INFO:unityagents: Brain: Step: 25000. Mean Reward: 10.167. Std of Reward: 0.886.
INFO:unityagents: Brain: Step: 26000. Mean Reward: 10.388. Std of Reward: 1.628.
INFO:unityagents: Brain: Step: 27000. Mean Reward: 10.000. Std of Reward: 1.332.
INFO:unityagents: Brain: Step: 28000. Mean Reward: 10.322. Std of Reward: 1.240.
INFO:unityagents: Brain: Step: 29000. Mean Reward: 9.644. Std of Reward: 0.837.
INFO:unityagents: Brain: Step: 30000. Mean Reward: 10.244. Std of Reward: 1.606.
INFO:unityagents: Brain: Step: 31000. Mean Reward: 9.922. Std of Reward: 1.576.
INFO:unityagents: Brain: Step: 32000. Mean Reward: 10.200. Std of Reward: 1.060.
INFO:unityagents: Brain: Step: 33000. Mean Reward: 10.413. Std of Reward: 0.877.
INFO:unityagents: Brain: Step: 34000. Mean Reward: 10.233. Std of Reward: 1.104.
INFO:unityagents: Brain: Step: 35000. Mean Reward: 10.411. Std of Reward: 0.825.
INFO:unityagents: Brain: Step: 36000. Mean Reward: 9.875. Std of Reward: 1.221.
INFO:unityagents: Brain: Step: 37000. Mean Reward: 10.067. Std of Reward: 0.550.
INFO:unityagents: Brain: Step: 38000. Mean Reward: 9.660. Std of Reward: 0.759.
INFO:unityagents: Brain: Step: 39000. Mean Reward: 11.063. Std of Reward: 1.467.
INFO:unityagents: Brain: Step: 40000. Mean Reward: 9.722. Std of Reward: 0.989.
INFO:unityagents: Brain: Step: 41000. Mean Reward: 9.656. Std of Reward: 0.732.
INFO:unityagents: Brain: Step: 42000. Mean Reward: 9.689. Std of Reward: 0.839.
INFO:unityagents: Brain: Step: 43000. Mean Reward: 9.689. Std of Reward: 1.152.
INFO:unityagents: Brain: Step: 44000. Mean Reward: 9.570. Std of Reward: 0.593.
INFO:unityagents: Brain: Step: 45000. Mean Reward: 9.856. Std of Reward: 0.510.
INFO:unityagents: Brain: Step: 46000. Mean Reward: 10.278. Std of Reward: 1.219.
INFO:unityagents: Brain: Step: 47000. Mean Reward: 9.988. Std of Reward: 0.924.
INFO:unityagents: Brain: Step: 48000. Mean Reward: 10.311. Std of Reward: 0.788.
INFO:unityagents: Brain: Step: 49000. Mean Reward: 10.044. Std of Reward: 1.192.
INFO:unityagents:Saved Model
INFO:unityagents: Brain: Step: 50000. Mean Reward: 9.210. Std of Reward: 0.730.
INFO:unityagents:Saved Model
INFO:unityagents:Saved Model
INFO:unityagents:List of nodes to export :
INFO:unityagents:       action
INFO:unityagents:       value_estimate
INFO:unityagents:       action_probs
INFO:tensorflow:Restoring parameters from ./models/mball2\model-50000.cptk
INFO:tensorflow:Restoring parameters from ./models/mball2\model-50000.cptk
INFO:tensorflow:Froze 12 variables.
INFO:tensorflow:Froze 12 variables.
Converted 12 variables to const ops.
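The summary lines in this output follow a fixed pattern, so they are straightforward to post-process, for example to find the step with the highest mean reward. The sketch below parses a handful of lines copied from the log above; the regular expression and the `parse_summaries` helper are assumptions made for illustration rather than part of the training tools.

```python
import re
from typing import List, Tuple

log_lines = [
    "INFO:unityagents: Brain: Step: 1000. Mean Reward: 6.975. Std of Reward: 1.993.",
    "INFO:unityagents: Brain: Step: 14000. Mean Reward: 11.986. Std of Reward: 2.559.",
    "INFO:unityagents:Saved Model",
    "INFO:unityagents: Brain: Step: 50000. Mean Reward: 9.210. Std of Reward: 0.730.",
]

# Matches "Step: <int>. Mean Reward: <float>. Std of Reward: <float>."
PATTERN = re.compile(
    r"Step: (\d+)\. Mean Reward: (-?\d+\.\d+)\. Std of Reward: (-?\d+\.\d+)\."
)


def parse_summaries(lines: List[str]) -> List[Tuple[int, float, float]]:
    """Return (step, mean_reward, std_reward) for every summary line."""
    rows = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            rows.append(
                (int(match.group(1)), float(match.group(2)), float(match.group(3)))
            )
    return rows


rows = parse_summaries(log_lines)
best_step, best_reward, _ = max(rows, key=lambda row: row[1])
print(best_step, best_reward)
```

Lines that do not match the pattern, such as “Saved Model,” are simply skipped.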

The bytes file is now generated in the models/mball2 directory.

Figure 7. Directory contents after generating the bytes file.

Inside our project folder there is no TFModels directory, so we will have to create one and keep the bytes file there.

Create the TFModels directory
Figure 8. Create the TFModels directory to store the bytes file properly.
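Creating the directory and copying the file can also be scripted. The following sketch uses only the Python standard library; the `copy_bytes_files` helper, the example file name, and the directory layout are hypothetical, and the demonstration runs in a temporary directory so that no real project files are touched.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def copy_bytes_files(model_dir: Path, tf_models_dir: Path) -> List[Path]:
    """Copy every .bytes file from model_dir into tf_models_dir (hypothetical helper)."""
    # Create the target directory if it does not exist yet.
    tf_models_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(model_dir.glob("*.bytes")):
        copied.append(Path(shutil.copy2(source, tf_models_dir / source.name)))
    return copied


# Demonstration with placeholder paths inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "models" / "mball2"
    model_dir.mkdir(parents=True)
    (model_dir / "example.bytes").write_bytes(b"placeholder")  # stand-in for the trained model
    copied = copy_bytes_files(model_dir, Path(tmp) / "TFModels")
    copied_names = [path.name for path in copied]
    print(copied_names)
```

In a real project, the two paths would point at the training output directory and at the TFModels folder inside the Unity project.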

After creating the bytes file, copy it to the \TFModels folder. Once that step is complete, go back to the Unity project and move to the Inspector window. Change the brain type to “internal.” It will show an error.

Brain to internal
Figure 9. After the bytes file is created, set the brain to “internal.”

We can now drag and drop the bytes file (inside the TFModels folder) onto the Graph Model field to resolve the error. The system is now ready for testing to see how well the model has been trained.


Intelligent agents, each acting with dynamic and engaging behavior, offer promise for more realism and better user experiences. After completing the tasks described in part one and part two of this series, you can now create a Unity ML-Agent from scratch, configure the key learning and training files, and understand the key parameters to set up in order to get started with machine learning. Based on what you learned in these articles, you should now be able to incorporate more compelling AI behavior in your own games to boost immersion and attract players.

