# Hands-On AI Part 14: Image Data Preprocessing and Augmentation

Published: 10/13/2017

Last Updated: 10/13/2017

A Tutorial Series for Software Developers, Data Scientists, and Data Center Managers

Preprocessing is the general term for all the transformations applied to the data before it is fed into the model, including centering, normalization, shifting, rotation, shearing, and so on. Generally, there are two situations in which one might want to do preprocessing:

- Cleaning up the data. Let’s assume that you have some artifacts in the images. To make the learning process easier for the model, we can remove the artifacts using preprocessing.
- Augmenting the data. Sometimes a small dataset is not enough for a deep model to learn well. The data augmentation approach helps with this problem. It is the process of transforming each data sample in numerous possible ways and adding all of the augmented samples to the dataset, which increases the effective size of the dataset. The transformations to apply are usually chosen randomly from a predefined set.
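
The augmentation idea can be sketched in plain NumPy. This is a minimal illustration only, not how Keras does it (Keras generates augmented samples on the fly instead of storing them). The transform set `TRANSFORMS`, the helper `augment_dataset`, and its `copies_per_sample` argument are hypothetical names chosen for this example.

```python
import numpy as np

# Hypothetical predefined set of transformations; each maps an
# (height, width, channels) array to a new array of the same shape.
TRANSFORMS = [
    lambda img: img[:, ::-1],        # mirror left-right
    lambda img: img[::-1, :],        # mirror top-bottom
    lambda img: np.rot90(img, k=2),  # rotate by 180 degrees
]


def augment_dataset(images, labels, copies_per_sample=2, seed=0):
    """Return the original samples plus randomly transformed copies."""
    rng = np.random.default_rng(seed)
    new_images, new_labels = list(images), list(labels)
    for img, label in zip(images, labels):
        for _ in range(copies_per_sample):
            # Pick one transformation at random from the predefined set.
            transform = TRANSFORMS[rng.integers(len(TRANSFORMS))]
            new_images.append(transform(img))
            new_labels.append(label)  # augmentation keeps the label
    return np.stack(new_images), np.array(new_labels)


images = np.random.default_rng(1).integers(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)
labels = np.array([0, 1, 0, 1])
aug_x, aug_y = augment_dataset(images, labels)
print(aug_x.shape, aug_y.shape)  # (12, 8, 8, 3) (12,)
```

Four originals with two random copies each give twelve samples, which is the "effective size" increase described above.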

Let’s take a look at some of the possible preprocessing transformations and see how they can be implemented via Keras*. All the materials including corresponding code, notebook, and Dockerfile* are located on Google Drive*.

## Data

In this and the following articles, we will use the image sentiment analysis dataset. It contains about 1,500 example images divided into two classes: positive and negative. Let’s take a look at some examples.

Figure 1. Negative examples.

Figure 2. Positive examples.

## Cleaning Transformations

Now, let’s take a look at the set of possible transformations that are usually applied for cleaning up the data, their implementation, and influence on images.

All the code snippets can be found in the Preprocessing.ipynb notebook.

### Rescaling

The images are usually stored in an RGB (Red Green Blue) format. In this format the image is represented as a three-dimensional (or three-channel) array.

Figure 3: RGB decomposition of the image. The diagram is taken from Wikiwand*.

One dimension is for channels (red, green, and blue colors), and the other two are spatial dimensions. Thus, every pixel is encoded by three numbers, each usually stored as an 8-bit unsigned integer (0 to 255).

**Rescaling** is an operation that moves your data from one numerical range to another by simply multiplying or dividing it by a predefined constant. In deep neural networks you might want to restrict your input to the range from 0 to 1 because of possible overflow, optimization and stability issues, and so on.

For example, let’s cast our data from the [0, 255] range to the [0, 1] range. Here and below we will use the Keras ImageDataGenerator class, which allows us to apply all transformations on the fly.

Let’s create two instances of this class: one for the transformed data and one for the initial (default) data. We just need to specify the scaling constant through the **rescale** parameter. Moreover, the **ImageDataGenerator** class allows us to stream the data directly from a directory on the hard drive using the **flow_from_directory** method.

All the parameters can be found in the documentation, but the main ones are the path of the directory to stream from and the target size of the images (the generator resizes every image to the target size). Finally, let’s get a sample from the generator and see the results.

Visually both images are identical, but that’s just because Python* tools automatically rescale images to a default range to be able to display them. Let’s look at the raw data, which are arrays: as one can see, the raw arrays differ exactly by a factor of 255.
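
The rescaling step itself is a single arithmetic operation. The NumPy sketch below reproduces the comparison of raw arrays. A random batch stands in for the images read from disk. Keras applies `rescale` as a multiplicative factor, so the usual setting is `rescale=1./255`. The sketch divides by 255 instead, which gives the same result up to floating-point rounding.

```python
import numpy as np

# Stand-in batch: 2 RGB images of 4x4 pixels with 8-bit values (0..255),
# stored as floats.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(2, 4, 4, 3)).astype(np.float32)

# Rescaling: divide every value by one constant, mapping [0, 255] onto [0, 1].
# (With Keras this corresponds to ImageDataGenerator(rescale=1./255).)
rescaled = batch / 255.0

print(rescaled.min(), rescaled.max())      # both within [0, 1]
# The raw arrays differ exactly by the scaling factor.
print(np.allclose(batch, rescaled * 255))  # True
```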

### Grayscaling

Another type of transformation that might be useful is grayscaling, which turns a color RGB image into an image with only shades of gray. Conventional image processing often used grayscaling in combination with subsequent thresholding. This pair of transformations can throw away noisy pixels and detect shapes in the picture. Nowadays, such operations are learned by convolutional neural networks (CNNs), but grayscaling as a preprocessing step might still be useful. Let’s run that step in Keras with the same generator class.

Here, we create only one instance of the class but take two different generators from it. The second one sets the **color_mode** option to “**grayscale**” (the default value is “**rgb**”).
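
Keras loads images through PIL, and as far as we know the grayscale mode relies on PIL’s `convert("L")`. The PIL documentation describes that conversion as the ITU-R 601-2 luma transform. The NumPy sketch below applies those weights directly. Treat it as an approximation of what the generator returns, since PIL also rounds the result to 8-bit integers. The helper name `to_grayscale` is our own.

```python
import numpy as np

# ITU-R 601-2 luma weights (the ones PIL documents for convert("L")).
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB array to an (H, W, 1) grayscale array."""
    gray = rgb[..., :3] @ LUMA_WEIGHTS  # weighted sum over the channel axis
    return gray[..., np.newaxis]        # keep a single channel axis


img = np.zeros((2, 2, 3))
img[0, 0] = [255, 255, 255]  # white pixel
img[0, 1] = [255, 0, 0]      # pure red pixel
gray = to_grayscale(img)
print(gray.shape)  # (2, 2, 1)
print(gray[0, 0, 0], gray[0, 1, 0])  # white stays near 255, red becomes about 76
```

Keeping a trailing channel axis of size 1 means a grayscale image still has the (height, width, channels) layout that a CNN input layer expects.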

### Samplewise Centering

We’ve already seen that raw data values range from 0 to 255, so one sample is a 3D array of numbers from 0 to 255. For the sake of optimization stability (avoiding vanishing or saturating values), we might want to normalize the dataset so that the mean value of each data sample equals 0.

For that purpose, we need to calculate the mean value across one whole sample and subtract it from each number in it.

In Keras it can be done through the **samplewise_center** option. The results are shown below.
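
Samplewise centering comes down to one subtraction per sample. The NumPy sketch below follows the description above and averages over all pixels and channels of the sample. It is not copied from Keras. The helper name `samplewise_center` mirrors the option name but is our own.

```python
import numpy as np


def samplewise_center(x):
    """Subtract the mean of one whole sample from every value in it."""
    x = x.astype(np.float32)
    return x - x.mean()


rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=(4, 4, 3))
centered = samplewise_center(sample)
print(abs(centered.mean()) < 1e-3)  # True
```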

### Samplewise std normalization

This preprocessing step follows the same idea as samplewise centering, but instead of setting the mean value to 0, it sets the standard deviation value to 1.

Std normalization is controlled by the option **samplewise_std_normalization**. It’s worth mentioning that these two samplewise normalization options are often used simultaneously.

This transformation might be applied in deep learning models to improve the optimization stability by reducing the influence of the exploding gradients problem.
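
Because the two samplewise options are often combined, the NumPy sketch below centers a sample and then scales it to unit standard deviation. The small `eps` is our own guard against division by zero for constant images. It is an assumption, not a documented Keras value, and `samplewise_normalize` is a hypothetical helper name.

```python
import numpy as np


def samplewise_normalize(x, eps=1e-7):
    """Center one sample to mean 0 and scale it to standard deviation 1."""
    x = x.astype(np.float32)
    x = x - x.mean()            # samplewise centering
    return x / (x.std() + eps)  # samplewise std normalization; eps avoids /0


rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=(4, 4, 3))
normalized = samplewise_normalize(sample)
print(normalized.mean(), normalized.std())  # mean close to 0, std close to 1
```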

### Featurewise centering

In the previous two sections we used a normalization technique that looks at one sample at a time. There is an alternative approach to the normalization procedure. Let’s treat each number in the image array as a feature. Then each image is represented by a vector of features. The dataset contains plenty of such vectors, so we can treat them as samples from some unknown distribution. This distribution is multivariate, and the dimension of the space equals the number of features, which is width * height * 3. Although we don’t know the real distribution of the data, we can try to normalize it by subtracting the mean of the distribution. Note that here the mean is a vector of the same dimension as the space; that is, it is an image itself. In other words, we average across the dataset and not across one sample.

There is a dedicated Keras option called **featurewise_center**, but unfortunately, as of August 2017, it had a bug in its implementation, so let’s implement it ourselves. First, read the whole dataset into memory (this is affordable because the dataset is small); we did this by setting the batch size to the size of the dataset. Then calculate the mean image across the dataset and, finally, subtract it from the test image.
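
The manual workaround takes only a few NumPy lines. In this sketch a random array stands in for the dataset that the notebook loads in a single batch, and the variable names are our own. Averaging over `axis=0` matches the per-feature definition given above. If you use the built-in Keras option instead, it has to be fitted on sample data first with `ImageDataGenerator.fit()`.

```python
import numpy as np

# Stand-in for the whole dataset read into memory as one batch:
# (num_images, height, width, channels).
rng = np.random.default_rng(0)
dataset = rng.integers(0, 256, size=(10, 8, 8, 3)).astype(np.float32)

# The "mean image": average over the dataset axis only, so every
# feature (pixel position and channel) gets its own mean.
mean_image = dataset.mean(axis=0)

# Subtract the mean image from a test image (here, the first sample).
test_image = dataset[0]
centered = test_image - mean_image
print(mean_image.shape, centered.shape)  # (8, 8, 3) (8, 8, 3)
```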

### Featurewise std normalization

The idea behind featurewise standard deviation normalization is exactly the same as behind centering. The only difference is that we divide by the sample standard deviation instead of subtracting the mean value. Again, the result does not differ much visually. The same thing happened with rescaling, because featurewise std normalization is nothing more than rescaling in which the normalization constant is calculated adaptively, whereas in rescaling one has to specify it by hand. Note that the same idea of normalizing across batches of data is the origin of the state-of-the-art deep learning technique called batch normalization (the **BatchNormalization** layer in Keras).
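
The NumPy sketch below makes the "adaptive rescaling constant" concrete. It computes one standard deviation per feature across a stand-in dataset and divides by it. The `eps` term is our own guard against features that never change, not a Keras value.

```python
import numpy as np

rng = np.random.default_rng(0)
dataset = rng.integers(0, 256, size=(10, 8, 8, 3)).astype(np.float32)

# Per-feature standard deviation across the dataset (one value per
# pixel position and channel): the adaptively computed scaling constant.
std_image = dataset.std(axis=0)

eps = 1e-7  # guard against division by zero for constant features
normalized = dataset / (std_image + eps)
print(normalized.std(axis=0).mean())  # close to 1
```

Rescaling, by contrast, divides by one fixed constant, such as 255, chosen by hand.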

## Augmentation Transformations

In this section, we’re going to discuss more data-dependent transformations, which explicitly use the graphical nature of data. These kinds of transformations are often used for data augmentation procedures.

### Rotation

This transformation rotates the image in a certain direction (clockwise or counterclockwise).

The parameter that controls rotations is called **rotation_range**. It specifies the range of rotations in degrees: the actual angle is chosen uniformly at random from [-rotation_range, rotation_range]. Note that the size of the image remains the same during the rotation. Thus, some of the image regions will be cropped out, and some of the regions of the new image will need to be filled.

The filling mode is set by the **fill_mode** parameter. It supports several modes (“constant”, “nearest”, “reflect”, and “wrap”), but here we use “constant” just for the sake of the example.
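
For intuition, here is a NumPy sketch of a random rotation with a constant fill. It rotates about the image center using nearest-neighbor inverse mapping. As far as we know, Keras instead builds an affine transform matrix and resamples with SciPy, so exact pixel values will differ. The functions `rotate` and `random_rotation` are hypothetical. The `cval` argument mirrors the Keras option of the same name, which sets the constant fill value.

```python
import numpy as np


def rotate(img, angle_deg, cval=0.0):
    """Rotate an (H, W, C) image about its center, keeping its size.

    Nearest-neighbor inverse mapping: every output pixel looks up the
    source pixel it came from. Source positions outside the image are
    filled with the constant `cval` (a "constant" fill mode). With the
    y axis pointing down, positive angles turn the content clockwise.
    """
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.mgrid[0:h, 0:w]
    theta = np.deg2rad(angle_deg)
    cos, sin = np.cos(theta), np.sin(theta)
    src_x = np.rint(cos * (xs - cx) + sin * (ys - cy) + cx).astype(int)
    src_y = np.rint(-sin * (xs - cx) + cos * (ys - cy) + cy).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full_like(img, cval)
    out[inside] = img[src_y[inside], src_x[inside]]
    return out


def random_rotation(img, rotation_range, cval=0.0, rng=None):
    """Rotate by an angle drawn uniformly from [-rotation_range, rotation_range]."""
    rng = rng or np.random.default_rng()
    return rotate(img, rng.uniform(-rotation_range, rotation_range), cval)


image = np.ones((5, 5, 1))
rotated = rotate(image, 45)
print(rotated[0, 0, 0], rotated[2, 2, 0])  # 0.0 1.0 (corner filled, center kept)
```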

### Horizontal shift

This transformation shifts the image in a certain direction along the horizontal axis (left or right).

The size of the shift can be determined using the **width_shift_range** parameter and is measured as a fraction of the total width.

### Vertical shift

It shifts the image along the vertical axis (up or down). The range of the shift is controlled by the **height_shift_range** parameter and is also measured as a fraction of the total height.
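
Both shift options fit in one small NumPy sketch. We read the fractional ranges as follows: the shift is drawn uniformly from [-range, range] and multiplied by the image width or height. The helpers `shift` and `random_shift` are hypothetical. Vacated pixels are filled with a constant here, whereas the Keras default `fill_mode` is “nearest”.

```python
import numpy as np


def shift(img, dy, dx, cval=0.0):
    """Shift an (H, W, C) image by dy rows (down if > 0) and dx columns
    (right if > 0); vacated pixels are filled with the constant cval."""
    h, w = img.shape[:2]
    dy, dx = int(np.clip(dy, -h, h)), int(np.clip(dx, -w, w))
    out = np.full_like(img, cval)
    # Copy the part of the image that is still visible after the shift.
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out


def random_shift(img, width_shift_range=0.0, height_shift_range=0.0,
                 cval=0.0, rng=None):
    """Shift by random fractions of the width and height, drawn uniformly
    from [-width_shift_range, width_shift_range] and the height analogue."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    dx = round(rng.uniform(-width_shift_range, width_shift_range) * w)
    dy = round(rng.uniform(-height_shift_range, height_shift_range) * h)
    return shift(img, dy, dx, cval)


image = np.arange(16).reshape(4, 4, 1)
print(shift(image, 0, 1)[..., 0])   # horizontal: one column to the right
print(shift(image, -1, 0)[..., 0])  # vertical: one row up
```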

### Shearing

Shear mapping or shearing displaces each point in the vertical direction by an amount proportional to its distance from an edge of the image. Note that in general the direction does not have to be vertical and can be arbitrary.

The parameter that controls the displacement rate is called **shear_range** and corresponds to the deviation angle (in radians) between a horizontal line in the original picture and the image (in the mathematical sense) of this line in the transformed image.
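
A single angle is enough to describe the shear. To show this, the sketch below applies the simple vertical shear from the previous paragraph to points on a horizontal line and measures the angle of the result. This is the textbook shear matrix, chosen for clarity. Keras composes its own, differently parameterized, shear matrix with the other transforms, so treat the sign and direction here as assumptions.

```python
import numpy as np


def shear_matrix(phi):
    """2x2 vertical shear: (x, y) -> (x, y + tan(phi) * x)."""
    return np.array([[1.0, 0.0],
                     [np.tan(phi), 1.0]])


phi = 0.3  # shear angle in radians
# Points on the horizontal line y = 2, stored as (x, y) column vectors.
xs = np.linspace(0.0, 10.0, 5)
line = np.stack([xs, np.full_like(xs, 2.0)])
sheared = shear_matrix(phi) @ line

# Slope of the transformed line gives its angle relative to the horizontal.
slope = (sheared[1, -1] - sheared[1, 0]) / (sheared[0, -1] - sheared[0, 0])
print(np.arctan(slope))  # recovers phi (up to rounding)
```

Every point keeps its x coordinate and moves vertically in proportion to its distance from the left edge (x = 0), as the definition above states.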

### Zoom

This transformation zooms the initial image in or out. The **zoom_range** parameter controls the zooming factor.

For example, setting **zoom_range** to 0.5 means that the zooming factor will be chosen from the range [0.5, 1.5].
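
The NumPy sketch below shows how a float `zoom_range` turns into an interval and how a factor from that interval can be applied. It scales the content about the image center with nearest-neighbor sampling and a constant fill. For simplicity it uses one factor for both axes. In this sketch a factor above 1 enlarges the content. Keras’s own convention may differ, so treat the direction as an assumption. Both helper names are hypothetical.

```python
import numpy as np


def zoom_range_to_interval(zoom_range):
    """A float zoom_range z is read as the interval [1 - z, 1 + z]."""
    return 1.0 - zoom_range, 1.0 + zoom_range


def zoom(img, factor, cval=0.0):
    """Scale the content of an (H, W, C) image about its center.

    factor > 1 enlarges the content (zoom in); factor < 1 shrinks it and
    fills the uncovered border with cval. Nearest-neighbor sampling.
    """
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: output pixel -> source pixel.
    src_y = np.rint(cy + (ys - cy) / factor).astype(int)
    src_x = np.rint(cx + (xs - cx) / factor).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full_like(img, cval)
    out[inside] = img[src_y[inside], src_x[inside]]
    return out


low, high = zoom_range_to_interval(0.5)
print(low, high)  # 0.5 1.5
factor = np.random.default_rng(0).uniform(low, high)
zoomed = zoom(np.ones((6, 6, 1)), factor)
```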

### Horizontal flip

It flips the image with respect to the vertical axis. One can turn it on or off using the **horizontal_flip** Boolean parameter; when it is on, each image is flipped at random with probability 0.5.

### Vertical flip

It flips the image with respect to the horizontal axis. The **vertical_flip** Boolean parameter turns this transformation on or off.
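
Both flips are simple index reversals. In the sketch below, each enabled flip is applied with probability 0.5, which matches our understanding of how the Keras generator treats these Boolean options. The `random_flip` helper is our own.

```python
import numpy as np


def random_flip(img, horizontal_flip=False, vertical_flip=False, rng=None):
    """Randomly mirror an (H, W, C) image; each enabled flip has probability 0.5."""
    rng = rng or np.random.default_rng()
    if horizontal_flip and rng.random() < 0.5:
        img = img[:, ::-1]  # mirror about the vertical axis (left <-> right)
    if vertical_flip and rng.random() < 0.5:
        img = img[::-1, :]  # mirror about the horizontal axis (top <-> bottom)
    return img


img = np.arange(4).reshape(2, 2, 1)
print(img[:, ::-1, 0].tolist())  # [[1, 0], [3, 2]]
print(img[::-1, :, 0].tolist())  # [[2, 3], [0, 1]]
```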

## Combination

Let’s try to apply all the described augmentation transformations simultaneously and see what happens. Recall that the parameters of each transformation are chosen randomly from the specified range; thus, we should get a fairly diverse set of samples.

Let’s initialize our **ImageDataGenerator** with all the available options turned on and test it on an image of a red fire hydrant. Note that previously we used the constant filling mode just for better visualization. Now we’re going to use a more elaborate filling mode called nearest: each pixel that would otherwise be blank gets the color of the closest existing pixel.
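
The combined pipeline can be approximated in NumPy as follows. All geometric transforms are merged into one inverse coordinate mapping. Out-of-range coordinates are then clamped to the image border, which is what “nearest” filling amounts to. This is our own sketch, not the Keras implementation. The composition order, the shear parameterization, the zoom direction, and the separate width and height zoom factors are all assumptions. The parameter names only mirror the **ImageDataGenerator** options. The argument values in the usage lines are illustrative, not the ones used in the notebook.

```python
import numpy as np


def random_augment(img, rotation_range=0.0, width_shift_range=0.0,
                   height_shift_range=0.0, shear_range=0.0, zoom_range=0.0,
                   horizontal_flip=False, vertical_flip=False, rng=None):
    """Apply all augmentations to an (H, W, C) image with "nearest" filling.

    Every random parameter is drawn independently on each call. The
    geometric transforms are merged into one inverse coordinate mapping.
    """
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    theta = np.deg2rad(rng.uniform(-rotation_range, rotation_range))
    ty = rng.uniform(-height_shift_range, height_shift_range) * h
    tx = rng.uniform(-width_shift_range, width_shift_range) * w
    shear = rng.uniform(-shear_range, shear_range)  # radians in this sketch
    zy, zx = rng.uniform(1 - zoom_range, 1 + zoom_range, size=2)

    # Inverse maps acting on (x, y) offsets from the center (output -> source):
    # undo the rotation, then the vertical shear, then the zoom.
    unrotate = np.array([[np.cos(theta), np.sin(theta)],
                         [-np.sin(theta), np.cos(theta)]])
    unshear = np.array([[1.0, 0.0], [-np.tan(shear), 1.0]])
    unzoom = np.diag([1.0 / zx, 1.0 / zy])
    inverse = unzoom @ unshear @ unrotate

    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.mgrid[0:h, 0:w]
    # Undo the shift first, then apply the combined inverse matrix.
    offsets = np.stack([xs - cx - tx, ys - cy - ty]).reshape(2, -1)
    src_x, src_y = inverse @ offsets
    # fill_mode="nearest": clamp every source position to the closest
    # existing pixel instead of leaving it blank.
    src_x = np.clip(np.rint(src_x + cx), 0, w - 1).astype(int).reshape(h, w)
    src_y = np.clip(np.rint(src_y + cy), 0, h - 1).astype(int).reshape(h, w)
    out = img[src_y, src_x]

    # Flips, each with probability 0.5 when enabled.
    if horizontal_flip and rng.random() < 0.5:
        out = out[:, ::-1]
    if vertical_flip and rng.random() < 0.5:
        out = out[::-1, :]
    return out


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # stand-in image
samples = [random_augment(image, rotation_range=30, width_shift_range=0.1,
                          height_shift_range=0.1, shear_range=0.2,
                          zoom_range=0.2, horizontal_flip=True,
                          vertical_flip=True, rng=rng)
           for _ in range(4)]
```

Because “nearest” filling only copies existing pixels, every augmented sample contains no colors that were absent from the original image.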

## Conclusion

In this article, we gave an overview of common image preprocessing techniques such as scaling, normalization, rotation, shifting, and shearing. We also demonstrated how these transformations can be implemented with Keras and plugged into the deep learning pipeline both technically (the **ImageDataGenerator** class) and conceptually (data augmentation).

In the next article, we’re going to apply these techniques to build a baseline CNN model for image sentiment analysis.

