A Tutorial Series for Software Developers, Data Scientists, and Data Center Managers
Preprocessing is the general term for all the transformations applied to the data before they are fed into the model, including centering, normalization, shifting, rotation, shearing, and so on. Generally, there are two occasions when one might want to do preprocessing:
- Cleaning up the data. Let’s assume that you have some artifacts in the images. To make the learning process easier for the model, we can remove the artifacts using preprocessing.
- Augmenting the data. Sometimes small datasets are not enough for a deep model to learn sufficiently well. The data augmentation approach is useful in solving this problem: each data sample is transformed in numerous possible ways, and all of the augmented samples are added to the dataset. By doing this one can increase the effective size of the dataset. The transformations to apply are usually chosen randomly from a predefined set.
Let’s take a look at some of the possible preprocessing transformations and see how they can be implemented via Keras*. All the materials including corresponding code, notebook, and Dockerfile* are located on Google Drive*.
In this and the following articles we will use the image sentiment analysis dataset. It contains about 1,500 images divided into two classes: positive and negative. Let’s take a look at some examples.
Figure 1. Negative examples.
Figure 2. Positive examples.
Now, let’s take a look at the set of possible transformations that are usually applied for cleaning up the data, their implementation, and influence on images.
All the code snippets can be found in the Preprocessing.ipynb notebook.
The images are usually stored in an RGB (Red Green Blue) format. In this format the image is represented as a three-dimensional (or three-channel) array.
Figure 3: RGB decomposition of the image. The diagram is taken from Wikiwand*.
One dimension is for the channels (red, green, and blue), and the other two are spatial dimensions. Thus, every pixel is encoded by three numbers, each usually stored as an 8-bit unsigned integer (0 to 255).
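This layout is easy to see in NumPy. The following sketch builds a hypothetical 2×2 RGB image (the values are made up for illustration) and inspects its shape and type:

```python
import numpy as np

# A hypothetical 2x2 RGB image: shape is (height, width, channels),
# and each pixel holds three 8-bit unsigned integers in [0, 255].
img = np.array(
    [[[255, 0, 0], [0, 255, 0]],       # red pixel,  green pixel
     [[0, 0, 255], [255, 255, 255]]],  # blue pixel, white pixel
    dtype=np.uint8,
)

print(img.shape)  # (2, 2, 3)
print(img.dtype)  # uint8
print(img[1, 1])  # [255 255 255] -- the white pixel
```

Note that the channel axis comes last here (the "channels_last" convention), which is the Keras default.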
Rescaling is an operation that moves your data from one numerical range to another by simple division using a predefined constant. In deep neural networks you might want to restrict your input to the range from 0 to 1, due to possible overflow, optimization, and numerical stability issues, among others.
For example, let’s cast our data from the [0, 255] range to the [0, 1] range. Here and below we will use the Keras ImageDataGenerator class, which allows us to do all transformations on the fly.
Let’s create two instances of this class: one for transformed data and one for the initial (or default). We just need to specify the scaling constant. Moreover, the ImageDataGenerator class allows us to stream the data directly from the hard drive directory using the flow_from_directory method.
All the parameters can be found in the documentation, but the main parameters are the path to stream from and the target size of the image (the generator would just crop or pad the image if it doesn’t fit the target size). Finally, let’s get a sample from the generator and see the results.
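In Keras the rescaling itself is just the `rescale=1./255` argument to `ImageDataGenerator`; under the hood the operation is nothing more than a multiplication by a constant, as this NumPy sketch shows (the helper name `rescale` is our own):

```python
import numpy as np

def rescale(batch, factor=1.0 / 255):
    """Equivalent of ImageDataGenerator(rescale=1./255):
    every pixel value is multiplied by the same constant."""
    return batch.astype(np.float32) * factor

raw = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = rescale(raw)
print(scaled)  # values now lie in [0, 1]
```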
Visually both images are identical, but that’s just because Python* tools rescale images automatically to the default range to be able to display them. Let’s take a look at the raw data, which are arrays. As one can see, raw arrays differ exactly by a factor of 255.
Another type of transformation that might be useful is grayscaling, which turns a color RGB image into an image with only shades of gray. Conventional image processing often used grayscaling in combination with subsequent thresholding; this pair of transformations can throw away noisy pixels and detect shapes in the picture. Nowadays, all these operations are learned by convolutional neural networks (CNNs), but grayscaling as a preprocessing step might still be useful. Let’s run that step in Keras with the same generator class.
Here, we create only one instance of the class but two different generators are taken from it. The second one sets the color_mode option to “grayscale” (while the default value is “RGB”).
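A common way to compute the gray value is a luminance-weighted average of the three channels (the ITU-R BT.601 weights below are the ones used by Pillow's "L" mode, which Keras relies on for grayscale loading); here is a NumPy sketch:

```python
import numpy as np

def to_grayscale(img):
    """Luminance-weighted grayscale conversion (ITU-R BT.601 weights);
    collapses the channel axis, leaving a single shade-of-gray value."""
    weights = np.array([0.299, 0.587, 0.114])
    return img @ weights  # weighted sum over the last (channel) axis

# One row of three pixels: pure red, pure green, pure blue.
rgb = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.float64)
gray = to_grayscale(rgb)
print(gray)  # approximately [[ 76.245 149.685  29.07 ]]
```

Green contributes the most because human vision is most sensitive to it.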
We’ve already seen that raw data values range from 0 to 255, so one sample is a 3D array of numbers from 0 to 255. For optimization stability (to avoid vanishing or saturating values) we might want to normalize the dataset so that the mean value of each data sample equals 0.
For that purpose, we need to calculate the mean value across one whole sample and subtract it from each number in it.
In Keras it can be done through the samplewise_center option. The results are shown below.
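What `samplewise_center=True` does amounts to a one-line NumPy operation per sample, sketched here with a tiny made-up sample:

```python
import numpy as np

def samplewise_center(sample):
    """Subtract the sample's own mean, so the resulting mean
    over all pixels and channels is exactly 0."""
    return sample - sample.mean()

sample = np.array([[10., 20.],
                   [30., 40.]])
centered = samplewise_center(sample)
print(centered)         # [[-15. -5.] [ 5. 15.]]
print(centered.mean())  # 0.0
```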
Samplewise std normalization
This preprocessing step follows the same idea as samplewise centering, but instead of setting the mean value to 0, it sets the standard deviation value to 1.
Std normalization is controlled by the option samplewise_std_normalization. It’s worth mentioning that these two samplewise normalization options are often used simultaneously.
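Combined, the two samplewise options are equivalent to the following NumPy sketch (the random sample stands in for a real image):

```python
import numpy as np

def samplewise_normalize(sample):
    """samplewise_center + samplewise_std_normalization combined:
    zero mean and unit standard deviation per sample."""
    centered = sample - sample.mean()
    return centered / centered.std()

sample = np.random.RandomState(0).rand(32, 32, 3)  # stand-in for an image
normalized = samplewise_normalize(sample)
print(normalized.mean(), normalized.std())  # approximately 0.0 and 1.0
```

In practice a small epsilon is usually added to the denominator (Keras does this internally) to guard against all-constant samples with zero standard deviation.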
This transformation might be applied in deep learning models to improve the optimization stability by reducing the influence of the exploding gradients problem.
In the previous two sections we used normalization techniques that look at one particular sample at a time. There is an alternative approach to the normalization procedure. Let’s treat each number in the image array as a feature; then each image is represented by a vector of features. There are plenty of such vectors in the dataset; therefore, we can treat them as coming from some unknown distribution. This distribution is multivariate, and the dimension of the space equals the number of features, which is width * height * 3. Although we don’t know the real distribution of the data, we can try to normalize it by subtracting the mean value of the distribution. Note that here the mean value is a vector of the same dimension as the space; that is, it is an image itself. In other words, we average across the dataset, not across one sample.
There is a special Keras option called featurewise_center, but unfortunately, as of August 2017, its implementation had a bug; thus, let’s implement it ourselves. First, read the whole dataset into memory (this is affordable because the dataset is small); we do it by setting the batch size to the size of the dataset. Now, let’s calculate the mean image across the dataset and, finally, subtract it from the test image.
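With the whole dataset in memory as a single array, the manual featurewise centering is a two-line NumPy computation (a random array stands in for the real dataset here):

```python
import numpy as np

# Stand-in for the in-memory dataset: 100 images of 16x16 RGB pixels.
rng = np.random.RandomState(42)
dataset = rng.rand(100, 16, 16, 3)

# The mean is taken across the dataset axis (axis 0), not within a sample,
# so the result is itself an image of the same shape as each sample.
mean_image = dataset.mean(axis=0)
centered = dataset - mean_image  # broadcast over all 100 samples

print(mean_image.shape)                       # (16, 16, 3)
print(np.allclose(centered.mean(axis=0), 0))  # True
```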
Featurewise std normalization
The idea behind featurewise standard deviation normalization is exactly the same as behind centering; the only difference is that we divide by the per-feature standard deviation instead of subtracting the mean value. Again, the result does not differ much visually. The same was true for rescaling, because featurewise std normalization is nothing more than rescaling with an adaptively calculated normalization constant, whereas in rescaling one specifies it by hand. Note that the same idea of normalization across batches of data underlies the state-of-the-art deep learning technique called batch normalization.
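The manual version mirrors the centering code, with a division instead of a subtraction (again a random array stands in for the dataset):

```python
import numpy as np

rng = np.random.RandomState(7)
dataset = rng.rand(200, 8, 8, 3) * 255  # stand-in dataset in [0, 255]

# Per-feature standard deviation across the dataset; the small epsilon
# guards against division by zero for features that are constant.
std_image = dataset.std(axis=0)
normalized = dataset / (std_image + 1e-7)

print(np.allclose(normalized.std(axis=0), 1, atol=1e-4))  # True
```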
In this section, we’re going to discuss more data-dependent transformations, which explicitly use the graphical nature of data. These kinds of transformations are often used for data augmentation procedures.
This transformation rotates the image in a certain direction (clockwise or counterclockwise).
The parameter that allows the rotations is called rotation_range. It specifies the range of rotations in degrees from which the random angle will be chosen uniformly to do a rotation. Note that during the rotation the size of the image remains the same. Thus, some of the image regions will be cropped out and some of the regions of the new image will need to be filled.
The filling mode can be set up by the fill_mode parameter. It supports a variety of different ways for filling, but here we use constant just for the sake of the example.
This transformation shifts the image in a certain direction along the horizontal axis (left or right).
The size of the shift can be determined using the width_shift_range parameter and is measured as a fraction of the total width.
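Conceptually the shift is just a translation of the pixel grid with the vacated region filled in; this NumPy sketch (the helper name is our own, and it uses constant filling like fill_mode="constant") shows the idea:

```python
import numpy as np

def shift_horizontal(img, fraction, fill=0):
    """Shift an image along the horizontal axis by a fraction of its
    width; the vacated columns are filled with a constant value."""
    h, w = img.shape[:2]
    shift = int(round(fraction * w))
    out = np.full_like(img, fill)
    if shift >= 0:  # shift right
        out[:, shift:] = img[:, :w - shift]
    else:           # shift left
        out[:, :w + shift] = img[:, -shift:]
    return out

img = np.arange(16).reshape(4, 4)
print(shift_horizontal(img, 0.25))  # each row moved one column right
```

Keras samples the fraction uniformly from [-width_shift_range, width_shift_range] for every generated image; height_shift_range works the same way along the vertical axis.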
It shifts the image along the vertical axis (up or down). The parameter that controls the range of the shift is called height_shift_range, and is also measured as a fraction of the total height.
Shear mapping or shearing displaces each point in the vertical direction by an amount proportional to its distance from an edge of the image. Note that in general the direction does not have to be vertical and can be arbitrary.
The parameter that controls the displacement rate is called shear_range and corresponds to the deviation angle (in radians) between a horizontal line in the original picture and the image (in the mathematical sense) of this line in the transformed image.
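A rough NumPy sketch of one possible shear convention (each row is displaced horizontally in proportion to its row index; the helper name and the exact axis convention are our own simplification, not Keras internals):

```python
import numpy as np

def shear(img, angle_rad, fill=0):
    """Displace row y horizontally by tan(angle) * y pixels,
    filling vacated pixels with a constant value."""
    h, w = img.shape[:2]
    out = np.full_like(img, fill)
    for y in range(h):
        s = int(round(np.tan(angle_rad) * y))
        if 0 <= s < w:
            out[y, s:] = img[y, :w - s]
        elif -w < s < 0:
            out[y, :w + s] = img[y, -s:]
    return out

img = np.ones((4, 4), dtype=int)
print(shear(img, 0.0))  # a zero shear angle is the identity
```

With a 45-degree angle (pi/4 radians), row 1 moves one pixel, row 2 moves two, and so on, turning a rectangle into a parallelogram.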
This transformation zooms the initial image in or out. The zoom_range parameter controls the zooming factor.
For example, a zoom_range equal to 0.5 means that the zooming factor will be chosen from the range [0.5, 1.5].
It flips the image with respect to the vertical axis. One can either turn it on or off using the horizontal_flip parameter.
It flips the image with regard to the horizontal axis. The vertical_flip Boolean parameter controls the presence of this transformation.
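Both flips are simple axis reversals of the pixel array, which NumPy slicing makes explicit:

```python
import numpy as np

img = np.array([[1, 2],
                [3, 4]])

# horizontal_flip mirrors the image about its vertical axis (left-right);
# vertical_flip mirrors it about the horizontal axis (up-down).
h_flipped = img[:, ::-1]
v_flipped = img[::-1, :]

print(h_flipped.tolist())  # [[2, 1], [4, 3]]
print(v_flipped.tolist())  # [[3, 4], [1, 2]]
```

In Keras each flip is applied randomly (with probability 0.5) to every generated image when the corresponding Boolean parameter is set to True.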
Let’s try to apply all the described augmentation transformations simultaneously and see what happens. Recall that the parameters of each of the transformations are chosen randomly from the specified range; thus, we should have a considerably diverse set of samples.
Let’s initialize our ImageDataGenerator with all the available options turned on and test it on an image of a red hydrant. Note that previously we used constant filling mode just for better visualization. Now, we’re going to use a more elaborate filling mode which is called nearest; this mode assigns the color of the closest existing pixel to the pixel that should be blank.
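The combined setup might look like the following sketch. The parameter values here are illustrative rather than the ones used in the notebook, and instead of streaming from disk with flow_from_directory we feed a random in-memory image through flow just to show the mechanics:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative parameter values; for every generated image each
# transformation's parameter is sampled randomly from its range.
augmenter = ImageDataGenerator(
    rotation_range=30,        # degrees
    width_shift_range=0.2,    # fraction of total width
    height_shift_range=0.2,   # fraction of total height
    shear_range=0.3,          # shear angle
    zoom_range=0.5,           # zoom factor drawn from [0.5, 1.5]
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode="nearest",      # fill blanks with the closest existing pixel
)

# A random stand-in image; batch axis first: (batch, height, width, channels).
image = np.random.rand(1, 128, 128, 3)
batch = next(augmenter.flow(image, batch_size=1))
print(batch.shape)  # (1, 128, 128, 3)
```

Drawing several batches from the same generator yields a different random augmentation of the image each time.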
In this article, we gave an overview of common image preprocessing techniques such as scaling, normalization, rotation, shifting, and shearing. We also demonstrated how these transformations can be implemented with Keras and plugged into the deep learning pipeline, both technically (the ImageDataGenerator class) and conceptually (data augmentation).
In the next article, we’re going to apply these techniques to build a baseline CNN model for image sentiment analysis.