An Example of a Convolutional Neural Network for Image Super-Resolution—Tutorial

Published: 06/30/2017  

Last Updated: 06/30/2017

By Alberto Villarreal Cueva

This tutorial describes one way to implement a convolutional neural network (CNN) for single-image super-resolution on Intel® architecture, using the Caffe* deep learning framework and Intel® Distribution for Python*. Together, these let us take advantage of Intel processors and Intel libraries to accelerate training and testing of this CNN.

The CNN we use in this tutorial is the Fast Super-Resolution Convolutional Neural Network (FSRCNN), based on the work described in [1] and [2], whose authors proposed a new approach to single-image super-resolution (SR) using CNNs. We describe this network and its predecessor, the Super-Resolution Convolutional Neural Network (SRCNN), in more detail in an associated article (“An Example of a Convolutional Neural Network for Image Super-Resolution”).

FSRCNN Structure

As described in the associated article and in [2], the FSRCNN consists of the following operations:

  1. Feature extraction: Extracts a set of feature maps directly from the low-resolution (LR) image.
  2. Shrinking: Reduces the dimension of the feature vectors (thus decreasing the number of parameters) by using fewer filters than the feature extraction step.
  3. Non-linear mapping: Maps feature maps representing LR patches to high-resolution (HR) ones. This step is performed using several mapping layers with a filter size smaller than the one used in SRCNN.
  4. Expanding: Increases the dimension of the feature vectors. This operation is the inverse of the shrinking operation and helps to produce the HR image more accurately.
  5. Deconvolution: Produces the HR image from HR features.

The structure of the FSRCNN (56, 12, 4) model (the best-performing model reported in [2], and described in the associated article) is shown in Figure 1. It has an LR feature dimension of 56 (the number of filters in both the first convolution layer and the deconvolution layer), 12 shrinking filters (the number of filters in the layers in the middle of the network, which perform the mapping operation), and a mapping depth of 4 (the number of convolutional layers that implement the mapping between the LR and the HR feature space).

Figure 1. Structure of the FSRCNN (56, 12, 4)
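To make the three numbers in FSRCNN (56, 12, 4) more concrete, the following minimal sketch counts the convolution and deconvolution weights of the model. The kernel sizes (5, 1, 3, 1 and 9) are not given in this tutorial; they are assumed here from the layer description in [2]. The function name is hypothetical, and biases and activation parameters are left out.

```python
def fsrcnn_weight_count(d, s, m, channels=1):
    """Count convolution/deconvolution weights (biases and activations excluded)."""
    # Each entry is (kernel size, input channels, output channels).
    # Kernel sizes 5, 1, 3, 1, 9 are an assumption taken from the FSRCNN paper [2].
    layers = [(5, channels, d)]      # feature extraction
    layers += [(1, d, s)]            # shrinking
    layers += [(3, s, s)] * m        # non-linear mapping (m layers)
    layers += [(1, s, d)]            # expanding
    layers += [(9, d, channels)]     # deconvolution
    return sum(k * k * c_in * c_out for k, c_in, c_out in layers)

total = fsrcnn_weight_count(d=56, s=12, m=4)
print(total)  # 12464
```

Under these assumptions most of the weights sit in the feature extraction and deconvolution layers, while the shrinking step keeps the four mapping layers small.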

Training and Testing Data Preparation

Datasets to train and test this implementation are available from the website of the authors of [2]. The train dataset consists of 91 images of different sizes. There are two test datasets: Set5 (containing 5 images) and Set14 (containing 14 images). In this tutorial, both train and test datasets will be packed into HDF5* files, which can be read efficiently from the Caffe framework. For more information about Caffe optimized for Intel® architecture, visit Manage Deep Learning Networks with Caffe* Optimized for Intel® Architecture.

Both train and test datasets need some preprocessing, as follows:

  • Train dataset: First, the images are converted to the YCrCb color space, and only the luminance channel Y is used in this tutorial. Each of the 91 images in the train dataset is downsampled by a factor k, where k is the scaling factor desired for super-resolution, which yields a pair of corresponding LR and HR images. Next, each image pair (LR/HR) is cropped into a set of small subimages using stride s, so we end up with N pairs of LR/HR subimages for each of the 91 original train images. The images are cropped because we want to train the model on LR and HR local features located in a small area. The number of subimages, N, depends on the size of the subimages and the stride s. For their experiments with a scaling factor k=3, the authors of [2] use 7x7-pixel LR subimages and 19x19-pixel HR subimages, which are the sizes used in the code snippet below.
  • Test dataset: Each image in the test dataset is processed in the same way as the images in the train dataset, except that the stride s can be larger than the one used for training, which accelerates the testing procedure.
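The dependence of N on the subimage size and the stride can be sketched as follows. This is a minimal illustration, assuming square subimages cropped on a regular grid that starts at the top-left corner; the function name and the 190x190 image size are hypothetical.

```python
def count_subimages(height, width, size, stride):
    """Number of size x size crops taken every `stride` pixels inside an image."""
    if height < size or width < size:
        return 0
    rows = (height - size) // stride + 1
    cols = (width - size) // stride + 1
    return rows * cols

# Hypothetical 190 x 190 image cropped into 19 x 19 subimages
n_no_overlap = count_subimages(190, 190, size=19, stride=19)
n_overlap = count_subimages(190, 190, size=19, stride=4)
print(n_no_overlap, n_overlap)
```

A smaller stride produces overlapping subimages and therefore many more training pairs from the same image, at the cost of a larger training file.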

The following Python code snippets show one possible way to generate the train and test datasets. We use OpenCV* to handle and preprocess the images. The first snippet shows how to generate the HR and LR subimage pair set from one of the original images in the 91-image train dataset for the specific case where scaling factor k=3 and stride = 19:

import os
import sys
import numpy as np
import h5py

import cv2

# Parameters
scale = 3
stride = 19
size_ground = 19
size_input = 7
size_pad = 2

#Read image to process
image = cv2.imread('<PATH TO FILES>/Train/t1.bmp')

#Change color space to YCrCb (cv2.imread returns the channels in BGR order)
image_ycrcb = cv2.cvtColor(image, cv2.COLOR_BGR2YCR_CB)
image_ycrcb = image_ycrcb[:,:,0]
image_ycrcb = image_ycrcb.reshape((image_ycrcb.shape[0], image_ycrcb.shape[1], 1))

#Compute size of LR images and resize HR images to a multiple of scale
height, width = image_ycrcb.shape[:2]
height_small = height // scale
width_small  = width // scale

image_pair_HR = cv2.resize(image_ycrcb, (width_small*scale, height_small*scale) )
image_pair_LR = cv2.resize(image_ycrcb, (width_small, height_small) )

# Declare tensors to hold 1024 LR-HR subimage pairs
input_HR = np.zeros((size_ground, size_ground, 1, 1024))
input_LR = np.zeros((size_input + 2*size_pad, size_input + 2*size_pad, 1, 1024))

height, width = image_pair_HR.shape[:2]

#Iterate over the train image using the specified stride and create LR-HR subimage pairs
count = 0
for i in range(0, height-size_ground+1, stride):
    for j in range(0, width-size_ground+1, stride):
        subimage_HR = image_pair_HR[i:i+size_ground, j:j+size_ground]
        #Downsample the HR subimage to obtain the matching LR subimage
        subimage_LR = cv2.resize(subimage_HR, (size_input, size_input))
        #Zero-pad the LR subimage by size_pad pixels on every side
        subimage_LR = np.pad(subimage_LR, ((size_pad, size_pad), (size_pad, size_pad)), 'constant', constant_values=0.0)
        input_HR[:,:,0,count] = subimage_HR
        input_LR[:,:,0,count] = subimage_LR
        count = count + 1

The next snippet shows how to use the Python h5py module to create an HDF5 file that contains the HR and LR subimage pair set created in the previous snippet:

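#Create an hdf5 file. Caffe's HDF5Data layer looks up each dataset by the name of
#the corresponding top blob ("data" and "label" in the model shown later) and
#expects N x C x H x W arrays, so unused slots are dropped and the axes reordered.
with h5py.File('train1.h5','w') as H5:
    H5.create_dataset( 'data', data=np.transpose(input_LR[..., :count], (3, 2, 0, 1)) )
    H5.create_dataset( 'label', data=np.transpose(input_HR[..., :count], (3, 2, 0, 1)) )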

The previous two snippets can be repeated for each of the 91 train images to create the HDF5 file containing the entire training set to be used for training in Caffe.
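One possible way to assemble the whole training set is sketched below. It assumes that each train image has already produced a pair of tensors in the H x W x 1 x N layout used above, with unused slots removed. The helper names are hypothetical. The sketch relies on two Caffe conventions: the HDF5Data layer reads datasets named after its top blobs ("data" and "label" in the model used later), and its source parameter points to a text file that lists one HDF5 file per line.

```python
import numpy as np

def stack_pairs(pairs):
    """Concatenate per-image (LR, HR) tensors stored as H x W x 1 x N arrays
    into two N_total x 1 x H x W float32 arrays (the layout Caffe expects)."""
    data = np.concatenate([np.transpose(lr, (3, 2, 0, 1)) for lr, _ in pairs])
    label = np.concatenate([np.transpose(hr, (3, 2, 0, 1)) for _, hr in pairs])
    return data.astype(np.float32), label.astype(np.float32)

def write_training_set(h5_path, list_path, data, label):
    """Write one HDF5 file plus the text file that lists it (one path per line)."""
    import h5py  # imported here so stack_pairs can be used without h5py installed
    with h5py.File(h5_path, 'w') as h5:
        h5.create_dataset('data', data=data)
        h5.create_dataset('label', data=label)
    with open(list_path, 'w') as f:
        f.write(h5_path + '\n')

# Two hypothetical images that produced 3 and 5 subimage pairs respectively
pairs = [(np.zeros((11, 11, 1, 3)), np.zeros((19, 19, 1, 3))),
         (np.ones((11, 11, 1, 5)), np.ones((19, 19, 1, 5)))]
data, label = stack_pairs(pairs)
print(data.shape, label.shape)  # (8, 1, 11, 11) (8, 1, 19, 19)
```

A call such as write_training_set('examples/FSRCNN/train.h5', 'examples/FSRCNN/train.txt', data, label) would then produce both files; the file names here are only examples.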

FSRCNN Training

The reference model (described in the previous section) is implemented using Intel® Distribution for Caffe, which has been optimized to run on Intel CPUs. An introduction to the basics of this framework and directions to install it can be found at the Intel® AI Developer Program.

In Caffe, models are defined using protobuf files. The FSRCNN model can be downloaded from the website of the authors of [2]. The code snippet below shows the input layers and the first convolutional layer of the FSRCNN (56, 12, 4) model as defined by its authors [2]. The input layers read the train/test data from the HDF5 files listed in the source files train.txt and test.txt, located in the $CAFFE_ROOT/examples/FSRCNN directory. The batch size is 128 for training and 2 for testing.

name: "SR_test"
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "examples/FSRCNN/train.txt"
    batch_size: 128
  }
  include: { phase: TRAIN }
}
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "examples/FSRCNN/test.txt"
    batch_size: 2
  }
  include: { phase: TEST }
}

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 0.1
  }
  convolution_param {
    num_output: 56
    kernel_size: 5
    stride: 1
    pad: 0
    weight_filler {
      type: "gaussian"
      std: 0.0378
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
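The first convolution uses a 5x5 kernel with no padding, which explains why the 7x7 LR subimages were zero-padded by 2 pixels during data preparation. The sketch below traces the spatial size of one training sample through the network using the standard output-size formulas for convolution and deconvolution layers. Only the conv1 parameters come from the snippet above; the parameters of the remaining layers (1x1 shrinking and expanding, 3x3 mapping with padding 1, and a 9x9 deconvolution with stride 3 and padding 4) are assumptions based on the model described in [2], so check them against the downloaded protobuf file.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output width/height of a convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel, stride=1, pad=0):
    """Output width/height of a deconvolution (transposed convolution) layer."""
    return (size - 1) * stride + kernel - 2 * pad

size = 7 + 2 * 2                                    # zero-padded LR subimage: 11
size = conv_out(size, kernel=5)                     # conv1 (from the snippet above): 7
size = conv_out(size, kernel=1)                     # shrinking (assumed 1x1): 7
for _ in range(4):
    size = conv_out(size, kernel=3, pad=1)          # mapping (assumed 3x3, pad 1): 7
size = conv_out(size, kernel=1)                     # expanding (assumed 1x1): 7
size = deconv_out(size, kernel=9, stride=3, pad=4)  # deconvolution (assumed): 19
print(size)  # 19
```

Under these assumptions the network output is 19x19, which matches the size_ground value used when cropping the HR subimages.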

To train the above model, the authors of [2] provide on their website a solver protobuf file containing the training parameters and the location of the protobuf network definition file:

# The train/test net protocol buffer definition
net: "examples/FSRCNN/FSRCNN.prototxt"
test_iter: 100
# Carry out testing every 5000 training iterations.
test_interval: 5000
# The base learning rate, momentum and the weight decay of the network.
#base_lr: 0.005
base_lr: 0.001
momentum: 0.9
weight_decay: 0
# Learning rate policy
lr_policy: "fixed"
# Display results every 1000 iterations
display: 1000
# Maximum number of iterations
max_iter: 1000000
# write intermediate results (snapshots)
snapshot: 5000
snapshot_prefix: "examples/FSRCNN/RESULTS/FSRCNN-56_12_4"
# solver mode: CPU or GPU
solver_mode: CPU

The solver shown above will train the network defined in the model definition file FSRCNN.prototxt using the following parameters:

  • Testing runs every 5000 training iterations, and each test performs 100 forward passes.
  • The base learning rate is 0.001 (the alternative value 0.005 is commented out), and the learning rate policy is fixed, which means the learning rate will not change with time. Momentum is 0.9 (a common choice) and weight_decay is zero (no regularization to penalize large weights).
  • Intermediate results (snapshots) will be written to disk every 5000 iterations, and the maximum number of iterations (when the training will stop) is 1000000.
  • Snapshot results will be written to the examples/FSRCNN/RESULTS directory (assuming we run Caffe from the install directory $CAFFE_ROOT). Model files (containing the trained weights) will be prefixed with the string ‘FSRCNN-56_12_4’.
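The following sketch turns the solver settings into a few derived quantities, which can help when judging how long a run will take and how much disk space the snapshots need. The values are copied from the solver file and the data layers shown earlier; the function name and dictionary keys are hypothetical.

```python
def training_schedule(test_iter, test_interval, max_iter, snapshot,
                      train_batch, test_batch):
    """Derive simple totals from the solver and data-layer settings."""
    return {
        "scheduled_tests": max_iter // test_interval,
        "snapshots": max_iter // snapshot,
        "samples_per_test": test_iter * test_batch,
        "training_samples_processed": max_iter * train_batch,
    }

schedule = training_schedule(test_iter=100, test_interval=5000,
                             max_iter=1000000, snapshot=5000,
                             train_batch=128, test_batch=2)
for key, value in schedule.items():
    print(key, value)
```

With these settings a full run saves 200 snapshots, so it may be worth checking the available disk space before starting.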

The reader is encouraged to experiment with different parameters. One useful option is to define a small maximum number of iterations and explore how the test error decreases, and compare this rate between different sets of parameters.

Once the network definition and solver files are ready, start training by running the caffe command located in the build/tools directory:

export CAFFE_ROOT=< Path to caffe >
$CAFFE_ROOT/build/tools/caffe train -engine "MKL2017" -solver \
    $CAFFE_ROOT/examples/FSRCNN/FSRCNN_solver.prototxt \
    2>$CAFFE_ROOT/examples/FSRCNN/output.log
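Because the command above redirects the Caffe log to output.log, the test results can be extracted from that file to follow how the test error evolves. The sketch below is one possible parser. The two regular expressions are assumptions based on the usual Caffe log layout ("Iteration N, Testing net" followed by "Test net output #k: name = value"); the exact wording may differ between Caffe versions, and the sample lines are made up for illustration.

```python
import re

# Patterns assumed from the usual Caffe log layout; adjust them if your log differs.
ITER_RE = re.compile(r"Iteration (\d+), Testing net")
OUTPUT_RE = re.compile(r"Test net output #\d+: (\w+) = ([-+0-9.eE]+)")

def parse_test_outputs(lines):
    """Return (iteration, output_name, value) tuples for every test output found."""
    results = []
    iteration = None
    for line in lines:
        match = ITER_RE.search(line)
        if match:
            iteration = int(match.group(1))
            continue
        match = OUTPUT_RE.search(line)
        if match and iteration is not None:
            results.append((iteration, match.group(1), float(match.group(2))))
    return results

# Made-up log lines, for illustration only
sample_log = [
    "I0630 10:00:00.000000  1234 solver.cpp:330] Iteration 0, Testing net (#0)",
    "I0630 10:00:01.000000  1234 solver.cpp:397]     Test net output #0: loss = 0.25 (* 1 = 0.25 loss)",
    "I0630 10:05:00.000000  1234 solver.cpp:330] Iteration 5000, Testing net (#0)",
    "I0630 10:05:01.000000  1234 solver.cpp:397]     Test net output #0: loss = 0.125 (* 1 = 0.125 loss)",
]
for iteration, name, value in parse_test_outputs(sample_log):
    print(iteration, name, value)
```

To process a real run, pass the lines of the log file instead, for example parse_test_outputs(open('output.log')).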

Resume Training Using Saved Snapshots

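During training, the network parameters (weights) are written to disk at the frequency specified by the snapshot parameter. Caffe creates two files at each snapshot, named after the snapshot prefix and the iteration number:

  • A model file, for example FSRCNN-56_12_4_iter_1000000.caffemodel
  • A solver state file, for example FSRCNN-56_12_4_iter_1000000.solverstate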

The model file contains the learned model parameters corresponding to the indicated iteration, serialized as binary protocol buffer files. The solver state file is the state snapshot containing all the necessary information to recover the solver state at the time of the snapshot. This file will let us resume training from the snapshot instead of restarting from scratch. For example, let us assume we ran training for 1 million iterations, and after that we realize that we need to run it for an extra 500K iterations to further reduce the testing error. We can restart the training using the snapshot taken after 1 million iterations:

$CAFFE_ROOT/build/tools/caffe train -engine "MKL2017" -solver \
    $CAFFE_ROOT/examples/FSRCNN/FSRCNN_solver.prototxt -snapshot \
    $CAFFE_ROOT/examples/FSRCNN/RESULTS/FSRCNN-56_12_4_iter_1000000.solverstate \
    2>$CAFFE_ROOT/examples/FSRCNN/output_resume.log

Training then resumes from iteration 1000000 and runs until the maximum number of iterations specified in the solver file is reached, so max_iter must first be raised to 1500000 in this case.
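When many snapshots have accumulated, picking the most recent solver state by hand is error-prone. The sketch below selects it from a list of file names, assuming the naming pattern seen in the command above (prefix, "_iter_", iteration number, ".solverstate"). The function name and the sample file names are hypothetical.

```python
import re

def latest_solverstate(filenames, prefix="FSRCNN-56_12_4"):
    """Return (iteration, filename) for the newest solver state, or None."""
    pattern = re.compile(re.escape(prefix) + r"_iter_(\d+)\.solverstate$")
    best = None
    for name in filenames:
        match = pattern.search(name)
        if match:
            iteration = int(match.group(1))
            if best is None or iteration > best[0]:
                best = (iteration, name)
    return best

# Hypothetical directory listing
files = ["FSRCNN-56_12_4_iter_5000.solverstate",
         "FSRCNN-56_12_4_iter_5000.caffemodel",
         "FSRCNN-56_12_4_iter_1000000.solverstate",
         "FSRCNN-56_12_4_iter_995000.solverstate"]
print(latest_solverstate(files))
```

In practice the list would come from os.listdir() on the RESULTS directory, and the returned file name would be passed to the -snapshot option.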

FSRCNN Testing Using Pre-Trained Parameters

Once we have a trained model, we can use it to perform super-resolution on an input LR image. We can test the network at any moment during the training as long as we have model snapshots already generated.

In practice, we can use the super-resolution model we trained to increase the resolution of any image or video. However, for the purposes of this tutorial, we want to test our trained model on an LR image for which we have an HR image to compare with. To this end, we will use a sample image from the test dataset used in [1] and [2] (the Set5 dataset, which is also commonly used to test SR models in other publications).

To perform the test, we will use a sample image (butterfly) as the ground truth. To create the input LR image, we will blur and downsample the ground truth image, and will use it to feed the trained network. Once we forward-run the network with the input image, obtaining a super-resolved image as output, we will compare the three images (ground truth, LR, and super-resolved) to visually evaluate the performance of the SR network we trained.

The test procedure described above can be implemented in several ways. As an example, the following Python script implements the testing procedure using the OpenCV library for image handling:

import os
import sys
import numpy as np

#Set up caffe root directory and add to path
#($APPS is an environment variable; Python does not expand it automatically)
caffe_root = os.path.expandvars('$APPS/caffe/')
sys.path.insert(0, caffe_root + 'python')

import cv2
import caffe

# Parameters
scale = 3

#Set mode to run on CPU
caffe.set_mode_cpu()

#Create Caffe model using pretrained model (TEST phase, since we only run inference)
net = caffe.Net(caffe_root + 'FSRCNN_predict.prototxt',
                caffe_root + 'examples/FSRCNN/RESULTS/FSRCNN-56_12_4_iter_300000.caffemodel',
                caffe.TEST)

#Input directories
input_dir = caffe_root + 'examples/SRCNN/DATA/Set5/'

#Input ground truth image
im_raw = cv2.imread(input_dir + 'butterfly.bmp')

#Change format to YCrCb (cv2.imread returns BGR) and keep the luminance channel Y
ycrcb = cv2.cvtColor(im_raw, cv2.COLOR_BGR2YCR_CB)
im_raw = ycrcb[:,:,0]

#Blur image and resize to create input for network
#(cv2.resize expects the target size as (width, height))
im_blur = cv2.blur(im_raw, (4,4))
im_small = cv2.resize(im_blur, (im_raw.shape[1]//scale, im_raw.shape[0]//scale))

#Copy input image data to net structure, reshaping the input blob to 1 x 1 x H x W
h, w = im_small.shape
net.blobs['data'].reshape(1, 1, h, w)
net.blobs['data'].data[...] = im_small.reshape((1, 1, h, w))

#Run forward pass
out = net.forward()

#Extract output image from net, clip to [0, 255] and change format to uint8
mat = out['conv3'][0]
mat = np.clip(mat[0,:,:], 0, 255).astype('uint8')

#Display original (ground truth), blurred and restored images
cv2.imshow('Ground truth', im_raw)
cv2.imshow('Blurred and downsampled', im_small)
cv2.imshow('Super-resolved', mat)
cv2.waitKey(0)
cv2.destroyAllWindows()

Running the above script on the test image displays the output shown in Figure 2. Readers are encouraged to try this network and refine the parameters to obtain better super-resolution results.

Figure 2. Testing the trained FSRCNN. The left image is the ground truth. The image in the center is the ground truth after being blurred and downsampled. The image on the right is the super-resolved image using a model snapshot after 300000 iterations.
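A visual comparison can be complemented with a numeric score. The peak signal-to-noise ratio (PSNR) is a common choice in the super-resolution literature, although this tutorial does not compute it. The sketch below is a minimal implementation for 8-bit grayscale images, assuming both images have already been cropped to the same size; the function name is hypothetical.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Synthetic example: every pixel of the estimate is off by one gray level
reference = np.full((19, 19), 100, dtype=np.uint8)
estimate = np.full((19, 19), 101, dtype=np.uint8)
print(round(psnr(reference, estimate), 2))  # 48.13
```

Higher values mean the estimate is closer to the ground truth. Because the network output and the ground truth image may differ slightly in size, crop both to a common region before calling the function.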


In this short tutorial, we have shown how to train and test a CNN for super-resolution. The CNN we described is the Fast Super-Resolution Convolutional Neural Network (FSRCNN) [2], which is described in more detail in an associated article (“An Example of a Convolutional Neural Network for Image Super-Resolution”). This particular CNN was chosen for this tutorial because of its relative simplicity, good performance, and the importance of the authors’ work in the area of CNNs for super-resolution. Several new CNN architectures for super-resolution have been described in the literature recently, and several of them compare their performance to that of the FSRCNN or its predecessor, created by the same authors: the SRCNN [1].

The training and testing in this tutorial were performed on Intel® Xeon® processors, using the Intel Distribution for Caffe deep learning framework and Intel Distribution for Python, both of which are optimized to run on Intel Xeon processors.

Deep learning-based image/video super-resolution is an exciting development in the field of computer vision. Readers are encouraged to experiment with this network, as well as newer architectures, and test with their own images and videos. To start using Intel’s optimized tools for machine learning and deep learning, visit Intel® AI Developer Program.


  1. C. Dong, C. C. Loy, K. He and X. Tang, "Learning a Deep Convolutional Network for Image Super-Resolution," 2014.
  2. C. Dong, C. C. Loy and X. Tang, "Accelerating the Super-Resolution Convolutional Neural Network," 2016.
