Use Machine Learning to Detect Defects on the Steel Surface

Published: 06/27/2018  

Last Updated: 06/27/2018


Project Overview

Surface quality is an essential parameter for steel sheet. In the steel industry, manual defect inspection is tedious, which makes it difficult to guarantee a flawless steel surface. To meet user requirements, vision-based automatic steel surface inspection strategies have proven to be powerful and prevalent solutions over the past two decades [1].

The input is taken from the NEU surface defect database [2], which is available online. This database contains six types of defects: crazing, inclusion, patches, pitted surface, rolled-in-scale, and scratches.

Problem Statement

The challenge is to provide an effective and robust approach to detect and classify metal defects using computer vision and machine learning.

Applying image processing techniques, such as filtering and feature extraction, and then training a model on the extracted features is a good way to determine which type of defect a steel plate has. This solution can even be used in real-time applications.


The evaluation is done using the accuracy metric, the fraction of test images that are classified correctly:

$$\text{Accuracy} = \frac{\text{number of correctly classified images}}{\text{total number of images}}$$

Because the classes are balanced, accuracy is an appropriate metric to evaluate the project. It tells us how well the algorithm classifies the defects.
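As a minimal sketch of the metric, accuracy can be computed by counting matching labels. The function name `accuracy` and the toy labels are illustrative assumptions, not part of the project code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical toy labels (not taken from the dataset)
y_true = ["crazing", "inclusion", "patches", "scratches"]
y_pred = ["crazing", "inclusion", "scratches", "scratches"]
toy_accuracy = accuracy(y_true, y_pred)  # 3 of the 4 predictions are correct
```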


Data Exploration

The NEU surface dataset [2] contains 300 images of each of six defect types (a total of 1800 images). Each image is 200 × 200 pixels. The images are gray-level images in the .bmp format, 40.1 KB each. A few samples are shown in figure 1.

Figure 1. Samples of defects (a) crazing, (b) inclusion, (c) patches, (d) pitted surface, (e) rolled-in-scale, and (f) scratches.

Exploratory Visualization

The following chart shows the intensity histogram of a sample image from each class.

Figure 2. Histogram samples of defects: (a) crazing, (b) inclusion, (c) patches, (d) pitted surface, (e) rolled-in-scale, and (f) scratches.

An image histogram is a graphical representation of the tonal distribution in a digital image. The horizontal axis represents the intensity values; the vertical axis represents the number of pixels at each intensity. The histogram gives an idea of the contrast of the image, which I used as a feature. From figure 2 it is observed that the histogram of each class is visually distinguishable, which makes contrast an important feature to include in the feature vector.
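The histogram described above can be sketched with NumPy. The random image below is only a stand-in for a 200 × 200 dataset sample, and `gray_histogram` is a hypothetical helper name.

```python
import numpy as np


def gray_histogram(image, levels=256):
    """Count how many pixels take each gray level in 0..levels-1."""
    return np.bincount(image.ravel(), minlength=levels)


# Synthetic 200 x 200 8-bit image standing in for a dataset sample
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(200, 200), dtype=np.uint8)
hist = gray_histogram(image)  # hist[v] is the number of pixels with intensity v
```

A narrow histogram indicates low contrast, while a histogram spread across many intensity values indicates high contrast.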

As said earlier, the classes are well balanced, justifying accuracy as an evaluation metric.

Algorithms and Techniques

Different classifiers such as k-nearest neighbors (KNN), support vector classifier (SVC), gradient boosting, random forest classifier, AdaBoost (adaptive boosting), and decision trees will be compared.

Texture features such as contrast, dissimilarity, homogeneity, energy, and asymmetry will be extracted from the gray-level co-occurrence matrix (GLCM), and used for training the classifiers.


SVMs can be linear or nonlinear. A nonlinear SVM maps the input pattern into a higher-dimensional feature space. Data that are linearly separable can be separated by a hyperplane, while data that are not linearly separable are handled with a kernel function, such as a higher-order polynomial. The SVM classification algorithm can use different kernels, such as the radial basis function (RBF), linear, and quadratic kernels. The RBF kernel on two samples, x and x', represented as feature vectors in some input space, is defined as:

$$K(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^2}{2\sigma^2}\right)$$

The value of the kernel function decreases with distance, and ranges between zero (in the limit) and one (when x = x').
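A small numerical sketch of this behavior follows. It assumes the common form K(x, x') = exp(-||x - x'||² / (2σ²)); the function name `rbf_kernel` and the value of `sigma` are illustrative choices.

```python
import numpy as np


def rbf_kernel(x, x_prime, sigma=1.0):
    """RBF kernel value for two feature vectors (sigma is a free width parameter)."""
    x = np.asarray(x, dtype=float)
    x_prime = np.asarray(x_prime, dtype=float)
    sq_dist = np.sum((x - x_prime) ** 2)
    return float(np.exp(-sq_dist / (2.0 * sigma ** 2)))


same = rbf_kernel([1.0, 2.0], [1.0, 2.0])    # identical points give the maximum value
near = rbf_kernel([1.0, 2.0], [1.5, 2.0])    # nearby points give a value close to one
far = rbf_kernel([1.0, 2.0], [10.0, -5.0])   # distant points give a value close to zero
```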

Figure 3. Hyperplane in feature space.

AdaBoost algorithm

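Input: Data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$ with labels $y_i \in \{-1, +1\}$; base learning algorithm $\mathcal{L}$; number of learning rounds $T$.

Initialize the weight distribution: $D_1(i) = 1/m$.

For $t = 1, \ldots, T$:

  Train a learner $h_t$ from $D$ using distribution $D_t$: $h_t = \mathcal{L}(D, D_t)$

  Measure the error of $h_t$: $\epsilon_t = \sum_{i=1}^{m} D_t(i)\,[h_t(x_i) \neq y_i]$

  If $\epsilon_t > 0.5$ then break

  Set the weight of the weak classifier $h_t(x)$: $\alpha_t = \frac{1}{2} \ln\frac{1 - \epsilon_t}{\epsilon_t}$

  Update the distribution, where $Z_t$ is a normalization factor that makes $D_{t+1}$ a distribution: $D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$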
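The loop above can be sketched in NumPy as follows. This is a minimal illustration under several assumptions that are not stated in the article: binary labels in {-1, +1}, threshold stumps as the base learner, the standard classifier weight alpha = 0.5 * ln((1 - err) / err), and a final prediction given by the sign of the weighted vote. All function names are hypothetical.

```python
import numpy as np


def stump_predict(X, feature, thr, sign):
    """Predict `sign` where X[:, feature] <= thr and `-sign` elsewhere."""
    return np.where(X[:, feature] <= thr, sign, -sign)


def fit_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best stump."""
    best = (np.inf, 0, 0.0, 1)
    for feature in range(X.shape[1]):
        for thr in np.unique(X[:, feature]):
            for sign in (1, -1):
                pred = stump_predict(X, feature, thr, sign)
                err = np.sum(w[pred != y])
                if err < best[0]:
                    best = (err, feature, thr, sign)
    return best


def adaboost(X, y, T=10):
    m = len(y)
    w = np.full(m, 1.0 / m)                    # D_1(i) = 1/m
    ensemble = []
    for _ in range(T):
        err, feature, thr, sign = fit_stump(X, y, w)
        if err > 0.5:                          # learner is worse than chance
            break
        err = max(err, 1e-10)                  # guard against log of infinity
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = stump_predict(X, feature, thr, sign)
        w = w * np.exp(-alpha * y * pred)      # raise weights of misclassified samples
        w = w / w.sum()                        # Z_t normalization
        ensemble.append((alpha, feature, thr, sign))
    return ensemble


def predict(ensemble, X):
    score = sum(a * stump_predict(X, f, thr, s) for a, f, thr, s in ensemble)
    return np.sign(score)


# Toy 1-D data that no single stump can separate
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1, 1, -1, -1, 1, 1])
model = adaboost(X, y, T=3)
train_pred = predict(model, X)
```

On this toy set each individual stump misclassifies at least two points, but the weighted vote of three stumps classifies all six correctly.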


K-Nearest neighbor algorithm

  1. Choose a value of K (K > 0) and take the new data sample.
  2. Select the K entries in the training data that are nearest to the new sample.
  3. Find the most common class among these entries.
  4. Assign this class to the new sample.
  5. If the result is not adequate, change the value of K until a reasonable level of correctness is achieved.
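The steps above can be sketched as follows, assuming Euclidean distance and a simple majority vote. The two-dimensional toy features and the helper name `knn_predict` are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                   # indices of the k closest samples
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors forming two well-separated groups
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_train = ["scratches"] * 3 + ["patches"] * 3

label_a = knn_predict(X_train, y_train, np.array([0.15, 0.10]), k=3)
label_b = knn_predict(X_train, y_train, np.array([5.00, 5.10]), k=3)
```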

Decision trees algorithm

  1. Create a root node for the tree.
  2. If all examples are positive, return leaf node ‘positive’.
  3. Else if all examples are negative, return leaf node ‘negative’.
  4. Calculate the entropy of the current state.
  5. For each attribute, calculate the entropy with respect to the attribute ‘x’.
  6. Select the attribute that has maximum value of information gain (IG).
  7. Remove the attribute that offers highest IG from the set of attributes.
  8. Repeat until we run out of all attributes or the decision tree has all leaf nodes.
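Steps 4 to 6 rely on entropy and information gain. A minimal sketch of both quantities is shown below; the attribute names and toy rows are assumptions made only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical examples: 'texture' separates the classes, 'bright' does not
rows = [
    {"texture": "rough", "bright": "yes"},
    {"texture": "rough", "bright": "no"},
    {"texture": "smooth", "bright": "yes"},
    {"texture": "smooth", "bright": "no"},
]
labels = ["positive", "positive", "negative", "negative"]

# Step 6: pick the attribute with the maximum information gain
best = max(["texture", "bright"], key=lambda a: information_gain(rows, labels, a))
```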

Random Forest

A random forest is an ensemble of decision trees. It reduces the overfitting that is usually seen when a single decision tree is fit to the entire dataset.



I uploaded to GitHub* a basic model that uses the KNN algorithm to classify the images; it achieves 75.27 percent accuracy. This is the benchmark model whose accuracy I will try to improve. The code is available in the steel_plate repository.


Data Preprocessing

No preprocessing is used on the input images, as the defects of the steel plate heavily depend on the texture of its surface and, as we are using textural features, any preprocessing method such as smoothing or sharpening will change its texture.


The following flowchart represents the entire workflow of the project.

Figure 4. Project workflow

The project starts with loading the images and extracting texture features such as contrast, dissimilarity, homogeneity, energy, and asymmetry. The features and their labels are then passed to the train_test_split function of the scikit-learn library, which splits both the data and the labels. The data is split 80 percent for training and 20 percent for testing.

The 80 percent data was given for training different classifiers and the testing was done on 20 percent of the data. The model that gave the highest accuracy was then selected as the main model.
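A sketch of the 80/20 split with scikit-learn's `train_test_split` follows. The feature matrix is random placeholder data with the dataset's size (1800 samples, six classes of 300); the number of feature columns and the `random_state` value are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features: 1800 samples with 5 columns (the column count is assumed)
rng = np.random.default_rng(0)
features = rng.random((1800, 5))
labels = np.repeat(np.arange(6), 300)   # six classes with 300 samples each

# test_size=0.2 keeps 80 percent for training and 20 percent for testing
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)
```

With 1800 images this yields 1440 training samples and 360 test samples.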

The GLCM feature extraction is given below:

GLCM is a texture analysis method that characterizes the texture in an image by treating the surface as a two-dimensional array of gray-level variations. GLCM features are computed from the arrangement of the matrix elements to describe the contrast of the pixels and the energy of the region of interest. GLCM is calculated in four directions (0°, 45°, 90°, and 135°) and for four distances (1, 2, 3, and 4).

GLCM is a well-established statistical technique for feature extraction. It tabulates how often different combinations of pixel gray levels occur in an image. A co-occurrence matrix depicts the joint gray-level histogram of the image (or a region of the image) in the form of a matrix with dimensions Ng × Ng, where Ng is the number of gray levels.

Figure 5. Directional analysis of GLCM.

The offset is an integer array that specifies the distance between the pixel of interest and its neighbor. Each row in the array is a two-element vector, [row offset, column offset], which specifies the displacement of a pair of pixels. Because the offset is often expressed as an angle, the common angles correspond to the following offsets for a pixel distance D: 0° is [0 D], 45° is [-D D], 90° is [-D 0], and 135° is [-D -D].

Figure 6. Formation of GLCM matrix.
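The formation of the matrix can be sketched in plain NumPy. The helper names `glcm` and `offset_for` are hypothetical, the angle-to-offset mapping assumes a [row offset, column offset] convention with rows counted downward, and the 4 × 4 image with four gray levels is a toy input rather than a dataset sample.

```python
import numpy as np


def offset_for(angle_deg, distance):
    """(row_offset, col_offset) for the four common GLCM directions."""
    table = {
        0: (0, distance),
        45: (-distance, distance),
        90: (-distance, 0),
        135: (-distance, -distance),
    }
    return table[angle_deg]


def glcm(image, row_offset, col_offset, levels):
    """Count how often gray level i has gray level j at the given offset."""
    counts = np.zeros((levels, levels), dtype=np.int64)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + row_offset, c + col_offset
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r, c], image[r2, c2]] += 1
    return counts


# Toy image with gray levels 0..3
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])

horizontal = glcm(image, *offset_for(0, 1), levels=4)

# One matrix per (angle, distance) pair: 4 directions x 4 distances
matrices = {
    (angle, d): glcm(image, *offset_for(angle, d), levels=4)
    for angle in (0, 45, 90, 135)
    for d in (1, 2, 3, 4)
}
```

For the horizontal direction at distance 1, the pair (0, 0) occurs twice in the toy image, so the top-left entry of `horizontal` is 2.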

Features used in this method are as follows: contrast, dissimilarity, homogeneity, energy, and asymmetry.

Table 1. GLCM features.

Sr. No. Features Formulae
1. Contrast Contrast = equation
2. Homogeneity Homogeneity = equation
3. Dissimilarity Dissimilarity = equation
4. Energy Energy = equation
5. Asymmetry Asymmetry= equation
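As a hedged illustration, the sketch below computes four of these features from a normalized co-occurrence matrix using the definitions commonly documented for GLCM texture properties (for example in scikit-image). Whether the project used exactly these normalizations is an assumption. The text does not spell out the asymmetry formula, so the sketch reports the angular second moment (ASM) in its place; that substitution is also an assumption. The count matrix is a toy example.

```python
import numpy as np


def glcm_features(counts):
    """Texture features from a matrix of co-occurrence counts."""
    p = counts / counts.sum()            # normalize counts to joint probabilities
    i, j = np.indices(p.shape)           # gray-level index grids
    asm = np.sum(p ** 2)                 # angular second moment (assumed stand-in)
    return {
        "contrast": np.sum(p * (i - j) ** 2),
        "dissimilarity": np.sum(p * np.abs(i - j)),
        "homogeneity": np.sum(p / (1.0 + (i - j) ** 2)),
        "energy": np.sqrt(asm),
        "asm": asm,
    }


# Toy co-occurrence counts for four gray levels
counts = np.array([[2, 2, 1, 0],
                   [0, 2, 0, 0],
                   [0, 0, 3, 1],
                   [0, 0, 0, 1]])
features = glcm_features(counts)
```

In a pipeline like the one described here, the values from every direction and distance would be concatenated into one feature vector per image.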

Gradient boosting combines two methods: gradient descent and AdaBoost. It builds a model in a forward, stage-wise fashion and optimizes a differentiable loss function. The algorithm is highly customizable for a specific application. AdaBoost has the advantage that it gives more weight to the hard samples near classification boundaries, which helps to increase the accuracy of the classifier.

The gradient boosting algorithm in detail is as follows:

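Input: Training set $\{(x_i, y_i)\}_{i=1}^{n}$, a differentiable loss function $L(y, F(x))$, and the number of iterations $M$.

  1. Initialize the model with a constant value: $F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)$
  2. For $m = 1, 2, \ldots, M$:
    • Compute the so-called pseudo-residuals: $r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}$ for $i = 1, \ldots, n$
    • Fit a base learner $h_m(x)$ to the pseudo-residuals; that is, train it using the training set $\{(x_i, r_{im})\}_{i=1}^{n}$
    • Compute the multiplier $\gamma_m$ by solving the following 1D optimization problem: $\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i))$
    • Update the model: $F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$
  3. Output: $F_M(x)$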
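To make the loop concrete, here is a minimal NumPy sketch for one special case: squared-error loss on a one-dimensional regression problem with regression stumps as base learners. These choices are assumptions made for brevity; the project itself uses a classifier. With squared-error loss the pseudo-residuals reduce to y - F(x), and the 1D optimization for the multiplier has a closed form.

```python
import numpy as np


def fit_stump(x, residuals):
    """Least-squares regression stump: (threshold, left_value, right_value)."""
    best = None
    for thr in np.unique(x)[:-1]:
        left, right = residuals[x <= thr], residuals[x > thr]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if best is None or sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    return best[1:]


def stump_predict(stump, x):
    thr, left_value, right_value = stump
    return np.where(x <= thr, left_value, right_value)


def gradient_boost(x, y, M=20):
    f0 = y.mean()                              # F_0: constant minimizing squared loss
    F = np.full(len(y), f0, dtype=float)
    stages = []
    for _ in range(M):
        residuals = y - F                      # pseudo-residuals for L = 0.5 * (y - F)^2
        stump = fit_stump(x, residuals)        # base learner h_m fitted to the residuals
        h = stump_predict(stump, x)
        denom = np.sum(h ** 2)
        gamma = np.sum(residuals * h) / denom if denom > 0 else 0.0  # closed-form line search
        F = F + gamma * h                      # F_m = F_{m-1} + gamma_m * h_m
        stages.append((gamma, stump))
    return f0, stages


def predict(model, x):
    f0, stages = model
    return f0 + sum(g * stump_predict(s, x) for g, s in stages)


# Toy regression target
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x)
model = gradient_boost(x, y, M=20)

baseline_mse = np.mean((y - y.mean()) ** 2)    # error of the constant model F_0
boosted_mse = np.mean((y - predict(model, x)) ** 2)
```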

Initially, smoothing or sharpening of the images was considered as a preprocessing step. It was later observed that such preprocessing disrupts the textural features of the image, which has a negative impact on the output of the classifier. The preprocessing step was therefore dropped and, as mentioned in the Data Preprocessing section, no preprocessing was used in this project.


The selection of algorithms and parameter tuning are important aspects of machine learning. In this approach, the gradient boosting algorithm is selected, which combines two machine learning approaches: gradient descent and AdaBoost. AdaBoost algorithms boost weak learners to minimize false alarms and improve accuracy. The number of boosting stages is tuned to get the best accuracy.

In the gradient boosting model, the number of boosting stages (n_estimators) was set to 80 (the default value is 100).

Table 2. Hyperparameter values and accuracy.

n_estimators Accuracy (%)
80 92.5
90 92.22
100 91.6
110 91.38
500 90.00

The default value of n_estimators is 100, which gives 91.6 percent accuracy in the initial results. When the value of n_estimators is set to 80 the accuracy increases to 92.5 percent, which is our final result.
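A tuning loop of this kind might look like the sketch below, which uses scikit-learn's `GradientBoostingClassifier`. The synthetic six-class data from `make_classification` is only a placeholder for the GLCM features, so the accuracies it produces are unrelated to Table 2; the sample count and `random_state` values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic six-class placeholder data (not the NEU features)
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=5, n_redundant=0,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

results = {}
for n in (80, 90, 100, 110):
    clf = GradientBoostingClassifier(n_estimators=n, random_state=0)
    clf.fit(X_train, y_train)
    results[n] = clf.score(X_test, y_test)   # test accuracy for this setting

best_n = max(results, key=results.get)       # setting with the highest accuracy
```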


The following table shows the accuracy comparison of different classifiers:

Table 3. Performance evaluation.

Sr. No Classifier Accuracy (%)
1 KNN 75.27
2 AdaBoost 51.11
3 SVC 14.72
4 Decision Tree 88.33
5 Random Forest 89.44
6 Gradient Boosting 92.50

Figure 7. Accuracy comparison graph.

From the above table and graph we can observe that gradient boosting gives the highest accuracy of 92.5 percent. The confusion matrix of testing by using gradient boosting is given below.

As the extracted textural features are based on GLCM, variations in light intensities may negatively affect the result of the model.

Table 4. K-fold CV results (folds = 5).

Sr. No  Classifier         CV accuracy (%)   CV accuracy (%)   CV accuracy (%)   CV mean accuracy (%)
                           Random state=9    Random state=70   Random state=35
1       KNN                75.3472           74.930556         74.722222         74.99999
2       AdaBoost           49.375000         47.569            50.416667         49.12022
3       SVC                15.486111         14.305556         13.819444         14.53704
4       Decision Tree      84.861111         85.625000         85.069444         85.18519
5       Random Forest      87.013889         88.819444         87.083333         87.63889
6       Gradient Boosting  87.708333         88.611111         88.750000         88.35648
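The cross-validation procedure can be sketched with scikit-learn's `KFold` and `cross_val_score`. The sketch assumes that the random states in Table 4 seed a shuffled five-fold split, which the article does not state explicitly, and it again uses synthetic placeholder data, so the printed numbers will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic six-class placeholder data (not the NEU features)
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=5, n_redundant=0,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)

mean_per_seed = {}
for seed in (9, 70, 35):
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(KNeighborsClassifier(), X, y, cv=cv)
    mean_per_seed[seed] = scores.mean()      # mean accuracy over the 5 folds

# Average over the three seeds, as in the last column of Table 4
overall_mean = sum(mean_per_seed.values()) / len(mean_per_seed)
```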

Figure 8. Confusion matrix.

From the confusion matrix of the gradient boosting classifier output it is seen that out of 360 testing images 333 are correctly classified and 27 are misclassified.
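The relationship between a confusion matrix and accuracy can be sketched as follows. The labels are toy values, and `confusion_matrix` here is a small hand-written helper rather than a library call. The last line repeats the arithmetic for the counts quoted above.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


# Hypothetical labels for a three-class toy problem
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0])
cm = confusion_matrix(y_true, y_pred, 3)

# Correct predictions lie on the diagonal
toy_accuracy = np.trace(cm) / cm.sum()

# Counts quoted in the text: 333 of 360 test images correct
reported_accuracy = 333 / 360
```

The ratio 333 / 360 equals 0.925, which is the 92.5 percent accuracy reported for gradient boosting.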


The gradient boosting classifier achieved an accuracy of 92.5 percent, which is more than the KNN benchmark model with 75.27 percent accuracy.

In KNN the data points at the boundaries of classes can be misclassified, and this is where the gradient boosting algorithm excels over KNN for this specific problem, as weak classifiers are transformed into strong classifiers.


In the proposed system, the machine learning-based steel plate defect detection system was implemented.

The input images were taken from the NEU dataset [2], which is freely available.

No preprocessing was done, as mentioned in the Data preprocessing section.

The texture features were extracted by the GLCM, and extracted features were further classified into six respective classes (crazing, inclusion, patches, pitted surface, rolled-in-scale, and scratches) using different classification algorithms.

The extracted features were split into training and test sets.

The gradient boosting classifier had the highest testing accuracy. Then, the hyperparameter of the boosting factor was tuned (which was difficult) to get even more accuracy, as mentioned in the refinement section. This approach achieved the classification accuracy of 92.5 percent.

In the future, this approach can be implemented using deep learning algorithms if a large dataset is available.

This was an interesting project, as this model can be implemented in real-life scenarios in the steel industry, which suffers from the problem of steel plate defects.

Intel® AI DevCloud Development Tools

Intel® AI DevCloud was used to train the network for the above implementation. Intel AI DevCloud is available for academic and personal research purposes for free and the request can be made from the Intel AI DevCloud website. The code developed can be found in this GitHub* repository.



  1. Yong Jie Zhao, Yun Hui Yan, Ke Chen Song, Vision-based automatic detection of steel surface defects in the cold rolling process: considering the influence of industrial liquids and surface textures, The International Journal of Advanced Manufacturing Technology, 2017, Volume 90, Number 5-8, Page 1665.
  2. NEU surface defect database
  3. GitHub
