# Change and Anomaly Detection Framework for Internet of Things Data Streams

Published: 06/17/2016

Last Updated: 06/17/2016

By Amitai Armon, Lev Faivishevsky, and Gilad Wallach

The Internet of Things (IoT) has been one of the main technological trends of recent years. It enables real-time machine-to-machine communication over the Internet. Nearly any device may transmit information from its sensors to enable centralized analysis and insight derivation in a designated cloud infrastructure.

In this paper we describe a generic analytics engine that provides robust alerts upon changes and anomalies in sensory data streams. Whereas most existing IoT analytics implementations require domain-specific assumptions, we provide meaningful insights through machine learning techniques and advanced statistical tests with no prior knowledge.

The system is designed and implemented in a Big Data environment, using Hadoop, Spark and Akka in order to satisfy the requirements of IoT for fast online parallel information processing.

## 1. Introduction

In a nutshell, the Internet of Things refers to devices other than computers that are connected to the Internet and can send and receive data. The term was coined by Kevin Ashton in 1999 in the context of supply chain management. In the past decade, however, it has become much broader and now includes healthcare, automotive, energy, industrial, retail, smart buildings and homes, and more. According to Middleton, Kjeldsen, and Tully (2013), the Internet of Things will include 26 billion installed units by 2020, and IoT product and service suppliers will generate incremental revenue exceeding $300 billion in 2020, mostly in services.

The massive volume of IoT data generated by millions of sensors is extremely dynamic, heterogeneous, imperfect and unprocessed (Chen, 2012). Furthermore, it usually requires real-time analysis and decision making. Therefore, most existing implementations settle for basic analysis and statistics, or make many assumptions on the collected data, usually relevant to specific domains.

One of the major goals of IoT systems is automatic monitoring and detection of abnormal events, changes or drifts (Chui, Loffler, & Roberts, 2010). The traditional approach is to use a rules-based engine, which triggers alerts according to some manually configured thresholds. These systems lack data fusion and learning capabilities and therefore fail to cope with large amounts of complex high dimensional data.

Our analytics engine is generic, and capable of processing real time data streams from various parallel sensors. It is not domain specific, and does not require prior expert knowledge or assumptions. It provides meaningful insights regarding anomalous behavior and change over time. It consists of a multi-level analysis scheme (Figure 1) including time-series classification and modeling, anomaly detection, single and multi-sensor change detection, and alert characterization.

**Figure 1:** IoT Framework scheme: Our analytic engine consists of multiple layers including: sensor data ingestion and storage, time-series classification, anomaly and change detection, and alert characterization

## 2. Framework Details

**2.1 Time-Series Classification**

A common assumption in many time series models is that the underlying process is stationary. A stationary process has the property that its mean, variance, and autocorrelation structure do not change over time. This assumption allows for an elegant development of powerful statistical methodology.

IoT data may come from a large variety of domains, with significantly different types of properties and characteristics. For example, stationary processes are frequent in manufacturing signals, whereas daily periods of activity characterize human generated data in healthcare and wearable industries. Therefore, our analysis cannot rely on a specific data pattern, and should model it under no prior assumptions.

We classify each signal as stationary, periodic, or neither. A signal is first classified as stationary or non-stationary using a stationarity test. Signals classified as non-stationary are then checked for periodicity. If a periodic signal is detected, it is transformed into a stationary signal by estimating its period and removing the periodic component from the data. The remaining signals are classified as non-stationary and non-periodic.

**2.1.1 Stationarity Test**

While many estimation techniques for stationary processes were developed (Nason, 2013; Priestley & Rao, 1969), many of them require extensive computation and data transformations which are not applicable for online sensor streams typical for IoT applications. Therefore, we employ a two-sample Kolmogorov Smirnov test statistic to estimate a measure of deviation from stationarity (Kolmogorov, 1933).

A stationary process should have a time-shift invariant distribution. Therefore, we take pairs of samples from the same signal at different time windows, and use the two-sample Kolmogorov Smirnov test to check whether they originate from the same distribution. This implementation is extremely efficient, and easily handles multiple online data streams. Our algorithm showed accuracy similar to that of other state-of-the-art methods such as the Haar wavelet test for stationarity (Nason, 2013).
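As a rough illustration of the window-pair idea above, the following Python sketch computes the two-sample Kolmogorov Smirnov statistic with an asymptotic p-value using only the standard library, and then compares consecutive windows of a signal. The function names (`ks_two_sample`, `looks_stationary`), the use of consecutive non-overlapping windows, the `alpha` default, and the way `alpha` is split across window pairs are all assumptions made for this example and are not taken from the engine itself.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Return (D, p) for the two-sample Kolmogorov-Smirnov test.

    D is the largest gap between the two empirical CDFs. The p-value uses
    the asymptotic Kolmogorov series, so it is approximate for small samples.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # Evaluate both empirical CDFs at every observed value.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is ~1 anyway.
        return d, 1.0
    # Q(lam) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 * j^2 * lam^2)
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
                  for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def looks_stationary(signal, window, alpha=0.01):
    """Hypothetical decision rule: no pair of consecutive windows differs."""
    windows = [signal[i:i + window]
               for i in range(0, len(signal) - window + 1, window)]
    pairs = list(zip(windows, windows[1:]))
    if not pairs:
        raise ValueError("need at least two full windows")
    # Split alpha over the pairs (an assumption of this sketch).
    return all(ks_two_sample(u, v)[1] >= alpha / len(pairs) for u, v in pairs)
```

Note that the test assumes independent samples within each window, so strongly autocorrelated signals may need subsampling before such a check is meaningful.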

**2.1.2 Periodicity Detection**

A period may be defined as the smallest time shift that leaves a signal invariant. This definition describes a perfectly periodic signal, which is very rare in real data.

A periodicity detection algorithm is designed to determine whether a signal contains an underlying periodic process, and to estimate its fundamental period. This can be done in the time domain, in the frequency domain, or in a combination of both.

We implemented the YIN algorithm (De Cheveigné & Kawahara, 2002), which estimates the fundamental frequency of speech or musical sounds. It is based on the well-known autocorrelation method with a number of modifications that prevent common errors. It provides both a numeric threshold to define whether a signal is periodic or not, and an estimation of its period.
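The core steps of YIN can be sketched in a few lines of Python. The version below covers only the difference function, the cumulative mean normalized difference, and the absolute threshold; it omits the interpolation and refinement steps of the full algorithm. The function name `yin_period`, its parameters, and the default threshold of 0.1 are assumptions chosen for illustration.

```python
def yin_period(signal, max_lag, threshold=0.1):
    """Return (period, score), or (None, best_score) if no period is found.

    Simplified YIN: difference function, cumulative mean normalized
    difference (CMND), then the first dip of the CMND below `threshold`.
    """
    w = len(signal) - max_lag  # integration window length
    if w <= 0:
        raise ValueError("signal must be longer than max_lag")

    # Step 1: squared difference between the signal and its shifted copy.
    diff = [0.0] * (max_lag + 1)
    for tau in range(1, max_lag + 1):
        diff[tau] = sum((signal[j] - signal[j + tau]) ** 2 for j in range(w))

    # Step 2: normalize each value by the running mean of the values so far.
    cmnd = [1.0] * (max_lag + 1)
    running = 0.0
    for tau in range(1, max_lag + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0

    # Step 3: take the first lag below the threshold and follow the dip
    # down to its local minimum.
    for tau in range(2, max_lag + 1):
        if cmnd[tau] < threshold:
            while tau + 1 <= max_lag and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return tau, cmnd[tau]
    return None, min(cmnd[1:])
```

In this sketch a signal is treated as periodic when a period is returned, and the second return value plays the role of the numeric score compared against the threshold.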

**2.2 Nonparametric Anomaly and Change Detection**

Change detection methods aim to detect consistent changes in a distribution of random variables over a certain time period (Desobry, Davy, & Doncarli, 2005). As discussed above, due to high variability of possible data patterns no prior parametric form can be assumed for sensor values distribution. We introduce an ensemble of tests for monitoring stationary and non-stationary time-series, including both non-parametric statistical tests and generic machine-learning models. The tests monitor both single sensors and a collection of a device’s sensors, and enable a user-controlled sensitivity level. Furthermore, they have an adaptive version, which adapts to changes in the data over time.

**2.2.1 Single-Sensor Change Detection**

The two-sample Kolmogorov Smirnov test (Kolmogorov, 1933) is applied to detect changes by comparing the empirical distributions of sensor values in short consecutive time windows. It alerts when the signal before and after the change appears to originate from different underlying distributions, and provides a p-value that quantifies the evidence for this difference. Since our method tests multiple hypotheses (one per position of the sliding window), we use the Bonferroni correction (Dunn, 1961) to avoid false alerts.
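To make the role of the Bonferroni correction concrete, the short sketch below takes the p-values produced by a sequence of window comparisons and keeps only those that remain significant after dividing the overall false alarm rate by the number of tests. The function name `bonferroni_alerts` and the example p-values are hypothetical.

```python
def bonferroni_alerts(p_values, alpha):
    """Return indices of window comparisons that survive Bonferroni.

    `alpha` is the desired overall false alarm rate across all comparisons;
    each individual test is held to the stricter level alpha / len(p_values).
    """
    if not p_values:
        return []
    per_test = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p < per_test]


# Four window comparisons at an overall rate of 1%: the per-test level is
# 0.0025, so only the comparison at index 1 raises an alert.
alerts = bonferroni_alerts([0.5, 0.0001, 0.004, 0.9], alpha=0.01)
```

Without the correction, the comparison at index 2 (p = 0.004) would also have passed the uncorrected 1% level, which is the kind of false alert the correction is meant to suppress.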

This algorithm has useful properties, such as providing a desired false alarm rate which may be easily tuned. Also, using sliding windows enables accurate detection of the exact outset of a change. However, the Kolmogorov Smirnov test may be applied only to one-dimensional random variables, hence it cannot be used in multisensory information processing directly.

**2.2.2 Multi-Sensor Change Detection**

In many IoT use cases a device is equipped with several sensors, each able to capture a specific characteristic of the environment. For instance, a manufacturing tool may contain a temperature sensor, a pressure sensor and a humidity sensor, whereas a wearable may be equipped with an accelerometer and a GPS sensor. In these situations, the ability to gain insights from multisensory information by analyzing the joint distribution of device sensors is beneficial. First, the multisensory approach amplifies the signal of a real phenomenon by taking into account multiple sources of information. Small changes in individual sensor values may be considered insignificant, yet analyzing such sensors together may reveal a significant event. Second, some changes in device behavior cannot be detected at the level of individual sensors. For example, a change in the correlation between two sensors of a device may indicate a significant change in the system behavior, but it cannot be identified when inspecting individual sensors only. An IoT change detection algorithm will therefore benefit from analyzing multisensory output.

Several multivariate change detection algorithms have recently been suggested in the literature, for example the Generalized Kolmogorov Smirnov Test (Glazer, Lindenbaum, & Markovitch, 2012), the online kernel change detection algorithm (Desobry et al., 2005), Maximum Mean Discrepancy (Gretton, Borgwardt, Rasch, Schölkopf, & Smola, 2007), and bootstrapping-based change detection (Dasu, Krishnan, Venkatasubramanian, & Yi, 2006). These algorithms are computationally expensive, and require a large number of data points for model training. Therefore, they are not appropriate for practical IoT implementations in a real-time parallel cloud architecture.

We implement a novel algorithm, Information Theoretic Multivariate Change Detection (ITMCD) (Faivishevsky, 2016), based on k-nearest neighbor (kNN) estimation. It is designed and implemented to satisfy the requirements of IoT for fast, online, parallel multisensory change detection. The method alerts about a change in device behavior by monitoring samples of multiple sensor values in a sliding time window of fixed length in an online manner. If the multivariate joint distribution of sensor values at the beginning of the time window differs significantly from that at the end, the method detects a change. To measure the difference between these multivariate distributions we apply nonparametric estimators of relative entropy, based on k-nearest neighbor calculations (Wang, Kulkarni, & Verdu, 2006). This method allows us to compute a threshold T that ensures a desired false alarm rate. The computational complexity of this algorithm is dominated by finding the k-th nearest neighbor for each point in the time window, and is O(n log n) for a time window of length n (Vaidya, 1989).
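The kNN relative entropy estimate at the heart of this approach can be illustrated with a brute-force sketch. The code below follows the general form of the nearest-neighbor divergence estimator of Wang, Kulkarni, and Verdu; it is not the ITMCD algorithm itself, it uses a quadratic-time neighbor search rather than the O(n log n) method cited above, and it does not compute the threshold T. The function names and the default `k` are assumptions for this example.

```python
import math


def knn_distance(point, others, k):
    """Euclidean distance from `point` to its k-th nearest neighbor in `others`."""
    return sorted(math.dist(point, o) for o in others)[k - 1]


def knn_divergence(x, y, k=5):
    """Brute-force kNN estimate of KL(P || Q) from samples x ~ P and y ~ Q.

    x and y are lists of equal-dimension points (tuples of floats).
    Assumes continuous data, so that no neighbor distance is exactly zero.
    """
    n, m, dim = len(x), len(y), len(x[0])
    total = 0.0
    for i, xi in enumerate(x):
        rho = knn_distance(xi, x[:i] + x[i + 1:], k)  # within x, excluding xi
        nu = knn_distance(xi, y, k)                    # from xi into y
        total += math.log(nu / rho)
    return dim * total / n + math.log(m / (n - 1))
```

In a change detection setting, `x` would hold the sensor vectors from one part of the sliding window and `y` those from the other part; an estimate close to zero suggests no change, while a large value suggests that the joint distribution has shifted.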

**2.2.3 Non-Periodic Non-Stationary Multi-Sensor Change Detection**

Stationarity is an important property for time-series modeling and change detection. We have described how periodic signals can be transformed into stationary signals by detecting and cancelling out their period. However, some IoT signals may be highly non-stationary and non-periodic, yet still require accurate change detection. In order to model such signals we used the One-Class SVM (Scholkopf, Williamson, Smola, Shawe-Taylor, & Platt, 1999), which aims at learning a single class characterizing the signal's normal behavior, and considers new behaviors as change.

In order to characterize the normal behavior of a system we train the One-Class SVM algorithm over a long duration of past data that we consider normal behavior. The training phase is done offline, and models are built both for single sensors and for all the device sensors combined. The online phase consists of continuous classification of sensor data as either a change or not. Sensory time-series data is not modeled and tested in its raw form; instead, we use its first and second order statistics over consecutive short time windows. A useful property of this algorithm, other than its good performance, is that the asymptotic fraction of data labeled as outliers can be controlled by tuning the parameter ν.
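The feature extraction step described above can be sketched as follows. The example computes the mean and variance of each consecutive non-overlapping window, which is one plausible reading of "first and second order statistics"; the function name `window_features`, the choice of population variance, and the non-overlapping windows are assumptions. The resulting feature vectors would then be passed to a One-Class SVM implementation, which is not shown here.

```python
from statistics import fmean, pvariance


def window_features(signal, window):
    """Return a (mean, variance) pair for each consecutive window.

    Windows do not overlap; a trailing partial window is dropped.
    """
    chunks = (signal[i:i + window]
              for i in range(0, len(signal) - window + 1, window))
    return [(fmean(chunk), pvariance(chunk)) for chunk in chunks]


# Two windows of four samples: a flat one and an oscillating one.
features = window_features([1, 1, 1, 1, 2, 4, 2, 4], window=4)
```

Working on such summaries rather than raw samples means the model sees one short vector per window, so a shift in level or in spread shows up directly as a displaced feature vector.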

**2.3 Alert Characterization**

An important feature in our framework is not only alerting on a change, but also providing insights about the cause of a change. Our system can potentially monitor many sensors in both single- and multi-sensor manner, and provide alerts based on complex tests that are not easily interpretable.

We implemented an ensemble of non-parametric statistical tests to point out the most significant reasons for the alert. This included which sensor or sensors had the most significant change, and what was its nature: a change in mean, variance or correlation. This enables users to take more educated decisions and investigate abnormal sensor behavior. Details are deferred to the full version of this manuscript.

## 3. Experiments

We performed a number of comparative experiments with our analytics engine; the main ones are described below. First, we evaluated the performance of our stationarity test against the Haar wavelet test for stationarity (Nason, 2013). We experimented on real industrial data consisting of 54 sensory signals collected from machines. Each signal was classified as either stationary or not by both algorithms. The two algorithms agreed on 92% of the signals, and the remaining signals were ambiguous. This indicates that our algorithm has similar performance, while being faster and easier to implement.

Our next numerical simulations were dedicated to evaluating the single-sensor change detection performance of the two-sample Kolmogorov Smirnov test (KS) and the Information Theoretic Multivariate Change Detection algorithm (ITMCD). We simulated a number of one-dimensional Gaussian distributed sequences with zero mean and unit variance. At some time point the parameters of the distribution were changed, either in the mean μ or in the standard deviation σ. The task was to detect the change with a predefined false alarm rate α. A better change detection method should have a higher detection rate (E) and keep the false alarm rate (F) closer to α at run time. For ITMCD we fixed the future window size f = 10 and used a larger past time window size p = 50 for small α, whereas a smaller past time window size p = 30 was used for a larger α. The kNN parameter was set to k = 8. The results are summarized in Table 1 (Faivishevsky, 2016).

We observed that larger window sizes lead as expected to better performance of both KS and ITMCD. The detection rate of ITMCD is generally higher than that of KS, especially with regard to detection of standard deviation changes. Also, ITMCD exhibits more stable false alarm rate, which is an important advantage in IoT.

Finally, we simulated the non-stationary non-periodic change detection case. The task was to detect a change in a one-dimensional Gaussian distributed sequence whose mean and standard deviation vary over time (the parameters were picked randomly for each segment of 200-800 samples). The parameters used in the training signal varied within a given range, whereas the test signal included data with a mean higher than the highest previously observed; the increase was twice the highest standard deviation used for simulating a segment so far (Figure 2). We compared the performance of the One-Class SVM (OC-SVM) algorithm, the two-sample Kolmogorov Smirnov test, and the σ-rule, which assumes a normal distribution and sets a cutoff on the number of global standard deviations from the global mean. We set a predefined false alarm rate of α = 0.1%. The results are summarized in Table 2. OC-SVM provided similar results for detecting changes in variance.

The results emphasize how difficult it is to detect changes in non-stationary signals with standard methods. KS has a high detection rate, but an unacceptable false alarm rate, whereas the σ-rule has a reasonable false alarm rate, but a very low detection rate. The One-Class SVM shows good results on both measures.
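For reference, the σ-rule baseline is simple enough to sketch directly. The example below estimates a global mean and standard deviation from training data and flags test samples that fall beyond a fixed number of standard deviations. The function name `sigma_rule_alerts` and the default multiple of 3 are assumptions; the cutoff used in the experiment is whatever multiple matches the tuned false alarm rate.

```python
from statistics import fmean, pstdev


def sigma_rule_alerts(train, test, n_sigmas=3.0):
    """Return indices of test samples beyond n_sigmas global standard deviations.

    The mean and standard deviation are computed once over all training
    data, which is why the rule struggles when the signal level drifts.
    """
    mu, sd = fmean(train), pstdev(train)
    return [i for i, v in enumerate(test) if abs(v - mu) > n_sigmas * sd]


# Training data with mean 1 and standard deviation 1: the cutoff is |v - 1| > 3.
train = [0, 2] * 50
alerts = sigma_rule_alerts(train, [1, 3.9, 4.1, -2.5])
```

Because a non-stationary training signal inflates the global standard deviation, the cutoff becomes wide, which is consistent with the low detection rate reported for this baseline.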

| Change | α | KS F | KS E | ITMCD F | ITMCD E |
|---|---|---|---|---|---|
| μ += 1 | 0.1% | 0.04% | 13.1% | 0.09% | 13.9% |
| μ += 1 | 1% | 0.5% | 32.6% | 0.6% | 34.3% |
| μ += 2 | 0.1% | 0.04% | 87.1% | 0.06% | 92.1% |
| μ += 2 | 1% | 0.6% | 94.0% | 0.8% | 96.0% |
| σ += 1 | 1% | 0.7% | 1.5% | 1.2% | 6.3% |
| σ += 2 | 1% | 0.4% | 2.5% | 1.1% | 13.2% |
| σ += 3 | 1% | 0.6% | 3.6% | 0.9% | 26.4% |

**Table 1:** One dimensional change detection on simulated Gaussian distributed sequences. The results are presented for the two-sample KS and ITMCD algorithm for detection rate (E) and false alarm rate (F). Change represents either mean or variance shift, and *α* is a predefined false alarm rate.

| Test | Tuned FA | Actual FA | Detection |
|---|---|---|---|
| One-Class SVM | 0.1% | 0.3% | 99% |
| KS Test | 0.1% | 18% | 93% |
| σ rule | 0.1% | 1% | 47% |

**Table 2:** One dimensional change detection on a simulated non-stationary non-periodic signal. The results show the actual false alarm rate and the detection rate for One-Class SVM, two-sample KS, and *σ* rule.

**Figure 2:** Simulated non-stationary non-periodic signal with a red mark indicating a detected change

## 4. Summary and Conclusions

In this paper we described an analytical framework for IoT data. It provides meaningful insights regarding anomalous behavior and changes over time from parallel online sensory data streams.

We described the various analytical layers of our engine, and presented experiments showing state-of-the-art performance, both in terms of high detection rates and stable false alarm rates. We conclude that identifying anomalies and changes is possible in this setting, despite the lack of prior knowledge about the data. This engine is currently being integrated for Intel® internal use cases.

## About the Authors

Gilad Wallach is a Data Scientist in the Advanced Analytics department at Intel®. He has worked on various projects, including an IoT analytics engine, a Machine Learning oriented benchmark, and improving Intel’s manufacturing process through analytics. Prior to joining Intel, Gilad completed his M.Sc. in electrical engineering and his B.Sc. in electrical engineering and physics in Tel-Aviv University.

Dr. Lev Faivishevsky is a Lead Data Scientist at Intel® with over 15 years of algorithmic development in the semiconductor and financial industries. He is currently in the Advanced Analytics department, working on complex machine learning algorithms across various domains in Intel, including Internet of Things, Fab manufacturing, Test optimization, Parkinson Disease Research and Accelerating Machine Learning algorithms for Intel Architecture. Lev holds a PhD in Electrical Engineering (focusing on Machine Learning) from Bar Ilan University. He has authored 5 patents in computational lithography, metrology, malware detection, Internet of Things and multicore optimization, and has published over 20 papers in premier scientific and technological forums including NIPS, ICML and ICASSP.

Dr. Amitai Armon is the Chief Data-Scientist at Intel’s Advanced Analytics department, which provides solutions for diverse company challenges using Machine Learning and Big Data techniques. Prior to joining Intel®, Amitai was the co-founder and director of research at TaKaDu, a provider of water-network analytics software that received several international awards. Amitai received his computer-science Ph.D. in 2008 from the Tel-Aviv University, where he also previously completed his B.Sc. studies (at the age of 18). Following his Ph.D. he visited the Los-Alamos National Lab. Amitai has about 20 years of experience in performing and leading data science and algorithmic work.

## References

Chen, Y.-K. (2012). Challenges and Opportunities of Internet of Things. *17th Asia and South Pacific Design Automation Conference*, 383–388. http://doi.org/10.1109/ASPDAC.2012.6164978

Chui, M., Löffler, M., & Roberts, R. (2010). *The Internet of Things*. *McKinsey Quarterly* (Vol. 2).

Dasu, T., Krishnan, S., Venkatasubramanian, S., & Yi, K. (2006). An information-theoretic approach to detecting changes in multi-dimensional data streams. *Proc. Symp. on the Interface of Statistics, Computing Science, and Applications*.

De Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. *The Journal of the Acoustical Society of America*, *111*, 1917–1930. http://doi.org/10.1121/1.1458024

Desobry, F., Davy, M., & Doncarli, C. (2005). An online kernel change detection algorithm. *IEEE Transactions on Signal Processing*, *53*(8), 2961–2974. http://doi.org/10.1109/TSP.2005.851098

Dunn, O. J. (1961). Multiple Comparisons Among Means. *Journal of the American Statistical Association*, *56*(293), 52–64.

Faivishevsky, L. (2016). Information Theoretic Multivariate Change Detection for multisensory information processing in Internet of Things. In *ICASSP*.

Faivishevsky, L., & Armon, A. (2014). Multisensory change detection for Internet of Things domain. US Patent application 14/579,083. Publication number 20160095013.

Glazer, A., Lindenbaum, M., & Markovitch, S. (2012). Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data. In *Advances in Neural Information Processing Systems 25* (pp. 728–736). Curran Associates, Inc.

Gretton, A., Borgwardt, K. M., Rasch, M., Schölkopf, B., & Smola, A. J. (2007). A Kernel Method for the Two-Sample-Problem. In *Advances in Neural Information Processing Systems 19* (pp. 513–520). MIT Press.

Kolmogorov, A. (1933). Sulla determinazione empirica di una legge di distribuzione. *G. Ist. Ital. Attuari*, *4*(2), 83–91.

Middleton, P., Kjeldsen, P., & Tully, J. (2013). Forecast: the Internet of Things, Worldwide. *Gartner*.

Nason, G. (2013). A test for second-order stationarity and approximate confidence intervals for localized autocovariances for locally stationary time series. *Journal of the Royal Statistical Society. Series B: Statistical Methodology*, *75*(5), 879–904. http://doi.org/10.1111/rssb.12015

Priestley, M. B., & Rao, T. S. (1969). A Test for Non-Stationarity of Time-Series. *Journal of the Royal Statistical Society. Series B (Methodological)*, *31*(1), 140–149.

Scholkopf, B., Williamson, R., Smola, A., Shawe-Taylor, J., & Platt, J. (1999). Support Vector Method for Novelty Detection. *NIPS*, *12*, 582–588.

Vaidya, P. (1989). An O(n log n) algorithm for the all-nearest-neighbors Problem. *Discrete and Computational Geometry*, *4*(1), 101–115. http://doi.org/10.1007/BF02187718

Wang, Q., Kulkarni, S. R., & Verdu, S. (2006). A Nearest-Neighbor Approach to Estimating Divergence between Continuous Random Vectors. *IEEE Int. Symp. Information Theory, Seattle, WA*.
