Big Datasets from Small Experiments

Updated 7/6/2017

Prof. Dunham's experiment and Poincaré plot

Modern experiments produce lots of data. Abundant data is not exclusive to the "big gun" institutions, such as observatories and particle colliders. It is also the norm in modest-size labs working on anything from genomics to microscopy. Even outside of science you don't need to go far to find lots of data. With the Internet-of-Things (IoT) at play, a modern smart home is a continuous source of big datasets! With data collection as easy as it is, how does one analyze the data efficiently?

The work of Prof. Jeffrey Dunham connects real-world phenomena to data collection to computing in a very pure experiment. He has built a tabletop-scale chaotic pendulum equipped with a high-precision rotary encoder. The pendulum produces hundreds of gigabytes of data per day. This data reveals the strange attractor of the pendulum, which is a fractal. This manifestation of "order in chaos" is not only a thing of beauty. It has roots in chaos theory, which also applies to climate studies, biology, cryptography, and technology. However, the amazing fractal structure of the data emerges only with proper post-processing. "Proper" means that the experimenter must scan the parameter space of the Savitzky-Golay filter. For each point in that parameter space, the computationally expensive filter must be applied to the entire dataset. For good science in this experiment, computational performance is paramount.
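To make the cost of such a scan concrete, here is a minimal pure-Python sketch of a Savitzky-Golay smoothing filter and a parameter scan over it. It is an illustration only, not Prof. Dunham's implementation: the function names (`savgol_coeffs`, `savgol_smooth`, `scan_parameters`), the choice of window length and polynomial order as the scanned parameters, and the simple edge handling are all assumptions made for this example.

```python
def savgol_coeffs(window, order):
    """Smoothing weights of a least-squares polynomial fit over an odd window."""
    if window % 2 == 0 or order >= window:
        raise ValueError("window must be odd and larger than the polynomial order")
    half = window // 2
    offsets = range(-half, half + 1)
    n = order + 1
    # Augmented normal equations [A^T A | e0], where A[i][j] = offsets[i] ** j.
    aug = [[float(sum(z ** (r + c) for z in offsets)) for c in range(n)]
           + [1.0 if r == 0 else 0.0] for r in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    c = [aug[r][n] / aug[r][r] for r in range(n)]
    # Weight of the sample at offset z is the polynomial sum_j c[j] * z**j.
    return [sum(c[j] * z ** j for j in range(n)) for z in offsets]


def savgol_smooth(signal, window, order):
    """Filter the interior samples; edge samples are copied unchanged (assumption)."""
    weights = savgol_coeffs(window, order)
    half = window // 2
    out = list(signal)
    for i in range(half, len(signal) - half):
        out[i] = sum(w * signal[i - half + k] for k, w in enumerate(weights))
    return out


def scan_parameters(signal, windows, orders):
    """Apply the filter to the whole signal once per valid (window, order) pair."""
    results = {}
    for window in windows:
        for order in orders:
            if window % 2 == 1 and order < window:
                results[(window, order)] = savgol_smooth(signal, window, order)
    return results
```

Every point in the scan repeats a pass over the entire signal, so the total work grows with the number of parameter combinations times the dataset size. The passes are independent of one another, and within each pass every output sample is a short dot product, which is why this kind of workload is a natural candidate for parallel and vectorized execution.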

In his upcoming presentation in the Modern Code Contributed talks ("MC² Series"), Prof. Dunham shares his experience with this computational challenge. He talks about the modern code practices that allowed him to shrink the data processing time from hours to fractions of a second. That was made possible by two factors. The first is the use of an Intel® Xeon Phi™ processor (formerly Knights Landing). The second is a thoughtful approach to parallel programming. Prof. Dunham also talks about probing the peak performance of these processors, the roofline model, and the importance of vector arithmetic.
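For readers unfamiliar with the roofline model, its core idea fits in one line: attainable performance is capped either by the processor's peak arithmetic rate or by memory bandwidth multiplied by the kernel's arithmetic intensity, whichever is lower. The sketch below illustrates that bound. The function name and the numbers in the usage lines are placeholders chosen for this example, not measurements of any particular processor.

```python
def roofline_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Attainable performance: the lower of the compute and memory ceilings."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Placeholder machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
memory_bound = roofline_gflops(1000.0, 100.0, 2.0)    # limited by bandwidth
compute_bound = roofline_gflops(1000.0, 100.0, 20.0)  # limited by peak rate
```

A kernel that performs few operations per byte loaded sits under the sloped, bandwidth-limited part of the roof, while a kernel with high arithmetic intensity can approach the flat peak, which it reaches only if it also makes full use of the vector units.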

Tune in to the webinar on July 11, 2017, or watch a recording after this date at https://colfaxresearch.com/mc2-004/