Real-Time Analytics on Apache Hadoop* Using Spark* and Shark*
Healthcare providers can get more valuable insights, manage costs, and provide better care options to patients by using data analytics and solutions. Big data technologies are enabling providers to store, analyze, and correlate various data sources to extrapolate knowledge. Beneﬁts include eﬃcient clinical decision support, lower administrative costs, faster fraud detection, and streamlined data exchange formats. It is projected that adoption of health data analytics will increase to almost 50 percent by 2016 from 10 percent in 2011, representing a 37.9 percent compound annual growth rate.
Apache Hadoop* and MapReduce* (MR*) technologies have been in the forefront of big data development and adoption. However, this architecture was always designed for data storage, data management, statistical analysis, and statistical association between various data sources using distributed computing and batch processing. Today’s environment demands all of the above, with the addition of real-time analytics. The result has been systems like Cloudera Impala* and Apache Spark*, which allow in-memory processing for fast response times, bypassing MapReduce operations.
To compare MapReduce with real-time processing, consider use cases like full text indexing, recommendation systems (for example, Netﬂix* movie recommendations), log analysis, computing Web indexes, and data mining. These are processes that can be allowed to run for extended periods of time.
Read the full Real-Time Analytics on Apache Hadoop* Using Spark* and Shark* White Paper.