Apache Hadoop* is emerging as a leading approach to unstructured data analytics. Hadoop is an open source framework that uses a simple programming model to enable distributed processing of large data sets on clusters of computers.
Learn from some of the experts leading open-source community efforts for four key elements of an Apache Hadoop solution: the Apache Hadoop Distributed File System* (Apache HDFS*), Apache MapReduce*, Apache Pig*, and Apache HCatalog*.

What exactly is Apache HDFS*? How does it work? Konstantin Shvachko explains it all, including limitations, and where software development is headed.

Expert Deveraj Das reveals the secrets of MapReduce, a powerful model for processing large data sets in parallel.

Create your own Apache MapReduce applications. Alan Gates explains how the Apache Pig platform's high-level data flow language and execution framework make it easy.

Is big data interoperability possible? Apache Hadoop open-source community expert Alan Gates explains how the HCatalog integration tool enables data interoperability between the Hadoop* framework and users of external systems.

Apache Hive is more than an engine for SQL queries. Apache Hadoop open-source community expert Carl Steinbach explains that what makes Hive really novel is its facilities for data and metadata management.