Apache Hadoop Framework Spotlight: Apache Hive

Query and Manage Large Data Sets with Apache Hive*

Many people who are familiar with the Apache Hadoop* framework think of Hive* as a SQL engine: a way to automatically compile a SQL query into a set of MapReduce jobs and then run them on a Hadoop* cluster. That description is accurate, but what makes Hive* novel is its facilities for data and metadata management.

MapReduce is a very flexible programming paradigm, but most users find it too low-level for everyday data analysis tasks. Almost from the day Hadoop* was introduced, people began looking for ways to express their data analysis tasks using higher-level abstractions built on top of MapReduce. The engineers at Facebook*, who built the first version of Hive*, chose SQL as their higher-level language because it was widely adopted and because a majority of their analysts already knew how to use it.
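To make the "too low-level" point concrete, here is a minimal sketch in plain Python of what a simple grouped count looks like when written as explicit map, shuffle, and reduce steps. In SQL the same task would be a single statement along the lines of `SELECT country, COUNT(*) FROM page_views GROUP BY country`. The table name `page_views`, the column `country`, and all function names are hypothetical, and the code only imitates the MapReduce pattern in memory; it does not use Hadoop* and is not how Hive* itself generates jobs.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit a (country, 1) pair for every input row, as a mapper would."""
    for row in rows:
        yield row["country"], 1


def shuffle(pairs):
    """Group emitted values by key, imitating the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_country(rows):
    """Run the three phases in order over an in-memory list of rows."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    # Hypothetical sample rows standing in for a page_views table.
    sample = [{"country": "US"}, {"country": "DE"}, {"country": "US"}]
    print(count_by_country(sample))
```

Even for this small task the hand-written version needs three functions and explicit plumbing between them, which is the kind of overhead a declarative query language hides.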

Anyone with even a little prior SQL experience can come up to speed quickly with Hive*. Hive* supports large portions of the SQL-92* standard, as well as several extensions designed to make it easier to interact with the underlying Hadoop* platform.

Read the full Query and Manage Large Data Sets with Apache Hive* Spotlight.
