Apache Hadoop* Framework Spotlight: Apache Hive* PDF

Expert Carl Steinbach describes how Apache Hive queries work within the Apache Hadoop framework.


Query and Manage Large Data Sets with Apache Hive*

Many people who are familiar with the Apache Hadoop* framework think of Hive* as a SQL engine: a way to automatically compile a SQL query into a set of MapReduce jobs and then run them on a Hadoop* cluster. That description is accurate, but what makes Hive* truly novel is its support for managing data and metadata.

MapReduce is a very flexible programming paradigm, but most users find it too low-level for everyday data analysis tasks. Almost from the day Hadoop* was introduced, people began looking for ways to express their data analysis tasks using higher-level abstractions built on top of MapReduce. The engineers at Facebook* who built the first version of Hive* chose SQL as their higher-level language because it was widely adopted and because most of their analysts already knew how to use it.
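To make the gap between SQL and MapReduce concrete, here is a minimal Python sketch of the map, shuffle, and reduce steps that a query such as `SELECT page, COUNT(*) FROM page_views GROUP BY page` conceptually breaks down into. The table `page_views`, its columns, the sample rows, and the helper functions are all hypothetical, and everything runs in a single process. It illustrates the general MapReduce pattern for a grouped count, not Hive's actual query compiler or the Hadoop* API.

```python
from collections import defaultdict

# Hypothetical rows of a "page_views" table: (page, user).
ROWS = [
    ("home", "alice"),
    ("search", "bob"),
    ("home", "carol"),
    ("cart", "alice"),
    ("home", "bob"),
]


def map_phase(rows):
    """Map: emit a (key, 1) pair per row, keyed on the GROUP BY column."""
    for page, _user in rows:
        yield page, 1


def shuffle_phase(pairs):
    """Shuffle: collect every value emitted for the same key into one list."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate each key's values; COUNT(*) is a sum of the 1s."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_page(rows):
    # Conceptually equivalent to:
    #   SELECT page, COUNT(*) FROM page_views GROUP BY page;
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    for page, n in sorted(count_by_page(ROWS).items()):
        print(page, n)
```

The one-line SQL query expresses the same result as all three hand-written phases, which is the kind of boilerplate a SQL-to-MapReduce compiler spares the analyst from writing.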

Anyone with even a little prior SQL experience can get up to speed with Hive* quickly. Hive* supports large portions of the SQL-92* standard, as well as several extensions designed to make it easier to interact with the underlying Hadoop* platform.

Read the full Query and Manage Large Data Sets with Apache Hive* Spotlight.