Konstantin Shvachko, Project Management Committee member for the Apache Hadoop* framework and founder of AltoScale, demystifies the Apache* Hadoop* Distributed File System (HDFS*), the primary distributed storage component used by applications under the Apache open-source project Hadoop. In this overview, the Apache Hadoop open-source community expert explains the four design principles that drive development, how HDFS works, why it is so well suited to handling large unstructured data sets, and where the software is headed. Part of the Intel® IT Center’s Hadoop Community Spotlight series. Also listen to the podcast of the interview.
Introducing an automation tool for rapidly preparing data for analysis so scientists can speed mining.
How businesses can use its versatility and scalability to mine answers through object relationships.
Apache HDFS* overview.
Apache Pig* overview.
Apache MapReduce overview.