Optimizing Hadoop* Deployments

Executive Summary

This paper provides guidance, based on extensive lab testing conducted with Hadoop* at Intel, to organizations as they make key choices in the planning stages of Hadoop deployments. It begins with best practices for establishing server hardware specifications, helping architects choose optimal combinations of components. Next, it discusses the server software environment, including choosing the OS and version of Hadoop. Finally, it introduces some configuration and tuning advice that can help improve results in Hadoop environments.

Overview

Having moved beyond its origins in search and Web indexing, Hadoop is becoming increasingly attractive as a framework for large-scale, data-intensive applications. Because Hadoop deployments can have very large infrastructure requirements, hardware and software choices made at design time can have a significant impact on performance and TCO.

Intel is a major contributor to open source initiatives, such as Linux*, Apache*, and Xen*, and has also devoted resources to Hadoop analysis, testing, and performance characterizations, both internally and with fellow travelers such as HP and Cloudera. Through these technical efforts, Intel has observed many practical trade-offs in hardware, software, and system settings that have real-world impacts.

This paper discusses some of those optimizations, which fall into three general categories:

• Server hardware. This set of recommendations focuses on choosing the appropriate hardware components for an optimal balance between performance and both initial and recurring costs.
• System software.

Read the full Optimizing Hadoop* Deployments White Paper.