Optimize Hadoop Cluster Performance with Various Storage Media

ID 659982
Updated 5/31/2016
Version Latest
Public

As faster storage types (SSD, NVMe SSD, etc.) emerge, big data deployments need a methodology for using them to improve throughput and latency. However, fast storage is still expensive and limited in capacity. This study provides a guide for setting up a cluster with different storage media.

In general, this study considers the following questions:

  • What is the maximum performance a user can achieve by using fast storage?
  • Where are the bottlenecks?
  • How can the best balance be achieved between performance and cost?
  • How can the performance of a cluster with different storage combinations be predicted?

This study covers HBase write performance on different storage media, leveraging the hierarchical storage management support in HDFS to store different categories of HBase data on different media. The following storage types were evaluated:

  • HDD: the most popular storage in current use.
  • SATA SSD: a faster storage type that is gaining popularity.
  • RAMDISK: used to emulate extremely high-performance PCIe NVMe-based SSDs and upcoming faster storage (e.g., Intel 3D XPoint® based SSDs), because that hardware was unavailable. The results hold for PCIe SSD and other fast storage types.
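To make the idea of placing different categories of HBase data on different media more concrete, here is a minimal sketch that builds `hdfs storagepolicies -setStoragePolicy` commands for two HBase directories. It is not taken from the study: the paths assume `hbase.rootdir` is `/hbase`, the pairing of directories with the `ALL_SSD` and `HOT` policies is purely illustrative, and the names `POLICY_BY_PATH` and `storage_policy_commands` are hypothetical. The script only prints the commands and does not run them.

```python
from typing import Dict, List

# Hypothetical mapping of HBase data categories to HDFS storage policies.
# Assumption: hbase.rootdir is /hbase. The pairing below is illustrative only
# and is not the configuration used in the study.
POLICY_BY_PATH: Dict[str, str] = {
    "/hbase/WALs": "ALL_SSD",  # write-ahead logs on the faster tier
    "/hbase/data": "HOT",      # table data on the default (disk) tier
}


def storage_policy_commands(policy_by_path: Dict[str, str]) -> List[List[str]]:
    """Build one `hdfs storagepolicies -setStoragePolicy` command per path."""
    commands = []
    for path, policy in sorted(policy_by_path.items()):
        commands.append([
            "hdfs", "storagepolicies", "-setStoragePolicy",
            "-path", path,
            "-policy", policy,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in storage_policy_commands(POLICY_BY_PATH):
        print(" ".join(cmd))
```

For such policies to have any effect, the DataNode volumes would also need to be tagged with their storage type (for example `[SSD]` or `[DISK]`) in `dfs.datanode.data.dir`; which policy suits which category of data is the question the study itself addresses.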

Download the study at the link below: