Gene Resequencing with Myrna on Intel® Distribution of Apache Hadoop*
Genome resequencing helps us understand how genetic differences affect health and cause disease. It is an important step in detecting anomalies associated with many genetically inherited conditions, such as heart disorders, Down syndrome, cystic fibrosis and chromosomal abnormalities. Next Generation Sequencing (NGS) technologies running on high performance computing (HPC) architectures have enabled the sequencing of DNA at groundbreaking speeds. However, storing, analyzing and managing the massive DNA sequence datasets produced by NGS research is a new challenge.
Hadoop* and MapReduce* technologies come into play here by allowing parallel read-mapping algorithms to scale effectively, resulting in shorter execution times and lower software execution and hardware costs. Other areas where Hadoop technologies may be useful include data storage, data management, statistical analysis and statistical association between various data sources. Organizations are now able to store large datasets in the Hadoop Distributed File System (HDFS) and use real-time analytics software to access data directly from HDFS, bypassing any data migration headaches. Myrna, a software package developed by Ben Langmead, Kasper Hansen and Jeff Leek (Johns Hopkins University), is one such tool; it calculates differential gene expression in RNA-seq datasets on the cloud (Amazon Elastic MapReduce) or on Hadoop clusters.
Intel aims to provide businesses with an open enterprise Hadoop platform alternative for next generation analytics and life sciences: the Intel® Distribution for Apache Hadoop software, which provides better manageability and performance and is optimized for Intel® Xeon® processors.
In this paper, we demonstrate how to install and configure Myrna and its required components (Bowtie, R/Bioconductor and the SRA Toolkit) within the Intel® Distribution for Apache Hadoop software.
Read the full Gene Resequencing with Myrna on Intel® Distribution of Apache Hadoop* White Paper.