Distributed Indexing Dispatched Alignment* (DIDA*)

DIDA* performs large-scale alignment tasks by distributing the indexing and alignment stages into smaller subtasks over a cluster of compute nodes.

Performance increase of up to 77 percent (note 1)

Distributed Indexing Dispatched Alignment* (DIDA*) is a novel distributed and parallel indexing and alignment framework that performs the indexing and alignment task in five major steps: distribute, index, dispatch, align, and merge. The index and dispatch steps run in parallel. DIDA first partitions the targets into smaller parts using a heuristic balanced cut, then creates an index for each partition. The reads are then “flowed” through a Bloom filter to dispatch each alignment task to the relevant node(s). Finally, the reads are aligned against all partitions in parallel, and the partial results are merged to create the final output.
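The five steps can be illustrated with a small single-process sketch. This is not DIDA's implementation: the greedy longest-first partitioning stands in for the heuristic balanced cut, an exact Python set of k-mers stands in for the Bloom filter (so it has no false positives), and exact substring search stands in for a real aligner. All function names and the parameters `n_parts` and `k` are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def distribute(targets, n_parts):
    """Greedy balanced cut (assumed heuristic): longest target first,
    each placed in the partition with the smallest total length so far."""
    parts = [dict() for _ in range(n_parts)]
    loads = [0] * n_parts
    for name, seq in sorted(targets.items(), key=lambda kv: -len(kv[1])):
        i = loads.index(min(loads))
        parts[i][name] = seq
        loads[i] += len(seq)
    return parts


def build_filter(part, k):
    """Stand-in for a Bloom filter: the exact set of a partition's k-mers."""
    return {km for seq in part.values() for km in kmers(seq, k)}


def dispatch(reads, filters, k):
    """Route each read to every partition whose filter holds one of its k-mers."""
    routed = defaultdict(list)
    for rname, rseq in reads.items():
        for i, flt in enumerate(filters):
            if any(km in flt for km in kmers(rseq, k)):
                routed[i].append(rname)
    return routed


def align(part, reads, names):
    """Toy aligner: exact substring search, giving (read, target, 0-based pos)."""
    hits = []
    for rname in names:
        for tname, tseq in part.items():
            pos = tseq.find(reads[rname])
            if pos >= 0:
                hits.append((rname, tname, pos))
    return hits


def run_pipeline(targets, reads, n_parts=2, k=4):
    parts = distribute(targets, n_parts)            # distribute
    filters = [build_filter(p, k) for p in parts]   # index (toy version)
    routed = dispatch(reads, filters, k)            # dispatch
    merged = []
    for i, part in enumerate(parts):                # align, one node per partition in DIDA
        merged.extend(align(part, reads, routed.get(i, [])))
    return sorted(merged)                           # merge


targets = {"t1": "ACGTACGTTTGACCA", "t2": "GGGCCCTTTAAAGG", "t3": "TTGACA"}
reads = {"r1": "CGTTTGAC", "r2": "CCTTTAAA", "r3": "AAAAAAAA"}
hits = run_pipeline(targets, reads)
```

In this toy data, read `r1` shares a k-mer with both partitions and is therefore dispatched to both, although only one of them yields an alignment; read `r3` matches no filter and is never sent to an aligner.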

DIDA is written in C++ and parallelized with OpenMP for multithreaded computing on a single compute node. For distributed computing, DIDA uses the Message Passing Interface (MPI) for inter-process communication. As input, it takes a set of target sequences and a set of queries in FASTA or FASTQ format; the default output format is SAM.
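To make the input and output formats concrete, the following minimal sketch reads FASTA records and formats one alignment as a SAM line with the eleven mandatory SAM fields. It is an illustration only, not DIDA's I/O code; the helper names `read_fasta` and `sam_record` are hypothetical, and the record assumes an ungapped forward-strand match with no base qualities.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Keep only the first word of the header as the record name.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def sam_record(qname, rname, pos0, seq, mapq=255):
    """Format one forward-strand, ungapped alignment as a SAM line.

    SAM positions are 1-based, so the 0-based pos0 is shifted by one.
    MAPQ 255 means the mapping quality is not available.
    """
    fields = [qname, "0", rname, str(pos0 + 1), str(mapq),
              f"{len(seq)}M", "*", "0", "0", seq, "*"]
    return "\t".join(fields)


fasta = io.StringIO(">read1 sample\nACGT\nTTGA\n>read2\nGGCC\n")
records = list(read_fasta(fasta))
line = sam_record("read1", "chr1", 5, "ACGTTTGA")
```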

Performance Results

The performance of DIDA was measured when coupled with four popular aligners, Burrows-Wheeler Aligner* (BWA*), Bowtie2, Novoalign, and ABySS-map, on the C. elegans, human draft, human reference, and P. glauca genomes. Compared to their baseline performance, when run through the DIDA framework with 12 nodes, BWA, Bowtie2, Novoalign, and ABySS-map use less memory (91 percent, 90 percent, 87 percent, and 91 percent less, respectively) and execute faster (55 percent, 74 percent, 77 percent, and 67 percent faster, respectively) for a draft human genome assembly (note 1).
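A short calculation helps put these percentages in context. It rests on two assumptions that the text above does not state: that "faster by X percent" means an X percent reduction in run time, and that an ideal split would leave each of the 12 nodes holding exactly one twelfth of the index.

```python
NODES = 12

# Percentages quoted above for the draft human genome assembly.
memory_reduction = {"BWA": 91, "Bowtie2": 90, "Novoalign": 87, "ABySS-map": 91}
time_reduction = {"BWA": 55, "Bowtie2": 74, "Novoalign": 77, "ABySS-map": 67}

# Assumption: a perfectly even split leaves 1/NODES of the index on each node.
ideal_memory_reduction = 100 * (1 - 1 / NODES)


def speedup(percent_less_time):
    """Speedup factor implied by a percentage reduction in run time (assumed reading)."""
    return 1 / (1 - percent_less_time / 100)


speedups = {name: round(speedup(p), 2) for name, p in time_reduction.items()}

print(f"ideal memory reduction on {NODES} nodes: {ideal_memory_reduction:.1f}%")
for name, factor in speedups.items():
    print(f"{name}: {memory_reduction[name]}% less memory, about {factor}x faster")
```

Under these assumptions, the reported memory savings of 87 to 91 percent sit close to the ideal of about 91.7 percent for a 12-way split, and the run-time figures correspond to speedups of roughly 2.2x to 4.3x.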


Related Codes

Assembly By Short Sequences* (ABySS*)

Publications

Hamid Mohamadi, Benjamin P. Vandervalk, Anthony Raymond, Shaun D. Jackman, Justin Chu, Clay P. Breshears, and Inanc Birol. "DIDA: Distributed Indexing Dispatched Alignment." PLoS ONE 10, no. 4 (2015). doi:10.1371/journal.pone.0126409.

Configuration Table

System Overview

Nodes: Twelve HPC nodes interconnected by 40 Gbps InfiniBand
Processor: Two Intel® Xeon® X5650 processors (2.67 GHz) per node
RAM: 48 GB per node
Operating System and Software: CentOS 5.4; Intel® Cluster Studio 2013; DIDA v1.0.1; ABySS-map v1.5.2; BWA v0.7.10; Bowtie2 v2.1.0; Novoalign v3.01.02

Product and Performance Information

1. Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system.

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit https://www.intel.com/benchmarks.

Intel is a sponsor and member of the BenchmarkXPRT Development Community, and was the major developer of the XPRT family of benchmarks. Principled Technologies is the publisher of the XPRT family of benchmarks.