Optimized Genomics Code
Healthcare today finds itself in the midst of a dramatic transformation. Big changes are being driven in part by two converging trends: an explosion in genomic data and plummeting gene-sequencing costs.
Making the most of this unprecedented opportunity requires harnessing large, high-performance computing clusters equipped with the latest technologies. The rewards could be profound: accelerating discovery and unlocking insights that could usher in a new era of personalized medicine.
Processing a single human sample currently takes hundreds of hours of compute time. The raw data for one sample occupies several hundred gigabytes; data for an entire population requires exabytes of storage. Meeting that demand calls for newer, scalable, high-throughput distributed storage systems and new, more efficient databases.
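The scale of those storage figures is easy to check with back-of-envelope arithmetic. The sketch below uses illustrative assumptions (about 300 GB of raw data per whole-genome sample and a population of ten million people), not figures from the text:

```python
# Back-of-envelope storage estimate; both constants are illustrative
# assumptions, not measured values.
GB_PER_SAMPLE = 300           # assumed raw data per human sample, in GB
POPULATION = 10_000_000       # assumed number of sequenced individuals

total_gb = GB_PER_SAMPLE * POPULATION
total_eb = total_gb / 1e9     # 1 exabyte = 1e9 gigabytes

print(f"{total_eb:.0f} EB")   # prints "3 EB" for these assumptions
```

Even a modest national cohort lands in exabyte territory, which is why distributed storage is a prerequisite rather than an optimization.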
Intel works with industry-leading experts, the commercial and open-source authors of key genomic codes, to optimize top industry codes so that genome processing runs as fast as possible on Intel®-based systems and clusters. We then help release these changes through the main distributions to maximize industry impact and ensure everyone benefits from the optimization efforts.
Our process has significantly improved the speed of key genomic programs, and we continue to develop new hardware and system solutions to bring genome sequencing and processing down to minutes instead of days.
See how Intel® technologies provide complete analytics solutions for high-performance computing in personalized medicine.
A popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome.
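At its core, read mapping means locating each short sequenced read within a long reference sequence. The toy sketch below illustrates only the concept: production mappers use compressed indexes (for example, an FM-index built from the Burrows-Wheeler transform) rather than the plain substring search shown here, and the sequences are made up for illustration:

```python
# Toy read mapping: report where each read occurs in the reference.
# Plain substring search stands in for the indexed search a real mapper uses.
reference = "ACGTACGTGACCTGAACGT"          # illustrative reference sequence
reads = ["ACGTGA", "TGAACG", "GGGGGG"]     # illustrative short reads

for read in reads:
    pos = reference.find(read)             # -1 means the read did not map
    print(read, pos)
```

The last read fails to map, mirroring the real-world case of reads that diverge too far from the reference.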
An open-source implementation of the HMMER* protein sequence analysis suite.
An algorithm for comparing primary biological sequence information.
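Comparing biological sequences is commonly framed as local alignment: finding the highest-scoring matching region between two sequences. The sketch below is a minimal Smith-Waterman score computation; the scoring values (+2 match, -1 mismatch, -2 gap) and test sequences are illustrative assumptions, and fast search tools approximate this dynamic program with heuristics:

```python
# Minimal Smith-Waterman local alignment score (illustrative scoring scheme).
def local_align_score(a, b, match=2, mismatch=-1, gap=-2):
    # rows[i][j] holds the best local alignment score ending at a[:i], b[:j].
    rows = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = rows[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment floors every cell at 0 so alignments can restart.
            rows[i][j] = max(0, diag, rows[i - 1][j] + gap, rows[i][j - 1] + gap)
            best = max(best, rows[i][j])
    return best

print(local_align_score("ACGTT", "ACGAT"))   # prints 7: ACG + mismatch + T
```

The zero floor in each cell is what makes the alignment local rather than global: a poorly matching prefix never drags down a good downstream match.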
A software package developed at the Broad Institute to analyze next-generation sequencing data.
QIAGEN Bioinformatics* solutions deliver faster time to insight by combining powerful analytics that interpret complex biological processes.
Halvade* is a MapReduce implementation of the best-practice DNA sequencing pipeline as recommended by the Broad Institute.
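The MapReduce idea behind this kind of pipeline is to key each aligned read by a genomic region so downstream work can run in parallel per region. The sketch below is a conceptual illustration with made-up data, using the chromosome as the shuffle key; it is not Halvade's actual code or partitioning scheme:

```python
# Conceptual MapReduce split: map reads to a chromosome key, shuffle into
# per-chromosome buckets, then reduce each bucket independently.
from collections import defaultdict

# Illustrative (chromosome, sequence) pairs standing in for aligned reads.
reads = [("chr1", "ACGT"), ("chr2", "GGCA"), ("chr1", "TTAG")]

# Map + shuffle: group reads by chromosome.
buckets = defaultdict(list)
for chrom, seq in reads:
    buckets[chrom].append(seq)

# Reduce: per-chromosome work (here, simply counting reads per bucket).
counts = {chrom: len(seqs) for chrom, seqs in buckets.items()}
print(counts)   # prints {'chr1': 2, 'chr2': 1}
```

Because variants on different chromosomes can be called independently, each bucket can be reduced on a different cluster node.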
ABySS* is an open-source de novo genome assembler for short paired-end reads.
DIDA* performs large-scale alignment tasks by distributing the indexing and alignment stages into smaller subtasks over a cluster of compute nodes.
elPrep* is a high-performance tool for preparing SAM/BAM/CRAM files for variant calling in genomic sequencing pipelines.
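SAM, and its binary and compressed siblings BAM and CRAM, store one alignment record per tab-separated line; preparation tools in this space read, filter, and rewrite these records. The sketch below parses the leading mandatory fields of a single illustrative SAM line (the values are made up, and real files also carry a header and further fields):

```python
# Minimal parse of one SAM alignment record; field values are illustrative.
sam_line = "read1\t0\tchr1\t10468\t60\t6M\t*\t0\t0\tACGTGA\tFFFFFF"

fields = sam_line.split("\t")
record = {
    "qname": fields[0],        # read (query) name
    "flag": int(fields[1]),    # bitwise FLAG
    "rname": fields[2],        # reference sequence name
    "pos": int(fields[3]),     # 1-based leftmost mapping position
    "mapq": int(fields[4]),    # mapping quality
}
print(record["rname"], record["pos"])   # prints "chr1 10468"
```

Sorting, duplicate marking, and filtering in a preparation pipeline are all transformations over streams of records like this one.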
See how multi-core Intel® Xeon® processors, Intel® Xeon Phi™ coprocessors, and other Intel® technologies provide complete analytics solutions for high-performance computing in personalized medicine at lower costs than traditional solutions.
The Intel Genomics Cluster solution is designed to help organizations meet the demand for fast, high-volume genome analysis for next-generation sequencing.
TGAC's SGI UV 2000* cluster with Intel® Xeon® processors provides the high-performance computing to help researchers sequence crop genomes.
Examines features of Exacloud, an Intel-provided and supported high-performance computing (HPC) cluster optimized for life science workloads at Oregon Health & Science University's (OHSU) state-of-the-art data center, Data Center West.
One of the initial goals of the ExaScience Life Lab is to examine how supercomputers can accelerate the processing of whole-genome sequences. Currently, the processing time of a single whole genome is measured in days rather than hours.
The runtimes of the HaplotypeCaller* tool in GATK v3.2 for the input NA12878 from the 1000 Genomes Project are listed in Table 1. This whole-genome input has an average coverage of 47x for the GATK dataset.
Next-generation sequencing (NGS) technologies generate vast amounts of variant data, and analyzing it poses a major computational challenge. Many current research efforts, such as population genetics and association studies, require computing various statistics and performing statistical tests on genome sequencing data.
Rapid advances in genome sequencing are placing heavier demands on the high-performance computing (HPC) clusters typically used for analysis, yet upgrading a cluster can be a complex and costly undertaking. Based on tests performed by the Scripps Translational Science Institute and Intel, a single server based on the Intel® Xeon® processor E7 v2 family can provide up to 34 percent faster time to results for genome analysis than a typical 35-node high-performance research cluster, and up to 90 percent faster time to results for full genome read mapping.
When it comes to generating increasingly larger data sets and stretching the limits of high-performance computing (HPC), the field of genomics and next-generation sequencing (NGS) is at the forefront. The major impetus for this data explosion began in 1990 when the U.S. kicked off the Human Genome Project, an ambitious effort to sequence the three billion base pairs that constitute the complete set of DNA in the human body.
See how Intel is working with Intermountain's Transformation Lab to understand and develop tools that integrate genetic data through a new patient-centric data model, delivering meaningful, easy-to-understand information and accelerating its use at the point of care.
The science of genomics, which studies the sequencing and analysis of DNA structures, has revolutionized healthcare. Advances in genomics are enabling personalized treatments and potential cures by testing genetic variants and comparing these to what is known about various therapies and drugs.
Life sciences benefit from big data analytics, which enables personalized medicine for high-quality care and better outcomes. Among the data management challenges the life sciences face are data security and the analytics needed to use data in an efficient and expeditious manner.