Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools
This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing for next-generation sequencing data. The sam2bam is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of data pre-pr...
Main Authors: | Ogasawara, Takeshi; Cheng, Yinhe; Tzeng, Tzy-Hwa Kathy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5115855/ https://www.ncbi.nlm.nih.gov/pubmed/27861637 http://dx.doi.org/10.1371/journal.pone.0167100 |
_version_ | 1782468584843247616 |
---|---|
author | Ogasawara, Takeshi Cheng, Yinhe Tzeng, Tzy-Hwa Kathy |
author_facet | Ogasawara, Takeshi Cheng, Yinhe Tzeng, Tzy-Hwa Kathy |
author_sort | Ogasawara, Takeshi |
collection | PubMed |
description | This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing for next-generation sequencing data. The sam2bam is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of data pre-processing in marking duplicate reads on a single node system by 156–186x compared with de facto standard tools. The sam2bam consists of parallel software components that can fully utilize multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators, if available. The sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM, as a basic feature. Additional features such as analyzing, filtering, and converting input data are provided by using plug-in tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of next generation sequencing (NGS) data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. The sam2bam could reduce the runtime of NGS data pre-processing from about 20 hours to about nine minutes for a whole-genome sequencing data set on the same system using up to 711 GB of memory. |
format | Online Article Text |
id | pubmed-5115855 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5115855 2016-12-08 Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools Ogasawara, Takeshi Cheng, Yinhe Tzeng, Tzy-Hwa Kathy PLoS One Research Article This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing for next-generation sequencing data. The sam2bam is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of data pre-processing in marking duplicate reads on a single node system by 156–186x compared with de facto standard tools. The sam2bam consists of parallel software components that can fully utilize multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators, if available. The sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM, as a basic feature. Additional features such as analyzing, filtering, and converting input data are provided by using plug-in tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of next generation sequencing (NGS) data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. The sam2bam could reduce the runtime of NGS data pre-processing from about 20 hours to about nine minutes for a whole-genome sequencing data set on the same system using up to 711 GB of memory.
Public Library of Science 2016-11-18 /pmc/articles/PMC5115855/ /pubmed/27861637 http://dx.doi.org/10.1371/journal.pone.0167100 Text en © 2016 Ogasawara et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Ogasawara, Takeshi Cheng, Yinhe Tzeng, Tzy-Hwa Kathy Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools |
title | Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools |
title_full | Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools |
title_fullStr | Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools |
title_full_unstemmed | Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools |
title_short | Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools |
title_sort | sam2bam: high-performance framework for ngs data preprocessing tools |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5115855/ https://www.ncbi.nlm.nih.gov/pubmed/27861637 http://dx.doi.org/10.1371/journal.pone.0167100 |
work_keys_str_mv | AT ogasawaratakeshi sam2bamhighperformanceframeworkforngsdatapreprocessingtools AT chengyinhe sam2bamhighperformanceframeworkforngsdatapreprocessingtools AT tzengtzyhwakathy sam2bamhighperformanceframeworkforngsdatapreprocessingtools |