Efficient digest of high-throughput sequencing data in a reproducible report

Bibliographic details
Main authors: Zhang, Zhe; Leipzig, Jeremy; Sasson, Ariella; Yu, Angela M; Perin, Juan C; Xie, Hongbo M; Sarmady, Mahdi; Warren, Patrick V; White, Peter S
Format: Online article, text
Language: English
Published: BioMed Central 2013
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3846741/
https://www.ncbi.nlm.nih.gov/pubmed/24564231
http://dx.doi.org/10.1186/1471-2105-14-S11-S3
Collection: PubMed
Description:
BACKGROUND: High-throughput sequencing (HTS) technologies are spearheading the accelerated development of biomedical research. Processing and summarizing the large amount of data generated by HTS presents a non-trivial challenge to bioinformatics. A commonly adopted standard is to store sequencing reads aligned to a reference genome in SAM (Sequence Alignment/Map) or BAM (Binary Alignment/Map) files. Quality control of SAM/BAM files is a critical checkpoint before downstream analysis. The goal of the current project is to facilitate and standardize this process.

RESULTS: We developed bamchop, a robust program to efficiently summarize key statistical metrics of HTS data stored in BAM files, and to visually present the results in a formatted report. The report documents information about various aspects of HTS data, such as sequencing quality, mapping to a reference genome, sequencing coverage, and base frequency. Bamchop uses the R language and Bioconductor packages to calculate statistical matrices, and the Sweave utility and associated LaTeX markup for documentation. Bamchop's efficiency and robustness were tested on BAM files generated by local sequencing facilities and the 1000 Genomes Project. Source code, instructions and example reports of bamchop are freely available from https://github.com/CBMi-BiG/bamchop.

CONCLUSIONS: Bamchop enables biomedical researchers to quickly and rigorously evaluate HTS data by providing a convenient synopsis and user-friendly reports.
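Bamchop itself is written in R with Bioconductor, and the sketch below is not its implementation. It is a minimal, hypothetical Python illustration of the kinds of per-file metrics the description lists (mapping, sequencing quality, base frequency, coverage), assuming uncompressed SAM text input with Phred+33 quality strings rather than binary BAM. The names `summarize_sam`, `SamSummary` and `EXAMPLE_SAM` are invented for this example, and the toy records are made up, not data from the paper. Secondary and supplementary alignments (FLAG bits 0x100 and 0x800) are skipped so that each read is counted once.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# CIGAR is a run of <length><operation> pairs, e.g. "2M1D2M".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class SamSummary:
    """Hypothetical container for a few per-file QC metrics."""
    total_reads: int = 0        # primary records only
    mapped_reads: int = 0
    quality_sum: int = 0        # sum of Phred scores over all bases
    quality_count: int = 0      # number of bases with a quality value
    base_counts: Counter = field(default_factory=Counter)
    # (reference name, 1-based position) -> number of aligned read bases
    depth: Counter = field(default_factory=Counter)

    @property
    def mapped_fraction(self) -> float:
        return self.mapped_reads / self.total_reads if self.total_reads else 0.0

    @property
    def mean_base_quality(self) -> float:
        return self.quality_sum / self.quality_count if self.quality_count else 0.0


def summarize_sam(lines):
    """Single pass over SAM text lines; header lines (starting with '@') are skipped."""
    s = SamSummary()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        flag = int(f[1])
        if flag & 0x900:  # 0x100 secondary, 0x800 supplementary: count each read once
            continue
        s.total_reads += 1
        seq, qual = f[9], f[10]
        if seq != "*":
            s.base_counts.update(seq.upper())  # base frequency, as stored in SEQ
        if qual != "*":
            s.quality_sum += sum(ord(c) - 33 for c in qual)  # Phred+33 decoding
            s.quality_count += len(qual)
        if flag & 0x4:  # unmapped: contributes no mapping or coverage
            continue
        s.mapped_reads += 1
        rname, pos = f[2], int(f[3])
        for length, op in CIGAR_RE.findall(f[5]):
            n = int(length)
            if op in "M=X":  # aligned bases add depth at each reference position
                for i in range(n):
                    s.depth[(rname, pos + i)] += 1
            if op in "MDN=X":  # these operations consume reference positions
                pos += n
    return s


# Hypothetical toy records: two mapped reads, one unmapped, one secondary.
EXAMPLE_SAM = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t102\t60\t2M1D2M\t*\t0\t0\tGGCA\t####",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tNNAA\t!!!!",
    "r1\t256\tchr1\t500\t0\t4M\t*\t0\t0\t*\t*",
]

if __name__ == "__main__":
    summary = summarize_sam(EXAMPLE_SAM)
    print(f"mapped fraction: {summary.mapped_fraction:.3f}")  # 0.667
    print(f"mean base quality: {summary.mean_base_quality:.1f}")  # 14.0
    print("base counts:", dict(summary.base_counts))
    print("depth at chr1:102:", summary.depth[("chr1", 102)])  # 2
```

Reading the same records from a BAM file would require a BAM/BGZF decoder (for example pysam in Python or Rsamtools in R); only the record-reading step would change, while the single-pass accumulation of counts stays the same.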
Record ID: pubmed-3846741
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Research), published 2013-09-13
Copyright © 2013 Zhang et al; licensee BioMed Central Ltd.
License: This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.