ASAP: an environment for automated preprocessing of sequencing data

BACKGROUND: Next-generation sequencing (NGS) has yielded an unprecedented amount of data for genetics research. It is a daunting task to process the data from raw sequence reads to variant calls, and manually processing this data can significantly delay downstream analysis and increase the possibilit...

Bibliographic Details
Main Authors: Torstenson, Eric S, Li, Bingshan, Li, Chun
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3541347/
https://www.ncbi.nlm.nih.gov/pubmed/23289815
http://dx.doi.org/10.1186/1756-0500-6-5
author Torstenson, Eric S
Li, Bingshan
Li, Chun
collection PubMed
description BACKGROUND: Next-generation sequencing (NGS) has yielded an unprecedented amount of data for genetics research. It is a daunting task to process the data from raw sequence reads to variant calls, and manually processing this data can significantly delay downstream analysis and increase the possibility of human error. The research community has produced tools to properly prepare sequence data for analysis and established guidelines on how to apply those tools to achieve the best results; however, existing pipeline programs that automate the process in its entirety are either inaccessible to investigators, or web-based and require a certain amount of administrative expertise to set up. FINDINGS: Advanced Sequence Automated Pipeline (ASAP) was developed to provide a framework for automating the translation of sequencing data into annotated variant calls, with the goal of minimizing user involvement without the need for dedicated hardware or administrative rights. ASAP works both on computer clusters and on standalone machines with minimal human involvement and maintains high data integrity, while allowing complete control over the configuration of its component programs. It offers an easy-to-use interface for submitting and tracking jobs as well as resuming failed jobs. It also provides tools for quality checking and for dividing jobs into pieces for maximum throughput. CONCLUSIONS: ASAP provides an environment for building an automated pipeline for NGS data preprocessing. This environment is flexible in use and open to future development. It is freely available at http://biostat.mc.vanderbilt.edu/ASAP.
format Online
Article
Text
id pubmed-3541347
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3541347 2013-01-11 ASAP: an environment for automated preprocessing of sequencing data Torstenson, Eric S Li, Bingshan Li, Chun BMC Res Notes Technical Note BioMed Central 2013-01-04 /pmc/articles/PMC3541347/ /pubmed/23289815 http://dx.doi.org/10.1186/1756-0500-6-5 Text en Copyright ©2013 Torstenson et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title ASAP: an environment for automated preprocessing of sequencing data
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3541347/
https://www.ncbi.nlm.nih.gov/pubmed/23289815
http://dx.doi.org/10.1186/1756-0500-6-5