Fastq2vcf: a concise and transparent pipeline for whole-exome sequencing data analyses

BACKGROUND: Whole-exome sequencing (WES) is a popular next-generation sequencing technology used by numerous laboratories with various levels of statistical and analytical expertise. Centralized databases, such as the Sequence Read Archive and the European Nucleotide Archive, allow data to be reanal...

Bibliographic Details
Main Authors: Gao, Xiaoyi; Xu, Jianpeng; Starmer, Joshua
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4376134/
https://www.ncbi.nlm.nih.gov/pubmed/25889517
http://dx.doi.org/10.1186/s13104-015-1027-x
_version_ 1782363688153382912
author Gao, Xiaoyi
Xu, Jianpeng
Starmer, Joshua
author_facet Gao, Xiaoyi
Xu, Jianpeng
Starmer, Joshua
author_sort Gao, Xiaoyi
collection PubMed
description BACKGROUND: Whole-exome sequencing (WES) is a popular next-generation sequencing technology used by numerous laboratories with various levels of statistical and analytical expertise. Centralized databases, such as the Sequence Read Archive and the European Nucleotide Archive, allow data to be reanalyzed by independent labs to confirm results and derive additional insights. Access to new and shared data highlights the necessity for software that both lowers the statistical and analytical expertise required to generate results and promotes reproducible methodology among laboratories. FINDINGS: We have developed fastq2vcf, a pipeline that automates the genomic variant calling process using multiple callers. Fastq2vcf offers improved flexibility, efficiency, and reproducibility by seamlessly integrating several leading sequencing analysis tools. It outputs not only the annotated variant call set for each caller, but also the consensus variant call set shared by different callers. Furthermore, it can be customized and extended easily. CONCLUSIONS: Our software tool automatically generates executable command lines for a variety of tools required for analyzing WES data. It is also highly configurable and provides users with complete control of the processing procedure, making it easy to submit and track jobs in both single workstation and parallelized computing environments. By using this pipeline, WES analysis can be easily reproduced.
format Online
Article
Text
id pubmed-4376134
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4376134 2015-03-28 Fastq2vcf: a concise and transparent pipeline for whole-exome sequencing data analyses Gao, Xiaoyi Xu, Jianpeng Starmer, Joshua BMC Res Notes Technical Note BACKGROUND: Whole-exome sequencing (WES) is a popular next-generation sequencing technology used by numerous laboratories with various levels of statistical and analytical expertise. Centralized databases, such as the Sequence Read Archive and the European Nucleotide Archive, allow data to be reanalyzed by independent labs to confirm results and derive additional insights. Access to new and shared data highlights the necessity for software that both lowers the statistical and analytical expertise required to generate results and promotes reproducible methodology among laboratories. FINDINGS: We have developed fastq2vcf, a pipeline that automates the genomic variant calling process using multiple callers. Fastq2vcf offers improved flexibility, efficiency, and reproducibility by seamlessly integrating several leading sequencing analysis tools. It outputs not only the annotated variant call set for each caller, but also the consensus variant call set shared by different callers. Furthermore, it can be customized and extended easily. CONCLUSIONS: Our software tool automatically generates executable command lines for a variety of tools required for analyzing WES data. It is also highly configurable and provides users with complete control of the processing procedure, making it easy to submit and track jobs in both single workstation and parallelized computing environments. By using this pipeline, WES analysis can be easily reproduced. BioMed Central 2015-03-08 /pmc/articles/PMC4376134/ /pubmed/25889517 http://dx.doi.org/10.1186/s13104-015-1027-x Text en © Gao et al.; licensee BioMed Central.
2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Technical Note
Gao, Xiaoyi
Xu, Jianpeng
Starmer, Joshua
Fastq2vcf: a concise and transparent pipeline for whole-exome sequencing data analyses
title Fastq2vcf: a concise and transparent pipeline for whole-exome sequencing data analyses
title_full Fastq2vcf: a concise and transparent pipeline for whole-exome sequencing data analyses
title_fullStr Fastq2vcf: a concise and transparent pipeline for whole-exome sequencing data analyses
title_full_unstemmed Fastq2vcf: a concise and transparent pipeline for whole-exome sequencing data analyses
title_short Fastq2vcf: a concise and transparent pipeline for whole-exome sequencing data analyses
title_sort fastq2vcf: a concise and transparent pipeline for whole-exome sequencing data analyses
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4376134/
https://www.ncbi.nlm.nih.gov/pubmed/25889517
http://dx.doi.org/10.1186/s13104-015-1027-x
work_keys_str_mv AT gaoxiaoyi fastq2vcfaconciseandtransparentpipelineforwholeexomesequencingdataanalyses
AT xujianpeng fastq2vcfaconciseandtransparentpipelineforwholeexomesequencingdataanalyses
AT starmerjoshua fastq2vcfaconciseandtransparentpipelineforwholeexomesequencingdataanalyses