SPANDx: a genomics pipeline for comparative analysis of large haploid whole genome re-sequencing datasets

BACKGROUND: Next-generation sequencing (NGS) is now a commonplace tool for molecular characterisation of virtually any species of interest. Despite the ever-increasing use of NGS in laboratories worldwide, analysis of whole genome re-sequencing (WGS) datasets from start to finish remains nontrivial...

Bibliographic Details
Main authors:	Sarovich, Derek S, Price, Erin P
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Technical Note
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4169827/ https://www.ncbi.nlm.nih.gov/pubmed/25201145 http://dx.doi.org/10.1186/1756-0500-7-618

author	Sarovich, Derek S Price, Erin P
collection	PubMed
description	BACKGROUND: Next-generation sequencing (NGS) is now a commonplace tool for molecular characterisation of virtually any species of interest. Despite the ever-increasing use of NGS in laboratories worldwide, analysis of whole genome re-sequencing (WGS) datasets from start to finish remains nontrivial due to the fragmented nature of NGS software and the lack of experienced bioinformaticists in many research teams. FINDINGS: We describe SPANDx (Synergised Pipeline for Analysis of NGS Data in Linux), a new tool for high-throughput comparative analysis of haploid WGS datasets comprising one through thousands of genomes. SPANDx consolidates several well-validated, open-source packages into a single tool, mitigating the need to learn and manipulate individual NGS programs. SPANDx incorporates BWA for alignment of raw NGS reads against a reference genome or pan-genome, followed by data filtering, variant calling and annotation using Picard, GATK, SAMtools and SnpEff. BEDTools has also been included for genetic locus presence/absence (P/A) determination to easily visualise the core and accessory genomes. Additional SPANDx features include construction of error-corrected single-nucleotide polymorphism (SNP) and insertion-deletion matrices, and P/A matrices, to enable user-friendly visualisation of genetic variants. The SNP matrices generated using VCFtools and GATK are directly importable into PAUP*, PHYLIP or RAxML for downstream phylogenetic analysis. SPANDx has been developed to handle NGS data from Illumina, Ion Personal Genome Machine (PGM) and 454 platforms, and we demonstrate that it has comparable performance across Illumina MiSeq/HiSeq2000 and Ion PGM data. CONCLUSION: SPANDx is an all-in-one tool for comprehensive haploid WGS analysis. SPANDx is open source and is freely available at: http://sourceforge.net/projects/spandx/.
format	Online Article Text
id	pubmed-4169827
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4169827 2014-09-22 BMC Res Notes Technical Note BioMed Central 2014-09-08 /pmc/articles/PMC4169827/ /pubmed/25201145 http://dx.doi.org/10.1186/1756-0500-7-618 Text en © Sarovich and Price; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	SPANDx: a genomics pipeline for comparative analysis of large haploid whole genome re-sequencing datasets
topic	Technical Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4169827/ https://www.ncbi.nlm.nih.gov/pubmed/25201145 http://dx.doi.org/10.1186/1756-0500-7-618
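The abstract describes SNP matrices that can be imported directly into PAUP*, PHYLIP or RAxML. As a rough illustration of that idea only, the Python sketch below builds a small SNP matrix from per-sample base calls and writes it as a PHYLIP-style alignment. It is not SPANDx code: the function names `build_snp_matrix` and `to_phylip`, the input layout, and the rule of keeping only positions called in every sample are all assumptions made for this example.

```python
def build_snp_matrix(calls):
    """Build a SNP matrix from per-sample base calls.

    calls: {sample_name: {position: base}} (hypothetical input layout).
    Returns (positions, matrix), where matrix maps each sample name to a
    string holding its bases at the retained positions.
    """
    samples = sorted(calls)
    if not samples:
        return [], {}
    # Assumption: keep only positions that have a call in every sample.
    shared = set.intersection(*(set(calls[s]) for s in samples))
    positions = []
    for pos in sorted(shared):
        alleles = {calls[s][pos] for s in samples}
        # Retain variable sites made up of unambiguous bases only.
        if len(alleles) > 1 and alleles <= set("ACGT"):
            positions.append(pos)
    matrix = {s: "".join(calls[s][p] for p in positions) for s in samples}
    return positions, matrix


def to_phylip(matrix):
    """Render the matrix as a sequential PHYLIP-style alignment string."""
    n_sites = len(next(iter(matrix.values()))) if matrix else 0
    lines = [f"{len(matrix)} {n_sites}"]
    for name, seq in matrix.items():
        # Names are truncated and padded to 10 characters.
        lines.append(f"{name[:10]:<10} {seq}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy data: position 40 is dropped because sample s2 has no call there.
    example_calls = {
        "ref": {10: "A", 25: "C", 40: "G"},
        "s1": {10: "A", 25: "T", 40: "G"},
        "s2": {10: "G", 25: "C"},
    }
    positions, matrix = build_snp_matrix(example_calls)
    print(positions)
    print(to_phylip(matrix), end="")
```

Sample names longer than 10 characters are truncated here and could collide, so real datasets would need unique short names or a more relaxed name format.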

Similar items