StrAuto: automation and parallelization of STRUCTURE analysis

BACKGROUND: Population structure inference using the software STRUCTURE has become an integral part of population genetic studies covering a broad spectrum of taxa including humans. The ever-expanding size of genetic data sets poses computational challenges for this analysis. Although at least one t...

Bibliographic Details
Main authors: Chhatre, Vikram E.; Emerson, Kevin J.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5366143/
https://www.ncbi.nlm.nih.gov/pubmed/28340552
http://dx.doi.org/10.1186/s12859-017-1593-0
author Chhatre, Vikram E.
Emerson, Kevin J.
collection PubMed
description BACKGROUND: Population structure inference using the software STRUCTURE has become an integral part of population genetic studies covering a broad spectrum of taxa, including humans. The ever-expanding size of genetic data sets poses computational challenges for this analysis. Although at least one tool currently implements parallel computing to reduce the computational load of this analysis, it does not fully automate the replicate STRUCTURE runs required for downstream inference of optimal K. There is a pressing need for a tool that can deploy population structure analysis on high-performance computing clusters. RESULTS: We present an updated version of the popular Python program StrAuto to streamline population structure analysis using parallel computing. StrAuto implements a pipeline that combines STRUCTURE analysis with the Evanno ΔK analysis and visualization of results using STRUCTURE HARVESTER. Using benchmarking tests, we demonstrate that StrAuto significantly reduces the computational time needed to perform iterative STRUCTURE analysis by distributing runs over two or more processors. CONCLUSION: StrAuto is the first tool to integrate STRUCTURE analysis with post-processing using a pipeline approach in addition to implementing parallel computation, a setup ideal for deployment on computing clusters. StrAuto is distributed under the GNU GPL (General Public License) and is available to download from http://strauto.popgen.org.
format Online
Article
Text
id pubmed-5366143
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5366143 2017-03-28 StrAuto: automation and parallelization of STRUCTURE analysis Chhatre, Vikram E. Emerson, Kevin J. BMC Bioinformatics Software
BioMed Central 2017-03-24 /pmc/articles/PMC5366143/ /pubmed/28340552 http://dx.doi.org/10.1186/s12859-017-1593-0 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title StrAuto: automation and parallelization of STRUCTURE analysis
topic Software