
fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets

Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inferenc...


Bibliographic Details
Main authors: Raj, Anil; Stephens, Matthew; Pritchard, Jonathan K.
Format: Online Article Text
Language: English
Published: Genetics Society of America 2014
Subjects: Investigations
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4063916/
https://www.ncbi.nlm.nih.gov/pubmed/24700103
http://dx.doi.org/10.1534/genetics.114.164350
author Raj, Anil
Stephens, Matthew
Pritchard, Jonathan K.
collection PubMed
description Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH–Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.
format Online
Article
Text
id pubmed-4063916
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-4063916 2014-06-23 fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets Raj, Anil; Stephens, Matthew; Pritchard, Jonathan K. Genetics Investigations Genetics Society of America 2014-06 2014-04-02 /pmc/articles/PMC4063916/ /pubmed/24700103 http://dx.doi.org/10.1534/genetics.114.164350 Text en Copyright © 2014 by the Genetics Society of America. Available freely online through the author-supported open access option.
title fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets
topic Investigations