Probabilistic models of genetic variation in structured populations applied to global human studies

Motivation: Modern population genetics studies typically involve genome-wide genotyping of individuals from a diverse network of ancestries. An important problem is how to formulate and estimate probabilistic models of observed genotypes that account for complex population structure. The most promin...

Bibliographic Details
Main Authors: Hao, Wei, Song, Minsun, Storey, John D.
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects: Original Papers
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4795615/
https://www.ncbi.nlm.nih.gov/pubmed/26545820
http://dx.doi.org/10.1093/bioinformatics/btv641
author Hao, Wei
Song, Minsun
Storey, John D.
collection PubMed
description Motivation: Modern population genetics studies typically involve genome-wide genotyping of individuals from a diverse network of ancestries. An important problem is how to formulate and estimate probabilistic models of observed genotypes that account for complex population structure. The most prominent work on this problem has focused on estimating a model of admixture proportions of ancestral populations for each individual. Here, we instead focus on modeling variation of the genotypes without requiring a higher-level admixture interpretation. Results: We formulate two general probabilistic models, and we propose computationally efficient algorithms to estimate them. First, we show how principal component analysis can be utilized to estimate a general model that includes the well-known Pritchard–Stephens–Donnelly admixture model as a special case. Noting some drawbacks of this approach, we introduce a new ‘logistic factor analysis’ framework that seeks to directly model the logit transformation of probabilities underlying observed genotypes in terms of latent variables that capture population structure. We demonstrate these advances on data from the Human Genome Diversity Panel and 1000 Genomes Project, where we are able to identify SNPs that are highly differentiated with respect to structure while making minimal modeling assumptions. Availability and Implementation: A Bioconductor R package called lfa is available at http://www.bioconductor.org/packages/release/bioc/html/lfa.html. Contact: jstorey@princeton.edu Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4795615
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4795615 2016-03-21 Bioinformatics Original Papers Oxford University Press 2016-03-01 2015-11-06 /pmc/articles/PMC4795615/ /pubmed/26545820 http://dx.doi.org/10.1093/bioinformatics/btv641 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Probabilistic models of genetic variation in structured populations applied to global human studies
topic Original Papers