
Analysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis

We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization....
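The matrix-factorization view summarised above can be written roughly as follows. This is a hedged sketch. The symbols G, L, F, n, p and K, the 0/1/2 genotype coding, and the exact form of each constraint are assumptions made here for illustration, not notation taken from the paper. In particular, putting the sparsity prior on L is an assumption.

```latex
G \approx L F, \qquad G \in \mathbb{R}^{n \times p},\; L \in \mathbb{R}^{n \times K},\; F \in \mathbb{R}^{K \times p},\; K \ll \min(n, p)

\text{PCA (G column-centred): } \min_{L,F} \lVert G - L F \rVert_F^2 \ \text{ subject to } F F^{\top} = I_K

\text{Admixture (G}_{ij} \in \{0,1,2\}\text{): } \tfrac{1}{2}\,\mathbb{E}[G] = L F, \quad L_{ik} \ge 0,\ \textstyle\sum_{k} L_{ik} = 1, \quad 0 \le F_{kj} \le 1

\text{Sparse factor analysis: the same product } L F \text{, with a sparsity-inducing prior (assumed here to be on } L\text{)}
```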


Bibliographic Details
Main Authors: Engelhardt, Barbara E., Stephens, Matthew
Format: Text
Language: English
Published: Public Library of Science, 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2940725/
https://www.ncbi.nlm.nih.gov/pubmed/20862358
http://dx.doi.org/10.1371/journal.pgen.1001117
collection PubMed
description We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization. Specifically, they can both be interpreted as approximating an observed genotype matrix by a product of two lower-rank matrices, but with different constraints or prior distributions on these lower-rank matrices. This opens the door to a large range of possible approaches to analyzing population structure, by considering other constraints or priors. In this paper, we introduce one such novel approach, based on sparse factor analysis (SFA). We investigate the effects of the different types of constraint in several real and simulated data sets. We find that SFA produces similar results to admixture-based models when the samples are descended from a few well-differentiated ancestral populations and can recapitulate the results of PCA when the population structure is more “continuous,” as in isolation-by-distance models.
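The abstract's central observation can be checked numerically: admixture-based models and PCA both approximate a genotype matrix by a product of two lower-rank matrices. The sketch below is illustrative only. It is not the authors' SFA method or code. The function names `low_rank_factorize` and `simulate_admixture_means`, the simulation settings, and the 0/1/2 genotype coding are hypothetical choices made here. The sketch simulates expected genotypes under an admixture model, with proportions on the simplex and allele frequencies in [0, 1]. It then shows that an unconstrained rank-K factorization by truncated SVD, the PCA end of the framework, reproduces the same matrix with different factors.

```python
import numpy as np


def low_rank_factorize(G, K):
    """Best rank-K least-squares factorization G ~ L @ F via truncated SVD.

    This is the unconstrained (PCA-like) end of the framework: L is n x K,
    F is K x p, and the rows of F are orthonormal.
    """
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    L = U[:, :K] * s[:K]  # scale each kept left singular vector by its singular value
    F = Vt[:K, :]         # keep the top-K right singular vectors as rows
    return L, F


def simulate_admixture_means(n, p, K, seed=0):
    """Hypothetical simulation of expected genotype counts (0..2) under an
    admixture model: E[G] = 2 * Q @ P.

    Q (n x K): admixture proportions, non-negative, each row sums to 1.
    P (K x p): ancestral allele frequencies in [0, 1].
    """
    rng = np.random.default_rng(seed)
    Q = rng.dirichlet(np.ones(K), size=n)
    P = rng.uniform(0.05, 0.95, size=(K, p))
    return 2.0 * Q @ P, Q, P


if __name__ == "__main__":
    n, p, K = 50, 200, 3
    EG, Q, P = simulate_admixture_means(n, p, K)

    # The admixture mean matrix has rank at most K, so an unconstrained
    # rank-K factorization reproduces it up to floating-point error,
    # even though L and F need not equal 2*Q and P.
    L, F = low_rank_factorize(EG, K)
    print("max abs reconstruction error:", np.abs(L @ F - EG).max())

    # Column-centring (as PCA does) removes one dimension here,
    # because every row of Q sums to 1.
    C = EG - EG.mean(axis=0)
    print("rank after centring:", np.linalg.matrix_rank(C))
```

Run as a script, it prints a reconstruction error at floating-point level and a rank of K − 1 after column-centring. The two factorizations agree on the product but not on the factors. This illustrates why the choice of constraint or prior, not the quality of fit alone, shapes how the factors are interpreted.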
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
published PLoS Genet, Research Article, Public Library of Science, 2010-09-16
license Engelhardt, Stephens. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.