Analysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis
We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization....
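The shared matrix-factorization view in the abstract can be sketched in illustrative notation. The symbols below are assumptions and are not taken from this record: G is an n × p matrix of genotypes (0/1/2 counts) for n individuals at p SNPs, L and F are the two lower-rank factors, and K ≪ min(n, p). The ℓ1 penalty on the last line is only a stand-in for a sparsity-promoting prior, not the prior used in the paper.

$$
G \approx L F, \qquad G \in \mathbb{R}^{n \times p},\quad L \in \mathbb{R}^{n \times K},\quad F \in \mathbb{R}^{K \times p}
$$

$$
\text{PCA (columns of } G \text{ centred, no constraints):}\quad \min_{L,F}\ \lVert G - L F \rVert_F^2 \quad \text{(solved by truncated SVD)}
$$

$$
\text{Admixture:}\quad G_{ij} \sim \operatorname{Binomial}\Big(2, \textstyle\sum_{k} L_{ik} F_{kj}\Big),\quad L_{ik} \ge 0,\ \textstyle\sum_{k} L_{ik} = 1,\ 0 \le F_{kj} \le 1 \ \Rightarrow\ \mathbb{E}[G] = 2 L F
$$

$$
\text{Sparse factor analysis (illustrative penalised form):}\quad \min_{L,F}\ \lVert G - L F \rVert_F^2 + \lambda \sum_{i,k} \lvert L_{ik} \rvert
$$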
Main authors: | Engelhardt, Barbara E.; Stephens, Matthew |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science, 2010 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2940725/ https://www.ncbi.nlm.nih.gov/pubmed/20862358 http://dx.doi.org/10.1371/journal.pgen.1001117 |
_version_ | 1782186827288936448 |
---|---|
author | Engelhardt, Barbara E.; Stephens, Matthew |
collection | PubMed |
description | We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization. Specifically, they can both be interpreted as approximating an observed genotype matrix by a product of two lower-rank matrices, but with different constraints or prior distributions on these lower-rank matrices. This opens the door to a large range of possible approaches to analyzing population structure, by considering other constraints or priors. In this paper, we introduce one such novel approach, based on sparse factor analysis (SFA). We investigate the effects of the different types of constraint in several real and simulated data sets. We find that SFA produces similar results to admixture-based models when the samples are descended from a few well-differentiated ancestral populations and can recapitulate the results of PCA when the population structure is more “continuous,” as in isolation-by-distance models. |
format | Text |
id | pubmed-2940725 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-2940725; 2010-09-22; PLoS Genet; Research Article; Public Library of Science; 2010-09-16; /pmc/articles/PMC2940725/; /pubmed/20862358; http://dx.doi.org/10.1371/journal.pgen.1001117; Text; en; Engelhardt, Stephens. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Analysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2940725/ https://www.ncbi.nlm.nih.gov/pubmed/20862358 http://dx.doi.org/10.1371/journal.pgen.1001117 |
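The description's central claim is that PCA and admixture models both approximate a genotype matrix by a product of two lower-rank matrices. That claim can be checked numerically with a small Python/NumPy sketch. This is illustrative only. The function names `simulate_admixed_genotypes` and `pca_factorization` are hypothetical. The simulation assumes an n × p matrix of 0/1/2 genotype counts drawn from a simple admixture model, and the sketch does not implement the paper's SFA method. It fits the unconstrained (PCA) factorization by truncated SVD and compares the fit with the constrained admixture structure that generated the data.

```python
import numpy as np


def simulate_admixed_genotypes(n, p, K, seed=0):
    """Hypothetical simulator: G[i, j] ~ Binomial(2, sum_k Q[i, k] * P[k, j]).

    Q (n x K) holds admixture proportions (each row on the simplex) and
    P (K x p) holds ancestral allele frequencies in [0, 1], i.e. the
    constrained factorization used by admixture-based models.
    """
    rng = np.random.default_rng(seed)
    Q = rng.dirichlet(np.ones(K), size=n)        # rows are >= 0 and sum to 1
    P = rng.uniform(0.05, 0.95, size=(K, p))     # allele frequencies
    M = 2.0 * Q @ P                              # expected genotypes, rank <= K
    G = rng.binomial(2, M / 2.0).astype(float)   # observed 0/1/2 counts
    return G, Q, P, M


def pca_factorization(G, r):
    """Hypothetical helper: rank-r factorization of the column-centred matrix.

    Returns L (n x r) and F (r x p) with (G - column means) ~= L @ F.
    No constraints are imposed beyond the orthogonality that the SVD implies.
    """
    Gc = G - G.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(Gc, full_matrices=False)
    L = U[:, :r] * s[:r]   # per-individual scores (PC coordinates)
    F = Vt[:r, :]          # per-SNP axes (PC loadings)
    return L, F


if __name__ == "__main__":
    n, p, K = 60, 500, 3
    G, Q, P, M = simulate_admixed_genotypes(n, p, K)

    # Admixture-side constraints hold by construction.
    print(np.allclose(Q.sum(axis=1), 1.0))  # expected: True

    # PCA side: K - 1 components for K ancestral populations.
    L, F = pca_factorization(G, K - 1)
    Gc = G - G.mean(axis=0, keepdims=True)
    pca_err = np.linalg.norm(Gc - L @ F)

    # H is the centring matrix; H @ M has rank <= K - 1 because Q's rows sum to 1,
    # so by Eckart-Young the truncated SVD fit can be no worse than H @ M.
    H = np.eye(n) - np.ones((n, n)) / n
    ref_err = np.linalg.norm(H @ G - H @ M)
    print(pca_err <= ref_err)  # expected: True
```

Because each row of Q sums to one, centring the expected genotype matrix 2QP leaves a matrix of rank at most K − 1. In the noise-free case, K − 1 principal components therefore suffice, and the Eckart–Young theorem guarantees the second comparison. The admixture model adds the simplex and [0, 1] constraints that PCA drops. The sparse approach described in the record would instead change the prior on one factor.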