
Theoretical Formulation of Principal Components Analysis to Detect and Correct for Population Stratification

The Eigenstrat method, based on principal components analysis (PCA), is commonly used both to quantify population relationships in population genetics and to correct for population stratification in genome-wide association studies. However, it can be difficult to make appropriate inference about pop...


Bibliographic Details
Main Authors: Ma, Jianzhong; Amos, Christopher I.
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2941459/
https://www.ncbi.nlm.nih.gov/pubmed/20862251
http://dx.doi.org/10.1371/journal.pone.0012510
_version_ 1782186901009072128
author Ma, Jianzhong
Amos, Christopher I.
author_facet Ma, Jianzhong
Amos, Christopher I.
author_sort Ma, Jianzhong
collection PubMed
description The Eigenstrat method, based on principal components analysis (PCA), is commonly used both to quantify population relationships in population genetics and to correct for population stratification in genome-wide association studies. However, it can be difficult to make appropriate inference about population relationships from the principal component (PC) scatter plot. Here, to better understand the working mechanism of the Eigenstrat method, we consider its theoretical or “population” formulation. The eigen-equation for samples from an arbitrary number ([Image: see text]) of populations is reduced to that of a matrix of dimension [Image: see text], the elements of which are determined by the variance-covariance matrix for the random vector of the [Image: see text] allele frequencies. Solving the reduced eigen-equation is numerically trivial and yields eigenvectors that are the axes of variation required for differentiating the populations. Using the reduced eigen-equation, we investigate the within-population fluctuations around the axes of variation on the PC scatter plot for simulated datasets. Specifically, we show that there exists an asymptotically stable pattern of the PC plot for large sample size. Our results provide theoretical guidance for interpreting the pattern of PC plot in terms of population relationships. For applications in genetic association tests, we demonstrate that, as a method of correcting for population stratification, regressing out the theoretical PCs corresponding to the axes of variation is equivalent to simply removing the population mean of allele counts and works as well as or better than the Eigenstrat method.
format Text
id pubmed-2941459
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2941459 2010-09-22 Theoretical Formulation of Principal Components Analysis to Detect and Correct for Population Stratification Ma, Jianzhong Amos, Christopher I. PLoS One Research Article
Public Library of Science 2010-09-17 /pmc/articles/PMC2941459/ /pubmed/20862251 http://dx.doi.org/10.1371/journal.pone.0012510 Text en Ma, Amos. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Ma, Jianzhong
Amos, Christopher I.
Theoretical Formulation of Principal Components Analysis to Detect and Correct for Population Stratification
title Theoretical Formulation of Principal Components Analysis to Detect and Correct for Population Stratification
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2941459/
https://www.ncbi.nlm.nih.gov/pubmed/20862251
http://dx.doi.org/10.1371/journal.pone.0012510