A geometric relationship of F(2), F(3) and F(4)-statistics with principal component analysis
Principal component analysis (PCA) and F-statistics sensu Patterson are two of the most widely used population genetic tools to study human genetic variation. Here, I derive explicit connections between the two approaches and show that these two methods are closely related. F-statistics have a simpl...
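The geometric reading of F-statistics that the abstract describes can be illustrated numerically. According to the full record description, a population that is admixed as determined by an F(3)-statistic lies inside a circle on a PCA plot, and the F(4)-statistic is zero when the differences between pairs of populations meet at a right angle. The sketch below is not the author's code. It assumes that populations are represented as vectors of known allele frequencies, that the statistics are plain averages over SNPs with no finite-sample bias correction, and that the circle is the one whose diameter is the segment between the two source populations. The function names, the simulated frequencies, and the 0.3/0.7 admixture proportions are all hypothetical.

```python
import numpy as np

def f2(a, b):
    # Mean squared allele-frequency difference: a squared Euclidean
    # distance between populations a and b, scaled by the number of SNPs.
    return np.mean((a - b) ** 2)

def f3(x, a, b):
    # Mean inner product of (x - a) and (x - b); negative values are
    # read as evidence that x is admixed between a and b.
    return np.mean((x - a) * (x - b))

def f4(a, b, c, d):
    # Mean inner product of the difference vectors (a - b) and (c - d).
    return np.mean((a - b) * (c - d))

def inside_circle(x, a, b):
    # True if x lies strictly inside the sphere whose diameter is the
    # segment from a to b (assumed form of the circle in the abstract).
    center = (a + b) / 2
    radius_sq = np.sum((a - b) ** 2) / 4
    return np.sum((x - center) ** 2) < radius_sq

rng = np.random.default_rng(0)
n_snps = 1000

# Hypothetical source populations with random allele frequencies.
a = rng.uniform(0.05, 0.95, n_snps)
b = rng.uniform(0.05, 0.95, n_snps)

# Idealized admixed population: a pure mixture, with no drift afterwards.
x = 0.3 * a + 0.7 * b

# Build a pair (c, d) whose difference is orthogonal to (a - b) by
# projecting a random direction v onto the complement of (a - b).
v = rng.normal(size=n_snps)
diff = a - b
v_perp = v - (v @ diff) / (diff @ diff) * diff
d = rng.uniform(0.3, 0.7, n_snps)
c = d + 0.05 * v_perp

print("F3(x; a, b) =", f3(x, a, b))
print("x inside circle:", inside_circle(x, a, b))
print("F4(a, b; c, d) =", f4(a, b, c, d))
```

Under these assumptions the sign of F(3) and the circle test agree because (x - a)·(x - b) equals the squared distance from x to the midpoint of a and b minus a quarter of the squared distance between a and b. With real data, sampling noise and drift after admixture would blur both checks.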
Main author: | Peter, Benjamin M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Royal Society 2022 |
Subjects: | Articles |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9014194/ https://www.ncbi.nlm.nih.gov/pubmed/35430884 http://dx.doi.org/10.1098/rstb.2020.0413 |
_version_ | 1784688157185277952 |
---|---|
author | Peter, Benjamin M. |
author_facet | Peter, Benjamin M. |
author_sort | Peter, Benjamin M. |
collection | PubMed |
description | Principal component analysis (PCA) and F-statistics sensu Patterson are two of the most widely used population genetic tools to study human genetic variation. Here, I derive explicit connections between the two approaches and show that these two methods are closely related. F-statistics have a simple geometrical interpretation in the context of PCA, and orthogonal projections are a key concept to establish this link. I show that for any pair of populations, any population that is admixed as determined by an F(3)-statistic will lie inside a circle on a PCA plot. Furthermore, the F(4)-statistic is closely related to an angle measurement, and will be zero if the differences between pairs of populations intersect at a right angle in PCA space. I illustrate my results on two examples, one of Western Eurasian, and one of global human diversity. In both examples, I find that the first few PCs are sufficient to approximate most F-statistics, and that PCA plots are effective at predicting F-statistics. Thus, while F-statistics are commonly understood in terms of discrete populations, the geometric perspective illustrates that they can be viewed in a framework of populations that vary in a more continuous manner. This article is part of the theme issue ‘Celebrating 50 years since Lewontin's apportionment of human diversity’. |
format | Online Article Text |
id | pubmed-9014194 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | The Royal Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-9014194 2022-04-21 A geometric relationship of F(2), F(3) and F(4)-statistics with principal component analysis Peter, Benjamin M. Philos Trans R Soc Lond B Biol Sci Articles Principal component analysis (PCA) and F-statistics sensu Patterson are two of the most widely used population genetic tools to study human genetic variation. Here, I derive explicit connections between the two approaches and show that these two methods are closely related. F-statistics have a simple geometrical interpretation in the context of PCA, and orthogonal projections are a key concept to establish this link. I show that for any pair of populations, any population that is admixed as determined by an F(3)-statistic will lie inside a circle on a PCA plot. Furthermore, the F(4)-statistic is closely related to an angle measurement, and will be zero if the differences between pairs of populations intersect at a right angle in PCA space. I illustrate my results on two examples, one of Western Eurasian, and one of global human diversity. In both examples, I find that the first few PCs are sufficient to approximate most F-statistics, and that PCA plots are effective at predicting F-statistics. Thus, while F-statistics are commonly understood in terms of discrete populations, the geometric perspective illustrates that they can be viewed in a framework of populations that vary in a more continuous manner. This article is part of the theme issue ‘Celebrating 50 years since Lewontin's apportionment of human diversity’. The Royal Society 2022-06-06 2022-04-18 /pmc/articles/PMC9014194/ /pubmed/35430884 http://dx.doi.org/10.1098/rstb.2020.0413 Text en © 2022 The Authors. https://creativecommons.org/licenses/by/4.0/ Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, provided the original author and source are credited. |
spellingShingle | Articles Peter, Benjamin M. A geometric relationship of F(2), F(3) and F(4)-statistics with principal component analysis |
title | A geometric relationship of F(2), F(3) and F(4)-statistics with principal component analysis |
title_full | A geometric relationship of F(2), F(3) and F(4)-statistics with principal component analysis |
title_fullStr | A geometric relationship of F(2), F(3) and F(4)-statistics with principal component analysis |
title_full_unstemmed | A geometric relationship of F(2), F(3) and F(4)-statistics with principal component analysis |
title_short | A geometric relationship of F(2), F(3) and F(4)-statistics with principal component analysis |
title_sort | geometric relationship of f(2), f(3) and f(4)-statistics with principal component analysis |
topic | Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9014194/ https://www.ncbi.nlm.nih.gov/pubmed/35430884 http://dx.doi.org/10.1098/rstb.2020.0413 |
work_keys_str_mv | AT peterbenjaminm ageometricrelationshipoff2f3andf4statisticswithprincipalcomponentanalysis AT peterbenjaminm geometricrelationshipoff2f3andf4statisticswithprincipalcomponentanalysis |