Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated

Principal Component Analysis (PCA) is a multivariate analysis that reduces the complexity of datasets while preserving data covariance. The outcome can be visualized on colorful scatterplots, ideally with only a minimal loss of information. PCA applications, implemented in well-cited packages like E...
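The dimensionality reduction described in the abstract can be illustrated with a minimal sketch. This is not the author's code and does not use EIGENSOFT or PLINK. It assumes NumPy is available, and the helper names `pca_project` and `population`, the toy three-group data, and the group sizes are all hypothetical choices made for illustration. It projects three-dimensional points onto their first two principal components and reports how the share of variance carried by each component shifts when one group is sampled far less than the others.

```python
import numpy as np


def pca_project(samples, n_components=2):
    """Project the rows of `samples` onto their top principal components.

    Returns the projected coordinates and the fraction of total variance
    carried by each retained component.
    """
    centered = samples - samples.mean(axis=0)
    # SVD of the centered data: rows of `vt` are the principal axes,
    # and squared singular values are proportional to the variances.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(0)


def population(center, size):
    """Hypothetical toy 'population': noisy points around a 3-D center."""
    return np.asarray(center, dtype=float) + rng.normal(scale=0.02, size=(size, 3))


# Same three groups, sampled evenly and unevenly (sizes are assumptions).
balanced = np.vstack([
    population([1, 0, 0], 50),
    population([0, 1, 0], 50),
    population([0, 0, 1], 50),
])
unbalanced = np.vstack([
    population([1, 0, 0], 50),
    population([0, 1, 0], 50),
    population([0, 0, 1], 5),
])

coords_bal, var_bal = pca_project(balanced)
coords_unb, var_unb = pca_project(unbalanced)

print("balanced   variance share per PC:", np.round(var_bal, 3))
print("unbalanced variance share per PC:", np.round(var_unb, 3))
```

In this toy setting the two retained components capture nearly all of the variance in both runs, but the split between them changes with the sampling scheme alone, which is one simple way the resulting scatterplot can depend on how the dataset was assembled.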

Bibliographic Details
Main Author: Elhaik, Eran
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9424212/
https://www.ncbi.nlm.nih.gov/pubmed/36038559
http://dx.doi.org/10.1038/s41598-022-14395-4
author Elhaik, Eran
collection PubMed
description Principal Component Analysis (PCA) is a multivariate analysis that reduces the complexity of datasets while preserving data covariance. The outcome can be visualized on colorful scatterplots, ideally with only a minimal loss of information. PCA applications, implemented in well-cited packages like EIGENSOFT and PLINK, are extensively used as the foremost analyses in population genetics and related fields (e.g., animal and plant or medical genetics). PCA outcomes are used to shape study design, identify, and characterize individuals and populations, and draw historical and ethnobiological conclusions on origins, evolution, dispersion, and relatedness. The replicability crisis in science has prompted us to evaluate whether PCA results are reliable, robust, and replicable. We analyzed twelve common test cases using an intuitive color-based model alongside human population data. We demonstrate that PCA results can be artifacts of the data and can be easily manipulated to generate desired outcomes. PCA adjustment also yielded unfavorable outcomes in association studies. PCA results may not be reliable, robust, or replicable as the field assumes. Our findings raise concerns about the validity of results reported in the population genetics literature and related fields that place a disproportionate reliance upon PCA outcomes and the insights derived from them. We conclude that PCA may have a biasing role in genetic investigations and that 32,000-216,000 genetic studies should be reevaluated. An alternative mixed-admixture population genetic model is discussed.
format Online
Article
Text
id pubmed-9424212
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9424212 2022-08-31 Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated Elhaik, Eran Sci Rep Article
Nature Publishing Group UK 2022-08-29 /pmc/articles/PMC9424212/ /pubmed/36038559 http://dx.doi.org/10.1038/s41598-022-14395-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated
topic Article