Limitations of principal components in quantitative genetic association models for human studies

Principal Component Analysis (PCA) and the Linear Mixed-effects Model (LMM), sometimes in combination, are the most common genetic association models. Previous PCA-LMM comparisons give mixed results, unclear guidance, and have several limitations, including not varying the number of principal compon...

Bibliographic Details
Main authors: Yao, Yiqi, Ochoa, Alejandro
Format: Online Article Text
Language: English
Published: eLife Sciences Publications, Ltd 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10234632/
https://www.ncbi.nlm.nih.gov/pubmed/37140344
http://dx.doi.org/10.7554/eLife.79238
_version_ 1785052537090473984
author Yao, Yiqi
Ochoa, Alejandro
author_facet Yao, Yiqi
Ochoa, Alejandro
author_sort Yao, Yiqi
collection PubMed
description Principal Component Analysis (PCA) and the Linear Mixed-effects Model (LMM), sometimes in combination, are the most common genetic association models. Previous PCA-LMM comparisons give mixed results and unclear guidance, and have several limitations, including not varying the number of principal components (PCs), simulating simple population structures, and inconsistent use of real data and power evaluations. We evaluate PCA and LMM, both with varying numbers of PCs, in realistic genotype and complex trait simulations including admixed families, subpopulation trees, and real multiethnic human datasets with simulated traits. We find that LMM without PCs usually performs best, with the largest effects in family simulations, real human datasets, and traits without environment effects. Poor PCA performance on human datasets is driven more by large numbers of distant relatives than by the smaller number of closer relatives. While PCA was known to fail on family data, we report strong effects of family relatedness in genetically diverse human datasets, which are not avoided by pruning close relatives. Environment effects driven by geography and ethnicity are better modeled by an LMM that includes those labels instead of PCs. This work better characterizes the severe limitations of PCA compared to LMM in modeling the complex relatedness structures of multiethnic human data for association studies.
format Online
Article
Text
id pubmed-10234632
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher eLife Sciences Publications, Ltd
record_format MEDLINE/PubMed
spelling pubmed-10234632 2023-06-02 Limitations of principal components in quantitative genetic association models for human studies Yao, Yiqi Ochoa, Alejandro eLife Genetics and Genomics
eLife Sciences Publications, Ltd 2023-05-04 /pmc/articles/PMC10234632/ /pubmed/37140344 http://dx.doi.org/10.7554/eLife.79238 Text en © 2023, Yao and Ochoa. This article is distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use and redistribution provided that the original author and source are credited.
spellingShingle Genetics and Genomics
Yao, Yiqi
Ochoa, Alejandro
Limitations of principal components in quantitative genetic association models for human studies
title Limitations of principal components in quantitative genetic association models for human studies
topic Genetics and Genomics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10234632/
https://www.ncbi.nlm.nih.gov/pubmed/37140344
http://dx.doi.org/10.7554/eLife.79238