Archetypal Analysis for population genetics


Bibliographic Details
Main authors: Gimbernat-Mayol, Julia, Dominguez Mantes, Albert, Bustamante, Carlos D., Mas Montserrat, Daniel, Ioannidis, Alexander G.
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9451066/
https://www.ncbi.nlm.nih.gov/pubmed/36007005
http://dx.doi.org/10.1371/journal.pcbi.1010301
author Gimbernat-Mayol, Julia
Dominguez Mantes, Albert
Bustamante, Carlos D.
Mas Montserrat, Daniel
Ioannidis, Alexander G.
collection PubMed
description The estimation of genetic clusters using genomic data has application from genome-wide association studies (GWAS) to demographic history to polygenic risk scores (PRS) and is expected to play an important role in the analyses of increasingly diverse, large-scale cohorts. However, existing methods are computationally-intensive, prohibitively so in the case of nationwide biobanks. Here we explore Archetypal Analysis as an efficient, unsupervised approach for identifying genetic clusters and for associating individuals with them. Such unsupervised approaches help avoid conflating socially constructed ethnic labels with genetic clusters by eliminating the need for exogenous training labels. We show that Archetypal Analysis yields similar cluster structure to existing unsupervised methods such as ADMIXTURE and provides interpretative advantages. More importantly, we show that since Archetypal Analysis can be used with lower-dimensional representations of genetic data, significant reductions in computational time and memory requirements are possible. When Archetypal Analysis is run in such a fashion, it takes several orders of magnitude less compute time than the current standard, ADMIXTURE. Finally, we demonstrate uses ranging across datasets from humans to canids.
format Online
Article
Text
id pubmed-9451066
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9451066 2022-09-08 Archetypal Analysis for population genetics Gimbernat-Mayol, Julia Dominguez Mantes, Albert Bustamante, Carlos D. Mas Montserrat, Daniel Ioannidis, Alexander G. PLoS Comput Biol Research Article
Public Library of Science 2022-08-25 /pmc/articles/PMC9451066/ /pubmed/36007005 http://dx.doi.org/10.1371/journal.pcbi.1010301 Text en © 2022 Gimbernat-Mayol et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Archetypal Analysis for population genetics
topic Research Article