KLFDAPC: a supervised machine learning approach for spatial genetic structure analysis

Geographic patterns of human genetic variation provide important insights into human evolution and disease. A commonly used tool to detect and describe them is principal component analysis (PCA) or the supervised linear discriminant analysis of principal components (DAPC). However, genetic features...

Bibliographic Details
Main Authors: Qin, Xinghu; Chiang, Charleston W K; Gaggiotti, Oscar E
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9294434/
https://www.ncbi.nlm.nih.gov/pubmed/35649387
http://dx.doi.org/10.1093/bib/bbac202
collection PubMed
description Geographic patterns of human genetic variation provide important insights into human evolution and disease. A commonly used tool to detect and describe them is principal component analysis (PCA) or the supervised linear discriminant analysis of principal components (DAPC). However, genetic features produced from both approaches could fail to correctly characterize population structure for complex scenarios involving admixture. In this study, we introduce Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC), a supervised non-linear approach for inferring individual geographic genetic structure that could rectify the limitations of these approaches by preserving the multimodal space of samples. We tested the power of KLFDAPC to infer population structure and to predict individual geographic origin using neural networks. Simulation results showed that KLFDAPC has higher discriminatory power than PCA and DAPC. The application of our method to empirical European and East Asian genome-wide genetic datasets indicated that the first two reduced features of KLFDAPC correctly recapitulated the geography of individuals and significantly improved the accuracy of predicting individual geographic origin when compared to PCA and DAPC. Therefore, KLFDAPC can be useful for geographic ancestry inference, design of genome scans and correction for spatial stratification in GWAS that link genes to adaptation or disease susceptibility.
format Online
Article
Text
id pubmed-9294434
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9294434 2022-07-20 Brief Bioinform Problem Solving Protocol Oxford University Press 2022-06-02 Text en © The Author(s) 2022. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Problem Solving Protocol