
PopGenAdapt: Semi-Supervised Domain Adaptation for Genotype-to-Phenotype Prediction in Underrepresented Populations

The lack of diversity in genomic datasets, currently skewed towards individuals of European ancestry, presents a challenge in developing inclusive biomedical models. The scarcity of such data is particularly evident in labeled datasets that include genomic data linked to electronic health records. T...

Bibliographic Details
Main authors: Cara, Marçal Comajoan; Montserrat, Daniel Mas; Ioannidis, Alexander G.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10592760/
https://www.ncbi.nlm.nih.gov/pubmed/37873492
http://dx.doi.org/10.1101/2023.10.10.561715
author Cara, Marçal Comajoan
Montserrat, Daniel Mas
Ioannidis, Alexander G.
collection PubMed
description The lack of diversity in genomic datasets, currently skewed towards individuals of European ancestry, presents a challenge in developing inclusive biomedical models. The scarcity of such data is particularly evident in labeled datasets that include genomic data linked to electronic health records. To address this gap, this paper presents PopGenAdapt, a genotype-to-phenotype prediction model which adopts semi-supervised domain adaptation (SSDA) techniques originally proposed for computer vision. PopGenAdapt is designed to leverage the substantial labeled data available from individuals of European ancestry, as well as the limited labeled and the larger amount of unlabeled data from currently underrepresented populations. The method is evaluated in underrepresented populations from Nigeria, Sri Lanka, and Hawaii for the prediction of several disease outcomes. The results suggest a significant improvement in the performance of genotype-to-phenotype models for these populations over state-of-the-art supervised learning methods, setting SSDA as a promising strategy for creating more inclusive machine learning models in biomedical research.
format Online
Article
Text
id pubmed-10592760
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10592760 2023-10-24 PopGenAdapt: Semi-Supervised Domain Adaptation for Genotype-to-Phenotype Prediction in Underrepresented Populations Cara, Marçal Comajoan; Montserrat, Daniel Mas; Ioannidis, Alexander G. bioRxiv Article Cold Spring Harbor Laboratory 2023-10-10 /pmc/articles/PMC10592760/ /pubmed/37873492 http://dx.doi.org/10.1101/2023.10.10.561715 Text en https://creativecommons.org/licenses/by-nc/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator.
title PopGenAdapt: Semi-Supervised Domain Adaptation for Genotype-to-Phenotype Prediction in Underrepresented Populations
topic Article