Domain-adaptive neural networks improve supervised machine learning based on simulated population genetic data

Investigators have recently introduced powerful methods for population genetic inference that rely on supervised machine learning from simulated data. Despite their performance advantages, these methods can fail when the simulated training data does not adequately resemble data from the real world....

Bibliographic Details
Main authors: Mo, Ziyi; Siepel, Adam
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10002701/
https://www.ncbi.nlm.nih.gov/pubmed/36909514
http://dx.doi.org/10.1101/2023.03.01.529396
_version_ 1784904444577579008
author Mo, Ziyi
Siepel, Adam
collection PubMed
description Investigators have recently introduced powerful methods for population genetic inference that rely on supervised machine learning from simulated data. Despite their performance advantages, these methods can fail when the simulated training data does not adequately resemble data from the real world. Here, we show that this “simulation mis-specification” problem can be framed as a “domain adaptation” problem, where a model learned from one data distribution is applied to a dataset drawn from a different distribution. By applying an established domain-adaptation technique based on a gradient reversal layer (GRL), originally introduced for image classification, we show that the effects of simulation mis-specification can be substantially mitigated. We focus our analysis on two state-of-the-art deep-learning population genetic methods—SIA, which infers positive selection from features of the ancestral recombination graph (ARG), and ReLERNN, which infers recombination rates from genotype matrices. In the case of SIA, the domain adaptive framework also compensates for ARG inference error. Using the domain-adaptive SIA (dadaSIA) model, we estimate improved selection coefficients at selected loci in the 1000 Genomes CEU population. We anticipate that domain adaptation will prove to be widely applicable in the growing use of supervised machine learning in population genetics.
format Online
Article
Text
id pubmed-10002701
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10002701 2023-03-11 Domain-adaptive neural networks improve supervised machine learning based on simulated population genetic data Mo, Ziyi Siepel, Adam bioRxiv Article
Cold Spring Harbor Laboratory 2023-09-06 /pmc/articles/PMC10002701/ /pubmed/36909514 http://dx.doi.org/10.1101/2023.03.01.529396 Text en https://creativecommons.org/licenses/by-nc/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator.
title Domain-adaptive neural networks improve supervised machine learning based on simulated population genetic data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10002701/
https://www.ncbi.nlm.nih.gov/pubmed/36909514
http://dx.doi.org/10.1101/2023.03.01.529396