Domain-adaptive neural networks improve supervised machine learning based on simulated population genetic data

Investigators have recently introduced powerful methods for population genetic inference that rely on supervised machine learning from simulated data. Despite their performance advantages, these methods can fail when the simulated training data does not adequately resemble data from the real world....

Bibliographic Details
Main authors:	Mo, Ziyi; Siepel, Adam
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2023
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10655966/ https://www.ncbi.nlm.nih.gov/pubmed/37934781 http://dx.doi.org/10.1371/journal.pgen.1011032

_version_	1785136913814913024
author	Mo, Ziyi Siepel, Adam
author_facet	Mo, Ziyi Siepel, Adam
author_sort	Mo, Ziyi
collection	PubMed
description	Investigators have recently introduced powerful methods for population genetic inference that rely on supervised machine learning from simulated data. Despite their performance advantages, these methods can fail when the simulated training data does not adequately resemble data from the real world. Here, we show that this “simulation mis-specification” problem can be framed as a “domain adaptation” problem, where a model learned from one data distribution is applied to a dataset drawn from a different distribution. By applying an established domain-adaptation technique based on a gradient reversal layer (GRL), originally introduced for image classification, we show that the effects of simulation mis-specification can be substantially mitigated. We focus our analysis on two state-of-the-art deep-learning population genetic methods—SIA, which infers positive selection from features of the ancestral recombination graph (ARG), and ReLERNN, which infers recombination rates from genotype matrices. In the case of SIA, the domain adaptive framework also compensates for ARG inference error. Using the domain-adaptive SIA (dadaSIA) model, we estimate improved selection coefficients at selected loci in the 1000 Genomes CEU population. We anticipate that domain adaptation will prove to be widely applicable in the growing use of supervised machine learning in population genetics.
format	Online Article Text
id	pubmed-10655966
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-10655966 2023-11-07 Domain-adaptive neural networks improve supervised machine learning based on simulated population genetic data Mo, Ziyi Siepel, Adam PLoS Genet Research Article Public Library of Science 2023-11-07 /pmc/articles/PMC10655966/ /pubmed/37934781 http://dx.doi.org/10.1371/journal.pgen.1011032 Text en © 2023 Mo, Siepel https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Mo, Ziyi Siepel, Adam Domain-adaptive neural networks improve supervised machine learning based on simulated population genetic data
title	Domain-adaptive neural networks improve supervised machine learning based on simulated population genetic data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10655966/ https://www.ncbi.nlm.nih.gov/pubmed/37934781 http://dx.doi.org/10.1371/journal.pgen.1011032