
IntroUNET: identifying introgressed alleles via semantic segmentation

A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to s...


Bibliographic Details
Main authors:	Ray, Dylan D., Flagel, Lex, Schrider, Daniel R.
Format:	Online Article Text
Language:	English
Published:	Cold Spring Harbor Laboratory 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9979274/ https://www.ncbi.nlm.nih.gov/pubmed/36865105 http://dx.doi.org/10.1101/2023.02.07.527435

_version_	1784899692446875648
author	Ray, Dylan D. Flagel, Lex Schrider, Daniel R.
author_facet	Ray, Dylan D. Flagel, Lex Schrider, Daniel R.
author_sort	Ray, Dylan D.
collection	PubMed
description	A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e., introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of that individual’s alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled “ghost” population, performing comparably to a supervised learning method tailored specifically to that task. Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method’s success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.
format	Online Article Text
id	pubmed-9979274
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Cold Spring Harbor Laboratory
record_format	MEDLINE/PubMed
spelling	pubmed-9979274 2023-03-03 IntroUNET: identifying introgressed alleles via semantic segmentation Ray, Dylan D. Flagel, Lex Schrider, Daniel R. bioRxiv Article Cold Spring Harbor Laboratory 2023-10-01 /pmc/articles/PMC9979274/ /pubmed/36865105 http://dx.doi.org/10.1101/2023.02.07.527435 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
spellingShingle	Article Ray, Dylan D. Flagel, Lex Schrider, Daniel R. IntroUNET: identifying introgressed alleles via semantic segmentation
title	IntroUNET: identifying introgressed alleles via semantic segmentation
title_full	IntroUNET: identifying introgressed alleles via semantic segmentation
title_fullStr	IntroUNET: identifying introgressed alleles via semantic segmentation
title_full_unstemmed	IntroUNET: identifying introgressed alleles via semantic segmentation
title_short	IntroUNET: identifying introgressed alleles via semantic segmentation
title_sort	introunet: identifying introgressed alleles via semantic segmentation
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9979274/ https://www.ncbi.nlm.nih.gov/pubmed/36865105 http://dx.doi.org/10.1101/2023.02.07.527435
work_keys_str_mv	AT raydyland introunetidentifyingintrogressedallelesviasemanticsegmentation AT flagellex introunetidentifyingintrogressedallelesviasemanticsegmentation AT schriderdanielr introunetidentifyingintrogressedallelesviasemanticsegmentation

