Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia

Hybridization and gene flow between species appears to be common. Even though it is clear that hybridization is widespread across all surveyed taxonomic groups, the magnitude and consequences of introgression are still largely unknown. Thus it is crucial to develop the statistical machinery required...

Bibliographic Details
Main authors: Schrider, Daniel R., Ayroles, Julien, Matute, Daniel R., Kern, Andrew D.
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5933812/
https://www.ncbi.nlm.nih.gov/pubmed/29684059
http://dx.doi.org/10.1371/journal.pgen.1007341
author Schrider, Daniel R.
Ayroles, Julien
Matute, Daniel R.
Kern, Andrew D.
collection PubMed
description Hybridization and gene flow between species appears to be common. Even though it is clear that hybridization is widespread across all surveyed taxonomic groups, the magnitude and consequences of introgression are still largely unknown. Thus it is crucial to develop the statistical machinery required to uncover which genomic regions have recently acquired haplotypes via introgression from a sister population. We developed a novel machine learning framework, called FILET (Finding Introgressed Loci via Extra-Trees) capable of revealing genomic introgression with far greater power than competing methods. FILET works by combining information from a number of population genetic summary statistics, including several new statistics that we introduce, that capture patterns of variation across two populations. We show that FILET is able to identify loci that have experienced gene flow between related species with high accuracy, and in most situations can correctly infer which population was the donor and which was the recipient. Here we describe a data set of outbred diploid Drosophila sechellia genomes, and combine them with data from D. simulans to examine recent introgression between these species using FILET. Although we find that these populations may have split more recently than previously appreciated, FILET confirms that there has indeed been appreciable recent introgression (some of which might have been adaptive) between these species, and reveals that this gene flow is primarily in the direction of D. simulans to D. sechellia.
format Online
Article
Text
id pubmed-5933812
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5933812 2018-05-18 PLoS Genet Research Article
Public Library of Science 2018-04-23 /pmc/articles/PMC5933812/ /pubmed/29684059 http://dx.doi.org/10.1371/journal.pgen.1007341 Text en © 2018 Schrider et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia
topic Research Article