Learning the properties of adaptive regions with functional data analysis

Identifying regions of positive selection in genomic data remains a challenge in population genetics. Most current approaches rely on comparing values of summary statistics calculated in windows. We present an approach termed SURFDAWave, which translates measures of genetic diversity calculated in g...

Bibliographic Details
Main authors: Mughal, Mehreen R., Koch, Hillary, Huang, Jinguo, Chiaromonte, Francesca, DeGiorgio, Michael
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7480868/
https://www.ncbi.nlm.nih.gov/pubmed/32853200
http://dx.doi.org/10.1371/journal.pgen.1008896
author Mughal, Mehreen R.
Koch, Hillary
Huang, Jinguo
Chiaromonte, Francesca
DeGiorgio, Michael
collection PubMed
description Identifying regions of positive selection in genomic data remains a challenge in population genetics. Most current approaches rely on comparing values of summary statistics calculated in windows. We present an approach termed SURFDAWave, which translates measures of genetic diversity calculated in genomic windows to functional data. By transforming our discrete data points to be outputs of continuous functions defined over genomic space, we are able to learn the features of these functions that signify selection. This enables us to confidently identify complex modes of natural selection, including adaptive introgression. We are also able to predict important selection parameters that are responsible for shaping the inferred selection events. By applying our model to human population-genomic data, we recapitulate previously identified regions of selective sweeps, such as OCA2 in Europeans, and predict that its beneficial mutation reached a frequency of 0.02 before it swept 1,802 generations ago, a time when humans were relatively new to Europe. In addition, we identify BNC2 in Europeans as a target of adaptive introgression, and predict that it harbors a beneficial mutation that arose in an archaic human population that split from modern humans within the hypothesized modern human-Neanderthal divergence range.
format Online
Article
Text
id pubmed-7480868
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7480868 2020-09-18 Learning the properties of adaptive regions with functional data analysis Mughal, Mehreen R. Koch, Hillary Huang, Jinguo Chiaromonte, Francesca DeGiorgio, Michael PLoS Genet Research Article
Public Library of Science 2020-08-27 /pmc/articles/PMC7480868/ /pubmed/32853200 http://dx.doi.org/10.1371/journal.pgen.1008896 Text en © 2020 Mughal et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Learning the properties of adaptive regions with functional data analysis
topic Research Article