S/HIC: Robust Identification of Soft and Hard Sweeps Using Machine Learning

Bibliographic Details
Main authors: Schrider, Daniel R., Kern, Andrew D.
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792382/
https://www.ncbi.nlm.nih.gov/pubmed/26977894
http://dx.doi.org/10.1371/journal.pgen.1005928
author Schrider, Daniel R.
Kern, Andrew D.
collection PubMed
description Detecting the targets of adaptive natural selection from whole genome sequencing data is a central problem for population genetics. However, to date most methods have shown sub-optimal performance under realistic demographic scenarios. Moreover, over the past decade there has been a renewed interest in determining the importance of selection from standing variation in adaptation of natural populations, yet very few methods for inferring this model of adaptation at the genome scale have been introduced. Here we introduce a new method, S/HIC, which uses supervised machine learning to precisely infer the location of both hard and soft selective sweeps. We show that S/HIC has unrivaled accuracy for detecting sweeps under demographic histories that are relevant to human populations, and distinguishing sweeps from linked as well as neutrally evolving regions. Moreover, we show that S/HIC is uniquely robust among its competitors to model misspecification. Thus, even if the true demographic model of a population differs catastrophically from that specified by the user, S/HIC still retains impressive discriminatory power. Finally, we apply S/HIC to the case of resequencing data from human chromosome 18 in a European population sample, and demonstrate that we can reliably recover selective sweeps that have been identified earlier using less specific and sensitive methods.
format Online
Article
Text
id pubmed-4792382
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4792382 2016-03-23 S/HIC: Robust Identification of Soft and Hard Sweeps Using Machine Learning Schrider, Daniel R. Kern, Andrew D. PLoS Genet Research Article
Public Library of Science 2016-03-15 /pmc/articles/PMC4792382/ /pubmed/26977894 http://dx.doi.org/10.1371/journal.pgen.1005928 Text en © 2016 Schrider, Kern http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title S/HIC: Robust Identification of Soft and Hard Sweeps Using Machine Learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792382/
https://www.ncbi.nlm.nih.gov/pubmed/26977894
http://dx.doi.org/10.1371/journal.pgen.1005928