Applications of random forest feature selection for fine‐scale genetic population assignment

Bibliographic Details
Main authors: Sylvester, Emma V. A., Bentzen, Paul, Bradbury, Ian R., Clément, Marie, Pearce, Jon, Horne, John, Beiko, Robert G.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2017
Subjects: Original Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5775496/
https://www.ncbi.nlm.nih.gov/pubmed/29387152
http://dx.doi.org/10.1111/eva.12524
Collection: PubMed
Description: Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine‐learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with F_ST ranking for selection of single nucleotide polymorphisms (SNPs) for fine‐scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon (Salmo salar) and a published SNP data set for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we used each method to create panels of 50–700 markers and identified the minimum panel size required to obtain a self‐assignment accuracy of at least 90%. Panels of SNPs identified using random forest‐based methods performed up to 7.8 and 11.2 percentage points better than F_ST‐selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self‐assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for the Atlantic salmon and Chinook salmon data sets, respectively, a level of accuracy never reached for these species using F_ST‐selected panels. Our results demonstrate a role for machine‐learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations.
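The F_ST ranking used as the baseline in the description can be illustrated with a minimal sketch. The abstract does not say which F_ST estimator the authors used, so this example assumes a simple heterozygosity-based form for biallelic SNPs, F_ST = (H_T - H_S) / H_T, with equally weighted populations. The function names, SNP labels and allele frequencies below are hypothetical and are not taken from the article.

```python
from typing import Dict, List


def fst_per_locus(freqs: List[float]) -> float:
    """Heterozygosity-based F_ST for one biallelic SNP.

    `freqs` holds the frequency of one allele in each population.
    Assumption: populations are weighted equally (not the article's stated method).
    """
    n_pops = len(freqs)
    p_bar = sum(freqs) / n_pops
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # monomorphic locus carries no information
    # mean expected heterozygosity within populations
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / n_pops
    return (h_total - h_within) / h_total


def rank_snps_by_fst(allele_freqs: Dict[str, List[float]], panel_size: int) -> List[str]:
    """Return the `panel_size` SNP names with the highest per-locus F_ST."""
    ranked = sorted(
        allele_freqs,
        key=lambda snp: fst_per_locus(allele_freqs[snp]),
        reverse=True,
    )
    return ranked[:panel_size]


# Hypothetical allele frequencies for three SNPs across three populations.
toy_freqs = {
    "snp_a": [0.10, 0.90, 0.50],  # strongly differentiated
    "snp_b": [0.50, 0.50, 0.50],  # no differentiation
    "snp_c": [0.40, 0.60, 0.50],  # weakly differentiated
}
panel = rank_snps_by_fst(toy_freqs, panel_size=2)
print(panel)
```

A ranking like this scores each SNP independently, which is one plausible reason a random forest, which evaluates markers jointly, could select a more informative panel of the same size.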
Record ID: pubmed-5775496
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Evol Appl (Original Articles)
Publication date: 2017-09-14
License: © 2017 The Authors. Evolutionary Applications published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.