
Using natural history to guide supervised machine learning for cryptic species delimitation with genetic data

The diversity of biological and ecological characteristics of organisms, and the underlying genetic patterns and processes of speciation, makes the development of universally applicable genetic species delimitation methods challenging. Many approaches, like those incorporating the multispecies coale...


Bibliographic Details
Main authors: Derkarabetian, Shahan, Starrett, James, Hedin, Marshal
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8862334/
https://www.ncbi.nlm.nih.gov/pubmed/35193622
http://dx.doi.org/10.1186/s12983-022-00453-0
author Derkarabetian, Shahan
Starrett, James
Hedin, Marshal
author_facet Derkarabetian, Shahan
Starrett, James
Hedin, Marshal
author_sort Derkarabetian, Shahan
collection PubMed
description The diversity of biological and ecological characteristics of organisms, and the underlying genetic patterns and processes of speciation, makes the development of universally applicable genetic species delimitation methods challenging. Many approaches, like those incorporating the multispecies coalescent, sometimes delimit populations and overestimate species numbers. This issue is exacerbated in taxa with inherently high population structure due to low dispersal ability, and in cryptic species resulting from nonecological speciation. These taxa present a conundrum when delimiting species: analyses rely heavily, if not entirely, on genetic data which over split species, while other lines of evidence lump. We showcase this conundrum in the harvester Theromaster brunneus, a low dispersal taxon with a wide geographic distribution and high potential for cryptic species. Integrating morphology, mitochondrial, and sub-genomic (double-digest RADSeq and ultraconserved elements) data, we find high discordance across analyses and data types in the number of inferred species, with further evidence that multispecies coalescent approaches over split. We demonstrate the power of a supervised machine learning approach in effectively delimiting cryptic species by creating a “custom” training data set derived from a well-studied lineage with similar biological characteristics as Theromaster. This novel approach uses known taxa with particular biological characteristics to inform unknown taxa with similar characteristics, using modern computational tools ideally suited for species delimitation. The approach also considers the natural history of organisms to make more biologically informed species delimitation decisions, and in principle is broadly applicable for taxa across the tree of life. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12983-022-00453-0.
format Online
Article
Text
id pubmed-8862334
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-88623342022-02-23 Using natural history to guide supervised machine learning for cryptic species delimitation with genetic data Derkarabetian, Shahan Starrett, James Hedin, Marshal Front Zool Research The diversity of biological and ecological characteristics of organisms, and the underlying genetic patterns and processes of speciation, makes the development of universally applicable genetic species delimitation methods challenging. Many approaches, like those incorporating the multispecies coalescent, sometimes delimit populations and overestimate species numbers. This issue is exacerbated in taxa with inherently high population structure due to low dispersal ability, and in cryptic species resulting from nonecological speciation. These taxa present a conundrum when delimiting species: analyses rely heavily, if not entirely, on genetic data which over split species, while other lines of evidence lump. We showcase this conundrum in the harvester Theromaster brunneus, a low dispersal taxon with a wide geographic distribution and high potential for cryptic species. Integrating morphology, mitochondrial, and sub-genomic (double-digest RADSeq and ultraconserved elements) data, we find high discordance across analyses and data types in the number of inferred species, with further evidence that multispecies coalescent approaches over split. We demonstrate the power of a supervised machine learning approach in effectively delimiting cryptic species by creating a “custom” training data set derived from a well-studied lineage with similar biological characteristics as Theromaster. This novel approach uses known taxa with particular biological characteristics to inform unknown taxa with similar characteristics, using modern computational tools ideally suited for species delimitation. 
The approach also considers the natural history of organisms to make more biologically informed species delimitation decisions, and in principle is broadly applicable for taxa across the tree of life. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12983-022-00453-0. BioMed Central 2022-02-22 /pmc/articles/PMC8862334/ /pubmed/35193622 http://dx.doi.org/10.1186/s12983-022-00453-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Derkarabetian, Shahan
Starrett, James
Hedin, Marshal
Using natural history to guide supervised machine learning for cryptic species delimitation with genetic data
title Using natural history to guide supervised machine learning for cryptic species delimitation with genetic data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8862334/
https://www.ncbi.nlm.nih.gov/pubmed/35193622
http://dx.doi.org/10.1186/s12983-022-00453-0