Species complex delimitations in the genus Hedychium: A machine learning approach for cluster discovery

PREMISE: Statistical methods used by most morphologists to validate species boundaries (such as principal component analysis [PCA] and non‐metric multidimensional scaling [nMDS]) are limiting because these methods are mostly used as visualization methods, and because the groups are identified by tax...

Bibliographic Details
Main authors: Saryan, Preeti; Gupta, Shubham; Gowda, Vinita
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2020
Subjects: Application Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394710/
https://www.ncbi.nlm.nih.gov/pubmed/32765976
http://dx.doi.org/10.1002/aps3.11377
author Saryan, Preeti
Gupta, Shubham
Gowda, Vinita
collection PubMed
description PREMISE: Statistical methods used by most morphologists to validate species boundaries (such as principal component analysis [PCA] and non‐metric multidimensional scaling [nMDS]) are limiting because these methods are mostly used as visualization methods, and because the groups are identified by taxonomists (i.e., supervised), adding human bias. Here, we use a spectral clustering algorithm for the unsupervised discovery of species boundaries followed by the analysis of the cluster‐defining characters. METHODS: We used spectral clustering, nMDS, and PCA on 16 morphological characters within the genus Hedychium to group 93 individuals from 10 taxa. A radial basis function kernel was used for the spectral clustering with user‐specified tuning values (gamma). The goodness of the discovered clusters using each gamma value was quantified using eigengap, a normalized mutual information score, and the Rand index. Finally, mutual information–based character selection and a t‐test were used to identify cluster‐defining characters. RESULTS: Spectral clustering revealed five, nine, and 12 clusters of taxa in the species complexes examined here. Character selection identified at least four characters that defined these clusters. DISCUSSION: Together with our proposed character analysis methods, spectral clustering enabled the unsupervised discovery of species boundaries along with an explanation of their biological significance. Our results suggest that spectral clustering combined with a character selection analysis can enhance morphometric analyses and is superior to current clustering methods for species delimitation.
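The description above names a radial basis function (RBF) kernel with a user-specified gamma, and two scores (the Rand index and a normalized mutual information score) for judging the discovered clusters. The following is a minimal sketch of those three quantities using only the Python standard library. It is not the authors' code: the function names and the toy labels are hypothetical, the spectral step itself (eigendecomposition of the graph Laplacian and the eigengap) is omitted, and the normalization by the arithmetic mean of the two entropies is an assumption, since the abstract does not say which normalization was used.

```python
import math
from collections import Counter
from itertools import combinations


def rbf_affinity(X, gamma):
    """Pairwise RBF similarities: exp(-gamma * squared Euclidean distance)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in X]
        for xi in X
    ]


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which the two partitions agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(labels_a, labels_b):
    """Mutual information divided by the arithmetic mean of the two entropies.
    The choice of normalization is an assumption, not taken from the paper."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    denom = (entropy(labels_a) + entropy(labels_b)) / 2
    return mi / denom if denom > 0 else 1.0


# Hypothetical toy data: six individuals, two taxa, two discovered clusters.
taxa = ["A", "A", "A", "B", "B", "B"]
clusters = [1, 1, 1, 0, 0, 0]
print(rand_index(taxa, clusters), normalized_mutual_info(taxa, clusters))
```

Both scores depend only on how individuals are grouped, not on the cluster names, so a clustering that recovers the taxa under different labels still scores as a perfect match. A larger gamma makes the affinity fall off faster with distance, which is why the number of discovered clusters can change with the tuning value.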
format Online
Article
Text
id pubmed-7394710
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-7394710 2020-08-05 Appl Plant Sci Application Articles John Wiley and Sons Inc. 2020-07-31 Text en
© 2020 Saryan et al. Applications in Plant Sciences is published by Wiley Periodicals LLC on behalf of the Botanical Society of America. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Species complex delimitations in the genus Hedychium: A machine learning approach for cluster discovery
topic Application Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394710/
https://www.ncbi.nlm.nih.gov/pubmed/32765976
http://dx.doi.org/10.1002/aps3.11377