Not that kind of tree: Assessing the potential for decision tree–based plant identification using trait databases

PREMISE: Advancements in machine learning and the rise of accessible “big data” provide an important opportunity to improve trait‐based plant identification. Here, we applied decision‐tree induction to a subset of data from the TRY plant trait database to (1) assess the potential of decision trees f...

Bibliographic Details
Main authors: Almeida, Brianna K., Garg, Manish, Kubat, Miroslav, Afkhami, Michelle E.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394705/
https://www.ncbi.nlm.nih.gov/pubmed/32765978
http://dx.doi.org/10.1002/aps3.11379
author Almeida, Brianna K.
Garg, Manish
Kubat, Miroslav
Afkhami, Michelle E.
collection PubMed
description PREMISE: Advancements in machine learning and the rise of accessible “big data” provide an important opportunity to improve trait‐based plant identification. Here, we applied decision‐tree induction to a subset of data from the TRY plant trait database to (1) assess the potential of decision trees for plant identification and (2) determine informative traits for distinguishing taxa. METHODS: Decision trees were induced using 16 vegetative and floral traits (689 species, 20 genera). We assessed how well the algorithm classified species from test data and pinpointed those traits that were important for identification across diverse taxa. RESULTS: The unpruned tree correctly placed 98% of the species in our data set into genera, indicating its promise for distinguishing among the species used to construct them. Furthermore, in the pruned tree, an average of 89% of the species from the test data sets were properly classified into their genera, demonstrating the flexibility of decision trees to also classify new species into genera within the tree. Closer inspection revealed that seven of the 16 traits were sufficient for the classification, and these traits yielded approximately two times more initial information gain than those not included. DISCUSSION: Our findings demonstrate the potential for tree‐based machine learning and big data in distinguishing among taxa and determining which traits are important for plant identification.
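The abstract ranks traits by initial information gain, which can be illustrated with a minimal sketch. The records, trait names, and genus labels below are hypothetical and are not taken from the study or from the TRY database; the function names are likewise illustrative, and the sketch assumes categorical traits and Shannon entropy measured in bits.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, trait, label="genus"):
    """Entropy reduction obtained by splitting `rows` on one categorical trait."""
    parent = entropy([row[label] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[trait], []).append(row[label])
    # Entropy of each branch, weighted by the share of rows it receives.
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted


# Hypothetical toy records (assumption: not data from the paper).
species = [
    {"leaf_type": "compound", "flower_color": "white", "genus": "GenusA"},
    {"leaf_type": "compound", "flower_color": "yellow", "genus": "GenusA"},
    {"leaf_type": "simple", "flower_color": "white", "genus": "GenusB"},
    {"leaf_type": "simple", "flower_color": "yellow", "genus": "GenusB"},
]

# Rank traits from most to least informative at the root of the tree.
ranking = sorted(
    ["leaf_type", "flower_color"],
    key=lambda trait: information_gain(species, trait),
    reverse=True,
)
print(ranking)
```

In this toy set, `leaf_type` separates the two genera completely while `flower_color` does not separate them at all, so a tree-induction algorithm that selects splits by information gain would test `leaf_type` first.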
format Online
Article
Text
id pubmed-7394705
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-7394705 2020-08-05 Not that kind of tree: Assessing the potential for decision tree–based plant identification using trait databases Almeida, Brianna K. Garg, Manish Kubat, Miroslav Afkhami, Michelle E. Appl Plant Sci Application Articles John Wiley and Sons Inc. 2020-07-31 /pmc/articles/PMC7394705/ /pubmed/32765978 http://dx.doi.org/10.1002/aps3.11379 Text en © 2020 Almeida et al. Applications in Plant Sciences is published by Wiley Periodicals LLC on behalf of the Botanical Society of America.
This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title Not that kind of tree: Assessing the potential for decision tree–based plant identification using trait databases
topic Application Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394705/
https://www.ncbi.nlm.nih.gov/pubmed/32765978
http://dx.doi.org/10.1002/aps3.11379