Not that kind of tree: Assessing the potential for decision tree–based plant identification using trait databases
PREMISE: Advancements in machine learning and the rise of accessible “big data” provide an important opportunity to improve trait‐based plant identification. Here, we applied decision‐tree induction to a subset of data from the TRY plant trait database to (1) assess the potential of decision trees f...
Main authors: | Almeida, Brianna K.; Garg, Manish; Kubat, Miroslav; Afkhami, Michelle E. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc. 2020 |
Subjects: | Application Articles |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394705/ https://www.ncbi.nlm.nih.gov/pubmed/32765978 http://dx.doi.org/10.1002/aps3.11379 |
_version_ | 1783565274616692736 |
---|---|
author | Almeida, Brianna K. Garg, Manish Kubat, Miroslav Afkhami, Michelle E. |
author_sort | Almeida, Brianna K. |
collection | PubMed |
description | PREMISE: Advancements in machine learning and the rise of accessible “big data” provide an important opportunity to improve trait‐based plant identification. Here, we applied decision‐tree induction to a subset of data from the TRY plant trait database to (1) assess the potential of decision trees for plant identification and (2) determine informative traits for distinguishing taxa. METHODS: Decision trees were induced using 16 vegetative and floral traits (689 species, 20 genera). We assessed how well the algorithm classified species from test data and pinpointed those traits that were important for identification across diverse taxa. RESULTS: The unpruned tree correctly placed 98% of the species in our data set into genera, indicating its promise for distinguishing among the species used to construct them. Furthermore, in the pruned tree, an average of 89% of the species from the test data sets were properly classified into their genera, demonstrating the flexibility of decision trees to also classify new species into genera within the tree. Closer inspection revealed that seven of the 16 traits were sufficient for the classification, and these traits yielded approximately two times more initial information gain than those not included. DISCUSSION: Our findings demonstrate the potential for tree‐based machine learning and big data in distinguishing among taxa and determining which traits are important for plant identification. |
format | Online Article Text |
id | pubmed-7394705 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7394705 2020-08-05 Not that kind of tree: Assessing the potential for decision tree–based plant identification using trait databases Almeida, Brianna K. Garg, Manish Kubat, Miroslav Afkhami, Michelle E. Appl Plant Sci Application Articles PREMISE: Advancements in machine learning and the rise of accessible “big data” provide an important opportunity to improve trait‐based plant identification. Here, we applied decision‐tree induction to a subset of data from the TRY plant trait database to (1) assess the potential of decision trees for plant identification and (2) determine informative traits for distinguishing taxa. METHODS: Decision trees were induced using 16 vegetative and floral traits (689 species, 20 genera). We assessed how well the algorithm classified species from test data and pinpointed those traits that were important for identification across diverse taxa. RESULTS: The unpruned tree correctly placed 98% of the species in our data set into genera, indicating its promise for distinguishing among the species used to construct them. Furthermore, in the pruned tree, an average of 89% of the species from the test data sets were properly classified into their genera, demonstrating the flexibility of decision trees to also classify new species into genera within the tree. Closer inspection revealed that seven of the 16 traits were sufficient for the classification, and these traits yielded approximately two times more initial information gain than those not included. DISCUSSION: Our findings demonstrate the potential for tree‐based machine learning and big data in distinguishing among taxa and determining which traits are important for plant identification. John Wiley and Sons Inc. 2020-07-31 /pmc/articles/PMC7394705/ /pubmed/32765978 http://dx.doi.org/10.1002/aps3.11379 Text en © 2020 Almeida et al. Applications in Plant Sciences is published by Wiley Periodicals LLC on behalf of the Botanical Society of America. This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes. |
title | Not that kind of tree: Assessing the potential for decision tree–based plant identification using trait databases |
topic | Application Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394705/ https://www.ncbi.nlm.nih.gov/pubmed/32765978 http://dx.doi.org/10.1002/aps3.11379 |
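
The abstract in this record refers to the "initial information gain" of traits. As a rough illustration of what that quantity measures, the sketch below computes entropy-based information gain for categorical traits on a tiny made-up data set. The trait names, genus labels, and helper functions are hypothetical and are not taken from the article or from the TRY database; the authors' actual induction procedure may differ.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, trait):
    """Entropy reduction obtained by splitting the records on one categorical trait."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[trait], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records: two traits, two genera (illustrative only).
rows = [
    {"leaf_type": "broad", "flower_color": "white"},
    {"leaf_type": "broad", "flower_color": "yellow"},
    {"leaf_type": "needle", "flower_color": "white"},
    {"leaf_type": "needle", "flower_color": "yellow"},
]
labels = ["GenusA", "GenusA", "GenusB", "GenusB"]

gains = {trait: information_gain(rows, labels, trait) for trait in rows[0]}
for trait, gain in gains.items():
    print(f"{trait}: {gain:.3f} bits")
```

In this toy case the hypothetical `leaf_type` trait separates the two genera perfectly, so it carries all of the available information, while `flower_color` carries none. A tree-induction algorithm that ranks splits by information gain would therefore test `leaf_type` first.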