Recognition of Latin scientific names using artificial neural networks
Main author: | Little, Damon P. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc., 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394707/ https://www.ncbi.nlm.nih.gov/pubmed/32765977 http://dx.doi.org/10.1002/aps3.11378 |
author | Little, Damon P. |
---|---|
collection | PubMed |
description | PREMISE: The automated recognition of Latin scientific names within vernacular text has many applications, including text mining, search indexing, and automated specimen‐label processing. Most published solutions are computationally inefficient, cannot run within a web browser, and focus on English-language texts, thus omitting a substantial portion of the biodiversity literature. METHODS AND RESULTS: An open‐source, browser‐executable solution, Quaesitor, is presented here. It combines pattern matching (regular expressions) with an ensemble classifier composed of an inclusion dictionary search (Bloom filter), three complementary neural networks that differ in how they encode text, and word length, to automatically identify Latin scientific names in the 16 languages most common in biodiversity articles. CONCLUSIONS: In combination, the classifiers can recognize Latin scientific names in isolation or embedded within the languages used for >96% of biodiversity literature titles. For three different data sets, they achieved a recall of 0.80–0.97 and a precision of 0.69–0.84 at a rate of 8.6 ms/word. |
format | Online Article Text |
id | pubmed-7394707 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
journal | Appl Plant Sci, Software Notes (published 2020-07-31) |
rights | © 2020 Little. Applications in Plant Sciences published by Wiley Periodicals LLC on behalf of Botanical Society of America. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes. |
title | Recognition of Latin scientific names using artificial neural networks |
topic | Software Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394707/ https://www.ncbi.nlm.nih.gov/pubmed/32765977 http://dx.doi.org/10.1002/aps3.11378 |
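The record's METHODS AND RESULTS summary describes regular-expression candidate matching combined with an ensemble that includes an inclusion-dictionary lookup in a Bloom filter and a word-length criterion. The Python sketch below illustrates only those pieces, under stated assumptions. It is not Quaesitor's code (Quaesitor runs in the browser), and it omits the three neural networks entirely. The candidate pattern `CANDIDATE`, the helper `find_names`, the `min_len` threshold, the double-hashing scheme, and the filter sizes are all hypothetical choices made for illustration.

```python
import hashlib
import re
from collections.abc import Iterator


class BloomFilter:
    """Minimal Bloom filter: a bit array queried through k hash positions.

    Lookups may yield false positives but never false negatives.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Assumed scheme: double hashing from one SHA-256 digest,
        # position_i = (h1 + i * h2) mod num_bits.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # "Probably present" only if all derived bits are set.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical candidate pattern: a capitalized word followed by a
# lowercase word. Real scientific names take many more forms.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+) ([a-z]+)\b")


def find_names(text: str, genera: BloomFilter, min_len: int = 4) -> list[str]:
    """Return regex candidates that pass a length check and a dictionary vote."""
    found = []
    for match in CANDIDATE.finditer(text):
        genus, epithet = match.group(1), match.group(2)
        # Hypothetical word-length criterion: skip candidates with short words.
        if len(genus) < min_len or len(epithet) < min_len:
            continue
        # Inclusion-dictionary vote: keep the candidate if its first word
        # is (probably) a known genus.
        if genus in genera:
            found.append(match.group(0))
    return found


if __name__ == "__main__":
    genera = BloomFilter(num_bits=1024, num_hashes=3)
    for name in ("Quercus", "Acer", "Pinus"):
        genera.add(name)
    text = "Specimens of Quercus alba and Acer rubrum were collected. The oak grew tall."
    print(find_names(text, genera))  # ['Quercus alba', 'Acer rubrum']
```

Because a Bloom filter can return false positives but never false negatives, a dictionary-only vote would occasionally accept non-names; pairing it with other signals, as the abstract's ensemble does, is one plausible way to offset that. Note also that a fixed `min_len` of 4 would reject genuine short genera, which is one reason this threshold should be treated as a placeholder.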