Recognition of Latin scientific names using artificial neural networks

PREMISE: The automated recognition of Latin scientific names within vernacular text has many applications, including text mining, search indexing, and automated specimen‐label processing. Most published solutions are computationally inefficient, incapable of running within a web browser, and focus o...

Bibliographic Details
Main Author: Little, Damon P.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394707/
https://www.ncbi.nlm.nih.gov/pubmed/32765977
http://dx.doi.org/10.1002/aps3.11378
_version_ 1783565275086454784
author Little, Damon P.
author_facet Little, Damon P.
author_sort Little, Damon P.
collection PubMed
description PREMISE: The automated recognition of Latin scientific names within vernacular text has many applications, including text mining, search indexing, and automated specimen‐label processing. Most published solutions are computationally inefficient, incapable of running within a web browser, and focus on texts in English, thus omitting a substantial portion of biodiversity literature. METHODS AND RESULTS: An open‐source browser‐executable solution, Quaesitor, is presented here. It uses pattern matching (regular expressions) in combination with an ensembled classifier composed of an inclusion dictionary search (Bloom filter), a trio of complementary neural networks that differ in their approach to encoding text, and word length to automatically identify Latin scientific names in the 16 most common languages for biodiversity articles. CONCLUSIONS: In combination, the classifiers can recognize Latin scientific names in isolation or embedded within the languages used for >96% of biodiversity literature titles. For three different data sets, they resulted in a 0.80–0.97 recall and a 0.69–0.84 precision at a rate of 8.6 ms/word.
format Online
Article
Text
id pubmed-7394707
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-73947072020-08-05 Recognition of Latin scientific names using artificial neural networks Little, Damon P. Appl Plant Sci Software Notes PREMISE: The automated recognition of Latin scientific names within vernacular text has many applications, including text mining, search indexing, and automated specimen‐label processing. Most published solutions are computationally inefficient, incapable of running within a web browser, and focus on texts in English, thus omitting a substantial portion of biodiversity literature. METHODS AND RESULTS: An open‐source browser‐executable solution, Quaesitor, is presented here. It uses pattern matching (regular expressions) in combination with an ensembled classifier composed of an inclusion dictionary search (Bloom filter), a trio of complementary neural networks that differ in their approach to encoding text, and word length to automatically identify Latin scientific names in the 16 most common languages for biodiversity articles. CONCLUSIONS: In combination, the classifiers can recognize Latin scientific names in isolation or embedded within the languages used for >96% of biodiversity literature titles. For three different data sets, they resulted in a 0.80–0.97 recall and a 0.69–0.84 precision at a rate of 8.6 ms/word. John Wiley and Sons Inc. 2020-07-31 /pmc/articles/PMC7394707/ /pubmed/32765977 http://dx.doi.org/10.1002/aps3.11378 Text en © 2020 Little. Applications in Plant Sciences published by Wiley Periodicals LLC on behalf of Botanical Society of America This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
spellingShingle Software Notes
Little, Damon P.
Recognition of Latin scientific names using artificial neural networks
title Recognition of Latin scientific names using artificial neural networks
title_full Recognition of Latin scientific names using artificial neural networks
title_fullStr Recognition of Latin scientific names using artificial neural networks
title_full_unstemmed Recognition of Latin scientific names using artificial neural networks
title_short Recognition of Latin scientific names using artificial neural networks
title_sort recognition of latin scientific names using artificial neural networks
topic Software Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394707/
https://www.ncbi.nlm.nih.gov/pubmed/32765977
http://dx.doi.org/10.1002/aps3.11378
work_keys_str_mv AT littledamonp recognitionoflatinscientificnamesusingartificialneuralnetworks
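
The description field above outlines a pipeline in which regular expressions propose candidate names and an inclusion dictionary stored in a Bloom filter helps accept or reject them. The following is a minimal sketch of those two stages only, not Quaesitor's actual implementation: the class `BloomFilter`, the pattern `CANDIDATE`, the function `find_candidates`, the filter size, and the hash count are all hypothetical choices made for illustration, and the neural networks and word-length classifier from the abstract are omitted.

```python
import hashlib
import re


class BloomFilter:
    """Tiny Bloom filter: no false negatives, a small chance of false positives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        # size_bits and num_hashes are illustrative values, not taken from the article.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# Assumed candidate pattern: capitalized genus followed by a lowercase epithet.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+) ([a-z]{2,})\b")


def find_candidates(text, known_genera):
    """Return binomial-shaped matches whose genus is (probably) in the dictionary."""
    return [
        match.group(0)
        for match in CANDIDATE.finditer(text)
        if match.group(1) in known_genera
    ]


genera = BloomFilter()
for name in ("Quercus", "Pinus"):
    genera.add(name)

found = find_candidates(
    "The oak Quercus alba grows near Pinus strobus stands.", genera
)
print(found)
```

The regular expression alone would also accept ordinary phrases such as "The oak"; the dictionary lookup is what filters them out in this sketch. Because a Bloom filter can return false positives, a pipeline like the one in the abstract would presumably rely on its other classifiers to catch such cases.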