Recognition of Latin scientific names using artificial neural networks
Format: Online article (text)
Language: English
Published: John Wiley and Sons Inc., 2020
Online access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394707/
https://www.ncbi.nlm.nih.gov/pubmed/32765977
http://dx.doi.org/10.1002/aps3.11378

Summary:

PREMISE: The automated recognition of Latin scientific names within vernacular text has many applications, including text mining, search indexing, and automated specimen‐label processing. Most published solutions are computationally inefficient, cannot run within a web browser, and focus on texts in English, thus omitting a substantial portion of the biodiversity literature.

METHODS AND RESULTS: An open‐source, browser‐executable solution, Quaesitor, is presented here. It combines pattern matching (regular expressions) with an ensembled classifier composed of an inclusion dictionary search (Bloom filter), a trio of complementary neural networks that differ in how they encode text, and word length, to automatically identify Latin scientific names in the 16 most common languages of biodiversity articles.

CONCLUSIONS: In combination, the classifiers can recognize Latin scientific names in isolation or embedded within the languages used for >96% of biodiversity literature titles. Across three different data sets, they achieved a recall of 0.80–0.97 and a precision of 0.69–0.84, at a rate of 8.6 ms/word.
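The summary names two of Quaesitor's non-neural building blocks: a regular expression that proposes candidate names and a Bloom filter used as an inclusion dictionary. The Python sketch below illustrates those two ideas in generic form only. The candidate pattern, the `BloomFilter` class, its sizes, the `find_candidates` helper and the example words are all hypothetical assumptions, not Quaesitor's code (which runs in the browser). The sketch omits the three neural networks and the word-length feature that the ensemble also uses.

```python
import hashlib
import re

# Hypothetical candidate pattern (an assumption, not Quaesitor's regex):
# a capitalized genus-like word followed by a lowercase epithet-like word.
# ASCII-only for simplicity.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]{3,})\b")


class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array.

    Membership tests may return false positives, never false negatives.
    """

    def __init__(self, m_bits: int = 8192, k_hashes: int = 4) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def find_candidates(text: str, known_words: BloomFilter) -> list[str]:
    """Return regex matches whose genus-like word is in the dictionary."""
    hits = []
    for match in CANDIDATE.finditer(text):
        genus, epithet = match.group(1), match.group(2)
        if genus.lower() in known_words:
            hits.append(f"{genus} {epithet}")
    return hits


if __name__ == "__main__":
    # Hypothetical dictionary of Latin genus names, stored lowercased.
    bloom = BloomFilter()
    for word in ["quercus", "acer", "betula"]:
        bloom.add(word)
    text = "Les feuilles de Quercus robur et Acer campestre sont lobées."
    print(find_candidates(text, bloom))  # → ['Quercus robur', 'Acer campestre']
```

The regex alone also matches "Les feuilles", and the dictionary lookup filters that match out. Because a Bloom filter is compact but can report false positives, combining it with other, independent signals (as the summary describes for the ensemble) is one plausible way to keep precision up.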