
Applications of Natural Language Processing in Biodiversity Science


Bibliographic Details
Main authors: Thessen, Anne E., Cui, Hong, Mozzherin, Dmitry
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3364545/
https://www.ncbi.nlm.nih.gov/pubmed/22685456
http://dx.doi.org/10.1155/2012/391574
author Thessen, Anne E.
Cui, Hong
Mozzherin, Dmitry
collection PubMed
description Centuries of biological knowledge are contained in the massive body of scientific literature, written for human readability but too big for any one person to consume. Large-scale mining of information from the literature is necessary if biology is to transform into a data-driven science. A computer can handle the volume but cannot make sense of the language. This paper reviews and discusses the use of natural language processing (NLP) and machine-learning algorithms to extract information from systematic literature. NLP algorithms have been used for decades but require special development for application in the biological realm because of the special nature of the language. Many tools exist for biological information extraction (cellular processes, taxonomic names, and morphological characters), but none has been applied life-wide and most still require testing and development. Progress has been made in developing algorithms for automated annotation of taxonomic text, identification of taxonomic names in text, and extraction of morphological character information from taxonomic descriptions. This manuscript will briefly discuss the key steps in applying information extraction tools to enhance biodiversity science.
format Online
Article
Text
id pubmed-3364545
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-3364545 2012-06-08 Applications of Natural Language Processing in Biodiversity Science. Thessen, Anne E.; Cui, Hong; Mozzherin, Dmitry. Adv Bioinformatics. Review Article. Hindawi Publishing Corporation 2012. 2012-05-22. /pmc/articles/PMC3364545/ /pubmed/22685456 http://dx.doi.org/10.1155/2012/391574 Text en. Copyright © 2012 Anne E. Thessen et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Applications of Natural Language Processing in Biodiversity Science
topic Review Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3364545/
https://www.ncbi.nlm.nih.gov/pubmed/22685456
http://dx.doi.org/10.1155/2012/391574