Automated annotation of chemical names in the literature with tunable accuracy

BACKGROUND: A significant portion of the biomedical and chemical literature refers to small molecules. The accurate identification and annotation of compound names that are relevant to the topic of the given literature can establish links between scientific publications and various chemical and life...

Bibliographic Details
Main authors: Zhang, Jun D; Geer, Lewis Y; Bolton, Evan E; Bryant, Stephen H
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3281788/
https://www.ncbi.nlm.nih.gov/pubmed/22107874
http://dx.doi.org/10.1186/1758-2946-3-52
author Zhang, Jun D
Geer, Lewis Y
Bolton, Evan E
Bryant, Stephen H
collection PubMed
description BACKGROUND: A significant portion of the biomedical and chemical literature refers to small molecules. The accurate identification and annotation of compound names that are relevant to the topic of a given publication can establish links between scientific publications and various chemical and life science databases. Manual annotation is the preferred method for this work because well-trained indexers can understand a paper's topic as well as recognize key terms. However, considering the hundreds of thousands of new papers published annually, an automatic annotation system with high precision and relevance can be a useful complement to manual annotation.
RESULTS: An automated chemical name annotation system, MeSH Automated Annotations (MAA), was developed to annotate small molecule names in scientific abstracts with tunable accuracy. The system aims to reproduce the MeSH term annotations that indexers would create for the biomedical and chemical literature. When automated free-text matching was compared with the manual indexing of 26 thousand MEDLINE abstracts, more than 40% of the automated annotations were false-positive (FP) cases. To reduce the FP rate, MAA incorporated several filters to remove "incorrect" annotations caused by nonspecific, partial, and low-relevance chemical names. In part, relevance was measured by the position of the chemical name in the text. Tunable accuracy was obtained by adding or restricting the sections of the text scanned for chemical names. The best precision obtained was 96%, with a 28% recall rate. The best performance of MAA, as measured by the F statistic, was 66%, which compares favorably with other chemical name annotation systems.
CONCLUSIONS: Accurate chemical name annotation can help researchers not only identify important chemical names in abstracts, but also match unindexed and unstructured abstracts to chemical records. The current work was tested against MEDLINE, but the algorithm is not specific to this corpus, and it may also be applicable to papers from chemical physics, materials, polymer, and environmental science, as well as to patents, biological assay descriptions, and other textual data.
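The trade-off described in the abstract, where scanning more sections of the text raises recall at the cost of precision, can be illustrated with a toy sketch. This is not the MAA implementation: the `Abstract` class, the `annotate` function, the three-name lexicon, and the sample text are all hypothetical, the matching is plain substring lookup with none of the paper's filters, and the "F statistic" is assumed here to be the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state.

```python
from dataclasses import dataclass


@dataclass
class Abstract:
    """Hypothetical record: text sections plus the names a human indexer assigned."""
    title: str
    body: str
    indexed: set


def annotate(abstract, lexicon, sections=("title",)):
    """Return lexicon names found in the chosen sections.

    Naive case-insensitive substring matching; it would also hit partial
    names, which is one of the error sources the abstract says MAA filters.
    """
    text = " ".join(getattr(abstract, s) for s in sections).lower()
    return {name for name in lexicon if name.lower() in text}


def f1(precision, recall):
    """Balanced F-measure (assumed definition of the 'F statistic')."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted, gold):
    """Score predicted names against the manually indexed (gold) names."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)


# Toy data, invented for illustration only.
lexicon = {"aspirin", "ethanol", "glucose"}
paper = Abstract(
    title="Aspirin reduces inflammation in mice",
    body="Mice received aspirin dissolved in ethanol; glucose was monitored.",
    indexed={"aspirin", "glucose"},
)

# Strict setting: scan the title only (fewer hits, higher precision).
strict = precision_recall_f1(annotate(paper, lexicon, ("title",)), paper.indexed)

# Permissive setting: scan title and body (more hits, higher recall).
loose = precision_recall_f1(annotate(paper, lexicon, ("title", "body")), paper.indexed)

print("title only:   P=%.2f R=%.2f F1=%.2f" % strict)
print("title + body: P=%.2f R=%.2f F1=%.2f" % loose)
```

On this toy input the title-only setting finds only "aspirin" (precision 1.0, recall 0.5), while scanning the body as well also picks up the solvent "ethanol" as a false positive (precision about 0.67, recall 1.0). Under the same F1 assumption, the reported 96% precision with 28% recall corresponds to `f1(0.96, 0.28)`, roughly 0.43, which suggests that the 66% best F value was obtained at a different, less restrictive setting than the best-precision one.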
format Online
Article
Text
id pubmed-3281788
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3281788 2012-02-23
Automated annotation of chemical names in the literature with tunable accuracy
Zhang, Jun D; Geer, Lewis Y; Bolton, Evan E; Bryant, Stephen H
J Cheminform
Research Article
BioMed Central 2011-11-22
/pmc/articles/PMC3281788/ /pubmed/22107874 http://dx.doi.org/10.1186/1758-2946-3-52
Text en
Copyright ©2011 Zhang et al; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Automated annotation of chemical names in the literature with tunable accuracy
topic Research Article