CheNER: a tool for the identification of chemical entities and their classes in biomedical literature

BACKGROUND: Small chemical molecules regulate biological processes at the molecular level. Those molecules are often involved in causing or treating pathological states. Automatically identifying such molecules in biomedical text is difficult due to both the diverse morphology of chemical names and...

Bibliographic Details
Main authors: Usié, Anabel; Cruz, Joaquim; Comas, Jorge; Solsona, Francesc; Alves, Rui
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331691/
https://www.ncbi.nlm.nih.gov/pubmed/25810772
http://dx.doi.org/10.1186/1758-2946-7-S1-S15
author Usié, Anabel
Cruz, Joaquim
Comas, Jorge
Solsona, Francesc
Alves, Rui
collection PubMed
description BACKGROUND: Small chemical molecules regulate biological processes at the molecular level. Those molecules are often involved in causing or treating pathological states. Automatically identifying such molecules in biomedical text is difficult due to both the diverse morphology of chemical names and the alternative types of nomenclature that are simultaneously used to describe them. To address these issues, the last BioCreAtIvE challenge proposed a CHEMDNER task, which is a Named Entity Recognition (NER) challenge that aims at labelling different types of chemical names in biomedical text. METHODS: To address this challenge we tested various approaches to recognizing chemical entities in biomedical documents. These approaches range from linear Conditional Random Fields (CRFs) to a combination of CRFs with regular expression and dictionary matching, followed by a post-processing step to tag those chemical names in a corpus of Medline abstracts. We named our best-performing systems CheNER. RESULTS: We evaluate the performance of the various approaches using the F-score statistic. Higher F-scores indicate better performance. The highest F-score we obtain in identifying unique chemical entities is 72.88%. The highest F-score we obtain in identifying all chemical entities is 73.07%. We also evaluate the F-score of combining our system with ChemSpot, and find an increase from 72.88% to 73.83%. CONCLUSIONS: CheNER presents a valid alternative for automated annotation of chemical entities in biomedical documents. In addition, CheNER may be used to derive new features to train newer methods for tagging chemical entities. CheNER can be downloaded from http://metres.udl.cat and included in text annotation pipelines.
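The abstract reports results as F-scores without spelling out the computation. As a minimal sketch, assuming the balanced F1 measure (the harmonic mean of precision and recall) and exact matching of annotated mentions, the statistic could be computed as below. The function name `f_score` and the sample annotations are hypothetical and do not come from CheNER or the CHEMDNER evaluation scripts.

```python
def f_score(gold, predicted, beta=1.0):
    """Return (precision, recall, F-beta) for two collections of annotations.

    An annotation counts as correct only if it appears in both collections
    (exact match). beta=1.0 gives the balanced F1 score.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical annotations: (document id, start offset, end offset).
gold = {("doc1", 0, 7), ("doc1", 15, 22), ("doc2", 3, 10), ("doc2", 30, 41)}
predicted = {("doc1", 0, 7), ("doc2", 3, 10), ("doc2", 50, 55)}

precision, recall, f1 = f_score(gold, predicted)
print(f"precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```

Here two of the three predicted mentions are correct and two of the four gold mentions are found, so precision is 2/3, recall is 1/2, and F1 is 4/7. Whether the reported figures are averaged in exactly this way is not stated in this record.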
format Online
Article
Text
id pubmed-4331691
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4331691 2015-03-25 CheNER: a tool for the identification of chemical entities and their classes in biomedical literature Usié, Anabel Cruz, Joaquim Comas, Jorge Solsona, Francesc Alves, Rui J Cheminform Research
BioMed Central 2015-01-19 /pmc/articles/PMC4331691/ /pubmed/25810772 http://dx.doi.org/10.1186/1758-2946-7-S1-S15 Text en Copyright © 2015 Usié et al.; licensee Springer. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title CheNER: a tool for the identification of chemical entities and their classes in biomedical literature
topic Research