NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature

Automatically identifying chemical and drug names in scientific publications advances information access for this important class of entities in a variety of biomedical disciplines by enabling improved retrieval and linkage to related concepts. While current methods for tagging chemical entities wer...

Bibliographic Details
Main Authors: Islamaj, Rezarta, Leaman, Robert, Kim, Sun, Kwon, Dongseop, Wei, Chih-Hsuan, Comeau, Donald C., Peng, Yifan, Cissel, David, Coss, Cathleen, Fisher, Carol, Guzman, Rob, Kochar, Preeti Gokal, Koppel, Stella, Trinh, Dorothy, Sekiya, Keiko, Ward, Janice, Whitman, Deborah, Schmidt, Susan, Lu, Zhiyong
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7994842/
https://www.ncbi.nlm.nih.gov/pubmed/33767203
http://dx.doi.org/10.1038/s41597-021-00875-1
_version_ 1783669840740876288
author Islamaj, Rezarta
Leaman, Robert
Kim, Sun
Kwon, Dongseop
Wei, Chih-Hsuan
Comeau, Donald C.
Peng, Yifan
Cissel, David
Coss, Cathleen
Fisher, Carol
Guzman, Rob
Kochar, Preeti Gokal
Koppel, Stella
Trinh, Dorothy
Sekiya, Keiko
Ward, Janice
Whitman, Deborah
Schmidt, Susan
Lu, Zhiyong
author_facet Islamaj, Rezarta
Leaman, Robert
Kim, Sun
Kwon, Dongseop
Wei, Chih-Hsuan
Comeau, Donald C.
Peng, Yifan
Cissel, David
Coss, Cathleen
Fisher, Carol
Guzman, Rob
Kochar, Preeti Gokal
Koppel, Stella
Trinh, Dorothy
Sekiya, Keiko
Ward, Janice
Whitman, Deborah
Schmidt, Susan
Lu, Zhiyong
author_sort Islamaj, Rezarta
collection PubMed
description Automatically identifying chemical and drug names in scientific publications advances information access for this important class of entities in a variety of biomedical disciplines by enabling improved retrieval and linkage to related concepts. While current methods for tagging chemical entities were developed for the article title and abstract, their performance in the full article text is substantially lower. However, the full text frequently contains more detailed chemical information, such as the properties of chemical compounds, their biological effects and interactions with diseases, genes and other chemicals. We therefore present the NLM-Chem corpus, a full-text resource to support the development and evaluation of automated chemical entity taggers. The NLM-Chem corpus consists of 150 full-text articles, doubly annotated by ten expert NLM indexers, with ~5000 unique chemical name annotations, mapped to ~2000 MeSH identifiers. We also describe a substantially improved chemical entity tagger, with automated annotations for all of PubMed and PMC freely accessible through the PubTator web-based interface and API. The NLM-Chem corpus is freely available.
format Online
Article
Text
id pubmed-7994842
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-79948422021-04-16 NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature Islamaj, Rezarta Leaman, Robert Kim, Sun Kwon, Dongseop Wei, Chih-Hsuan Comeau, Donald C. Peng, Yifan Cissel, David Coss, Cathleen Fisher, Carol Guzman, Rob Kochar, Preeti Gokal Koppel, Stella Trinh, Dorothy Sekiya, Keiko Ward, Janice Whitman, Deborah Schmidt, Susan Lu, Zhiyong Sci Data Data Descriptor Automatically identifying chemical and drug names in scientific publications advances information access for this important class of entities in a variety of biomedical disciplines by enabling improved retrieval and linkage to related concepts. While current methods for tagging chemical entities were developed for the article title and abstract, their performance in the full article text is substantially lower. However, the full text frequently contains more detailed chemical information, such as the properties of chemical compounds, their biological effects and interactions with diseases, genes and other chemicals. We therefore present the NLM-Chem corpus, a full-text resource to support the development and evaluation of automated chemical entity taggers. The NLM-Chem corpus consists of 150 full-text articles, doubly annotated by ten expert NLM indexers, with ~5000 unique chemical name annotations, mapped to ~2000 MeSH identifiers. We also describe a substantially improved chemical entity tagger, with automated annotations for all of PubMed and PMC freely accessible through the PubTator web-based interface and API. The NLM-Chem corpus is freely available. Nature Publishing Group UK 2021-03-25 /pmc/articles/PMC7994842/ /pubmed/33767203 http://dx.doi.org/10.1038/s41597-021-00875-1 Text en © This is a U.S. 
Government work and not under copyright protection in the US; foreign copyright protection may apply 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
spellingShingle Data Descriptor
Islamaj, Rezarta
Leaman, Robert
Kim, Sun
Kwon, Dongseop
Wei, Chih-Hsuan
Comeau, Donald C.
Peng, Yifan
Cissel, David
Coss, Cathleen
Fisher, Carol
Guzman, Rob
Kochar, Preeti Gokal
Koppel, Stella
Trinh, Dorothy
Sekiya, Keiko
Ward, Janice
Whitman, Deborah
Schmidt, Susan
Lu, Zhiyong
NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature
title NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature
title_full NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature
title_fullStr NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature
title_full_unstemmed NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature
title_short NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature
title_sort nlm-chem, a new resource for chemical entity recognition in pubmed full text literature
topic Data Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7994842/
https://www.ncbi.nlm.nih.gov/pubmed/33767203
http://dx.doi.org/10.1038/s41597-021-00875-1
work_keys_str_mv AT islamajrezarta nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT leamanrobert nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT kimsun nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT kwondongseop nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT weichihhsuan nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT comeaudonaldc nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT pengyifan nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT cisseldavid nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT cosscathleen nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT fishercarol nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT guzmanrob nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT kocharpreetigokal nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT koppelstella nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT trinhdorothy nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT sekiyakeiko nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT wardjanice nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT whitmandeborah nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT schmidtsusan nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature
AT luzhiyong nlmchemanewresourceforchemicalentityrecognitioninpubmedfulltextliterature