NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition

Bibliographic Details
Main Authors: Tsai, Richard Tzong-Han, Hsiao, Yu-Cheng, Lai, Po-Ting
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5091336/
https://www.ncbi.nlm.nih.gov/pubmed/31414701
http://dx.doi.org/10.1093/database/baw135
Description
Summary: Chemical patents contain detailed information on novel chemical compounds that is valuable to the chemical and pharmaceutical industries. In this paper, we introduce a system, NERChem, that can recognize chemical named entity mentions in chemical patents. NERChem is based on the conditional random fields (CRF) model. Our approach incorporates (1) class composition, which is used for combining chemical classes whose naming conventions are similar; (2) BioNE features, which are used for distinguishing chemical mentions from other biomedical NE mentions in the patents; and (3) full-token word features, which are used to resolve the tokenization granularity problem. We evaluated our approach on the BioCreative V CHEMDNER-patent corpus, and achieved an F-score of 87.17% in the Chemical Entity Mention in Patents (CEMP) task and a sensitivity of 98.58% in the Chemical Passage Detection (CPD) task, ranking alongside the top systems. Database URL: Our NERChem web-based system is publicly available at iisrserv.csie.ncu.edu.tw/nerchem.
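
The summary's third point, full-token word features, can be illustrated with a minimal sketch. It assumes that the "tokenization granularity problem" arises when a fine-grained tokenizer splits a chemical name into many small pieces, and that a full-token feature gives each piece the whitespace-delimited string it came from. The tokenizer rule and the names `fine_tokens` and `full_token_features` are hypothetical; this record does not describe NERChem's actual tokenizer or feature templates.

```python
import re

def fine_tokens(text):
    """Fine-grained tokens (assumed rule): letter runs, digit runs, or single symbols."""
    pattern = r"[A-Za-z]+|\d+|[^\sA-Za-z\d]"
    return [(m.group(), m.start()) for m in re.finditer(pattern, text)]

def full_token_features(text):
    """Attach to each fine-grained token the whitespace-delimited token containing it."""
    # Whitespace-delimited spans act as the "full tokens" in this sketch.
    spans = [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]
    features = []
    for token, pos in fine_tokens(text):
        # Every fine-grained token starts inside exactly one whitespace-delimited span.
        full = next(s[2] for s in spans if s[0] <= pos < s[1])
        features.append({"token": token, "full_token": full})
    return features

if __name__ == "__main__":
    for feats in full_token_features("add 2-methylpropan-1-ol slowly"):
        print(feats)
```

In a CRF setting, dictionaries like these would be one part of each token's feature set, so that a sub-token such as `methylpropan` still carries the context of the whole name.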