Cargando…
NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition
Chemical patents contain detailed information on novel chemical compounds that is valuable to the chemical and pharmaceutical industries. In this paper, we introduce a system, NERChem that can recognize chemical named entity mentions in chemical patents. NERChem is based on the conditional random fi...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Oxford University Press
2016
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5091336/ https://www.ncbi.nlm.nih.gov/pubmed/31414701 http://dx.doi.org/10.1093/database/baw135 |
Sumario: | Chemical patents contain detailed information on novel chemical compounds that is valuable to the chemical and pharmaceutical industries. In this paper, we introduce a system, NERChem that can recognize chemical named entity mentions in chemical patents. NERChem is based on the conditional random fields model (CRF). Our approach incorporates (1) class composition, which is used for combining chemical classes whose naming conventions are similar; (2) BioNE features, which are used for distinguishing chemical mentions from other biomedical NE mentions in the patents; and (3) full-token word features, which are used to resolve the tokenization granularity problem. We evaluated our approach on the BioCreative V CHEMDNER-patent corpus, and achieved an F-score of 87.17% in the Chemical Entity Mention in Patents (CEMP) task and a sensitivity of 98.58% in the Chemical Passage Detection (CPD) task, ranking alongside the top systems. Database URL: Our NERChem web-based system is publicly available at iisrserv.csie.n cu.edu.tw/nerchem. |
---|