NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition
Chemical patents contain detailed information on novel chemical compounds that is valuable to the chemical and pharmaceutical industries. In this paper, we introduce a system, NERChem, that can recognize chemical named entity mentions in chemical patents. NERChem is based on the conditional random fi...
Main Authors: | Tsai, Richard Tzong-Han; Hsiao, Yu-Cheng; Lai, Po-Ting |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5091336/ https://www.ncbi.nlm.nih.gov/pubmed/31414701 http://dx.doi.org/10.1093/database/baw135 |
_version_ | 1782464556986007552 |
---|---|
author | Tsai, Richard Tzong-Han Hsiao, Yu-Cheng Lai, Po-Ting |
author_facet | Tsai, Richard Tzong-Han Hsiao, Yu-Cheng Lai, Po-Ting |
author_sort | Tsai, Richard Tzong-Han |
collection | PubMed |
description | Chemical patents contain detailed information on novel chemical compounds that is valuable to the chemical and pharmaceutical industries. In this paper, we introduce a system, NERChem, that can recognize chemical named entity mentions in chemical patents. NERChem is based on the conditional random fields (CRF) model. Our approach incorporates (1) class composition, which combines chemical classes whose naming conventions are similar; (2) BioNE features, which distinguish chemical mentions from other biomedical NE mentions in the patents; and (3) full-token word features, which resolve the tokenization granularity problem. We evaluated our approach on the BioCreative V CHEMDNER-patent corpus and achieved an F-score of 87.17% in the Chemical Entity Mention in Patents (CEMP) task and a sensitivity of 98.58% in the Chemical Passage Detection (CPD) task, ranking alongside the top systems. Database URL: Our NERChem web-based system is publicly available at iisrserv.csie.ncu.edu.tw/nerchem. |
format | Online Article Text |
id | pubmed-5091336 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5091336 2016-11-03 NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition Tsai, Richard Tzong-Han Hsiao, Yu-Cheng Lai, Po-Ting Database (Oxford) Original Article Chemical patents contain detailed information on novel chemical compounds that is valuable to the chemical and pharmaceutical industries. In this paper, we introduce a system, NERChem, that can recognize chemical named entity mentions in chemical patents. NERChem is based on the conditional random fields (CRF) model. Our approach incorporates (1) class composition, which combines chemical classes whose naming conventions are similar; (2) BioNE features, which distinguish chemical mentions from other biomedical NE mentions in the patents; and (3) full-token word features, which resolve the tokenization granularity problem. We evaluated our approach on the BioCreative V CHEMDNER-patent corpus and achieved an F-score of 87.17% in the Chemical Entity Mention in Patents (CEMP) task and a sensitivity of 98.58% in the Chemical Passage Detection (CPD) task, ranking alongside the top systems. Database URL: Our NERChem web-based system is publicly available at iisrserv.csie.ncu.edu.tw/nerchem. Oxford University Press 2016-10-25 /pmc/articles/PMC5091336/ /pubmed/31414701 http://dx.doi.org/10.1093/database/baw135 Text en © The Author(s) 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article Tsai, Richard Tzong-Han Hsiao, Yu-Cheng Lai, Po-Ting NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition |
title | NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition |
title_full | NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition |
title_fullStr | NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition |
title_full_unstemmed | NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition |
title_short | NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition |
title_sort | nerchem: adapting nerbio to chemical patents via full-token features and named entity feature with chemical sub-class composition |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5091336/ https://www.ncbi.nlm.nih.gov/pubmed/31414701 http://dx.doi.org/10.1093/database/baw135 |
work_keys_str_mv | AT tsairichardtzonghan nerchemadaptingnerbiotochemicalpatentsviafulltokenfeaturesandnamedentityfeaturewithchemicalsubclasscomposition AT hsiaoyucheng nerchemadaptingnerbiotochemicalpatentsviafulltokenfeaturesandnamedentityfeaturewithchemicalsubclasscomposition AT laipoting nerchemadaptingnerbiotochemicalpatentsviafulltokenfeaturesandnamedentityfeaturewithchemicalsubclasscomposition |
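
The description field above names two ideas that a short sketch may make more concrete: class composition (merging chemical sub-classes with similar naming conventions) and full-token word features (giving each fine-grained sub-token access to the whitespace-delimited token it came from). The Python below is only an illustration under stated assumptions, not the authors' implementation: the `CLASS_COMPOSITION` grouping, the merged class names, the tokenization regex, and the function names `compose_label`, `fine_tokens`, and `token_features` are all hypothetical and do not come from the record.

```python
import re

# Hypothetical grouping of chemical sub-classes. Which classes NERChem
# actually merges is NOT stated in this record; this mapping is an assumption.
CLASS_COMPOSITION = {
    "SYSTEMATIC": "SYSTEMATIC_LIKE",
    "FORMULA": "SYSTEMATIC_LIKE",
    "TRIVIAL": "TRIVIAL_LIKE",
    "ABBREVIATION": "TRIVIAL_LIKE",
}


def compose_label(label):
    """Map a BIO label such as 'B-FORMULA' to its merged class; 'O' is unchanged."""
    if label == "O":
        return label
    prefix, _, cls = label.partition("-")
    # Classes without an entry in the mapping keep their original name.
    return f"{prefix}-{CLASS_COMPOSITION.get(cls, cls)}"


def fine_tokens(text):
    """Yield (sub_token, full_token) pairs.

    Text is split on whitespace into full tokens, and each full token is
    split further into letter runs, digit runs, and single symbols.
    """
    for full in text.split():
        for sub in re.findall(r"[A-Za-z]+|\d+|[^A-Za-z\d\s]", full):
            yield sub, full


def token_features(text):
    """Build one feature dict per sub-token, including its full-token context."""
    return [
        {"word": sub.lower(), "full_token": full.lower()}
        for sub, full in fine_tokens(text)
    ]


if __name__ == "__main__":
    for feats in token_features("2-methylpropan-1-ol was added"):
        print(feats)
    print(compose_label("B-FORMULA"))
```

In this sketch, every sub-token of `2-methylpropan-1-ol` carries the same `full_token` value, so a sequence labeler trained on such features can see the whole name even though it labels the pieces one at a time. Dictionaries of this shape are a common input format for CRF toolkits, though the record does not say which toolkit or feature encoding NERChem uses.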