
NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition

Chemical patents contain detailed information on novel chemical compounds that is valuable to the chemical and pharmaceutical industries. In this paper, we introduce a system, NERChem, that can recognize chemical named entity mentions in chemical patents. NERChem is based on the conditional random fi...


Bibliographic Details
Main Authors: Tsai, Richard Tzong-Han, Hsiao, Yu-Cheng, Lai, Po-Ting
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5091336/
https://www.ncbi.nlm.nih.gov/pubmed/31414701
http://dx.doi.org/10.1093/database/baw135
_version_ 1782464556986007552
author Tsai, Richard Tzong-Han
Hsiao, Yu-Cheng
Lai, Po-Ting
collection PubMed
description Chemical patents contain detailed information on novel chemical compounds that is valuable to the chemical and pharmaceutical industries. In this paper, we introduce a system, NERChem, that can recognize chemical named entity mentions in chemical patents. NERChem is based on the conditional random fields (CRF) model. Our approach incorporates (1) class composition, which is used for combining chemical classes whose naming conventions are similar; (2) BioNE features, which are used for distinguishing chemical mentions from other biomedical NE mentions in the patents; and (3) full-token word features, which are used to resolve the tokenization granularity problem. We evaluated our approach on the BioCreative V CHEMDNER-patent corpus and achieved an F-score of 87.17% in the Chemical Entity Mention in Patents (CEMP) task and a sensitivity of 98.58% in the Chemical Passage Detection (CPD) task, ranking alongside the top systems. Database URL: Our NERChem web-based system is publicly available at iisrserv.csie.ncu.edu.tw/nerchem.
format Online
Article
Text
id pubmed-5091336
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5091336 2016-11-03 NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition Tsai, Richard Tzong-Han Hsiao, Yu-Cheng Lai, Po-Ting Database (Oxford) Original Article Chemical patents contain detailed information on novel chemical compounds that is valuable to the chemical and pharmaceutical industries. In this paper, we introduce a system, NERChem, that can recognize chemical named entity mentions in chemical patents. NERChem is based on the conditional random fields (CRF) model. Our approach incorporates (1) class composition, which is used for combining chemical classes whose naming conventions are similar; (2) BioNE features, which are used for distinguishing chemical mentions from other biomedical NE mentions in the patents; and (3) full-token word features, which are used to resolve the tokenization granularity problem. We evaluated our approach on the BioCreative V CHEMDNER-patent corpus and achieved an F-score of 87.17% in the Chemical Entity Mention in Patents (CEMP) task and a sensitivity of 98.58% in the Chemical Passage Detection (CPD) task, ranking alongside the top systems. Database URL: Our NERChem web-based system is publicly available at iisrserv.csie.ncu.edu.tw/nerchem. Oxford University Press 2016-10-25 /pmc/articles/PMC5091336/ /pubmed/31414701 http://dx.doi.org/10.1093/database/baw135 Text en © The Author(s) 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5091336/
https://www.ncbi.nlm.nih.gov/pubmed/31414701
http://dx.doi.org/10.1093/database/baw135