Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relation extraction

The BioCreative V chemical-disease relation (CDR) track was proposed to accelerate the progress of text mining in facilitating integrative understanding of chemicals, diseases and their relations. In this article, we describe an extension of our system (namely UET-CAM) that participated in the BioCr...

Bibliographic Details
Main Authors: Le, Hoang-Quynh; Tran, Mai-Vu; Dang, Thanh Hai; Ha, Quang-Thuy; Collier, Nigel
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4962668/
https://www.ncbi.nlm.nih.gov/pubmed/27630201
http://dx.doi.org/10.1093/database/baw102
author Le, Hoang-Quynh
Tran, Mai-Vu
Dang, Thanh Hai
Ha, Quang-Thuy
Collier, Nigel
collection PubMed
description The BioCreative V chemical-disease relation (CDR) track was proposed to accelerate the progress of text mining in facilitating an integrative understanding of chemicals, diseases and their relations. In this article, we describe an extension of our system, UET-CAM, which participated in the BioCreative V CDR track. The original UET-CAM system was ranked fourth among 18 participating systems by the BioCreative CDR track committee. In the Disease Named Entity Recognition and Normalization (DNER) phase, our system employed joint inference (decoding) with a perceptron-based named entity recognizer (NER) and a back-off model with Semantic Supervised Indexing and Skip-gram for named entity normalization. In the chemical-induced disease (CID) relation extraction phase, we proposed a pipeline that includes a coreference resolution module and a Support Vector Machine relation extraction model. The former module used a multi-pass sieve to extend entity recall. In this article, we improve the UET-CAM system by adding a ‘silver’ CID corpus to train the prediction model. This silver-standard corpus of more than 50,000 sentences was built automatically from the Comparative Toxicogenomics Database (CTD). We evaluated our method on the CDR test set. Results showed that our system reaches state-of-the-art performance, with an F1 of 82.44 for the DNER task and 58.90 for the CID task. Analysis demonstrated substantial benefits from both the multi-pass sieve coreference resolution method (F1 +4.13%) and the silver CID corpus (F1 +7.3%). Database URL: SilverCID, the silver-standard corpus for CID relation extraction, is freely available online at: https://zenodo.org/record/34530 (doi:10.5281/zenodo.34530).
format Online
Article
Text
id pubmed-4962668
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4962668 2016-07-28 Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relation extraction Le, Hoang-Quynh Tran, Mai-Vu Dang, Thanh Hai Ha, Quang-Thuy Collier, Nigel Database (Oxford) Original Article Oxford University Press 2016-07-26 /pmc/articles/PMC4962668/ /pubmed/27630201 http://dx.doi.org/10.1093/database/baw102 Text en © The Author(s) 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relation extraction
topic Original Article