Chemical-induced disease relation extraction with various linguistic features
Understanding the relations between chemicals and diseases is crucial in various biomedical tasks such as new drug discoveries and new therapy developments. While manually mining these relations from the biomedical literature is costly and time-consuming, such a procedure is often difficult to keep...
Main authors: | Gu, Jinghang; Qian, Longhua; Zhou, Guodong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4822558/ https://www.ncbi.nlm.nih.gov/pubmed/27052618 http://dx.doi.org/10.1093/database/baw042 |
_version_ | 1782425789140041728 |
---|---|
author | Gu, Jinghang Qian, Longhua Zhou, Guodong |
author_facet | Gu, Jinghang Qian, Longhua Zhou, Guodong |
author_sort | Gu, Jinghang |
collection | PubMed |
description | Understanding the relations between chemicals and diseases is crucial in various biomedical tasks such as new drug discoveries and new therapy developments. While manually mining these relations from the biomedical literature is costly and time-consuming, such a procedure is often difficult to keep up-to-date. To address these issues, the BioCreative-V community proposed a challenging task of automatic extraction of chemical-induced disease (CID) relations in order to benefit biocuration. This article describes our work on the CID relation extraction task in BioCreative-V. We built a machine-learning-based system that used simple yet effective linguistic features to extract relations with maximum entropy models. In addition to leveraging various features, the hypernym relations between entity concepts derived from the Medical Subject Headings (MeSH)-controlled vocabulary were also employed during both training and testing stages to obtain more accurate classification models and better extraction performance, respectively. We reduced relation extraction between entities in documents to relation extraction between entity mentions. In our system, pairs of chemical and disease mentions at both intra- and inter-sentence levels were first constructed as relation instances for training and testing; two classification models, one per level, were then trained on the training examples and applied to the testing examples. Finally, we merged the classification results from mention level to document level to acquire the final relations between chemicals and diseases. Using gold-standard entity annotations, our system achieved promising F-scores of 60.4% on the development dataset and 58.3% on the test dataset. Database URL: https://github.com/JHnlp/BC5CIDTask |
format | Online Article Text |
id | pubmed-4822558 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4822558 2016-04-07 Chemical-induced disease relation extraction with various linguistic features Gu, Jinghang Qian, Longhua Zhou, Guodong Database (Oxford) Original Article Database URL: https://github.com/JHnlp/BC5CIDTask Oxford University Press 2016-04-06 /pmc/articles/PMC4822558/ /pubmed/27052618 http://dx.doi.org/10.1093/database/baw042 Text en © The Author(s) 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article Gu, Jinghang Qian, Longhua Zhou, Guodong Chemical-induced disease relation extraction with various linguistic features |
title | Chemical-induced disease relation extraction with various linguistic features |
title_full | Chemical-induced disease relation extraction with various linguistic features |
title_fullStr | Chemical-induced disease relation extraction with various linguistic features |
title_full_unstemmed | Chemical-induced disease relation extraction with various linguistic features |
title_short | Chemical-induced disease relation extraction with various linguistic features |
title_sort | chemical-induced disease relation extraction with various linguistic features |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4822558/ https://www.ncbi.nlm.nih.gov/pubmed/27052618 http://dx.doi.org/10.1093/database/baw042 |
work_keys_str_mv | AT gujinghang chemicalinduceddiseaserelationextractionwithvariouslinguisticfeatures AT qianlonghua chemicalinduceddiseaserelationextractionwithvariouslinguisticfeatures AT zhouguodong chemicalinduceddiseaserelationextractionwithvariouslinguisticfeatures |
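
The pipeline summarized in the description field (build chemical-disease mention pairs at intra- and inter-sentence levels, classify each level with its own model, then merge mention-level decisions to document level) can be illustrated with a minimal Python sketch. This is not the authors' code: the `Mention` class, the function names, the toy concept identifiers, and the "any positive mention pair yields a document-level relation" merge rule are all assumptions made for illustration, and the two classifiers are stand-in callables rather than trained maximum entropy models.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Mention:
    """One entity mention in a document (hypothetical structure)."""
    concept_id: str  # concept identifier, e.g. a MeSH ID (toy values below)
    kind: str        # "Chemical" or "Disease"
    sentence: int    # index of the sentence that contains the mention


Pair = Tuple[Mention, Mention]
Classifier = Callable[[Pair], bool]


def build_instances(mentions: Iterable[Mention]) -> Tuple[List[Pair], List[Pair]]:
    """Pair every chemical mention with every disease mention.

    Pairs in the same sentence go to the intra-sentence list,
    all other pairs go to the inter-sentence list.
    """
    mentions = list(mentions)
    chemicals = [m for m in mentions if m.kind == "Chemical"]
    diseases = [m for m in mentions if m.kind == "Disease"]
    intra: List[Pair] = []
    inter: List[Pair] = []
    for chem, dis in product(chemicals, diseases):
        target = intra if chem.sentence == dis.sentence else inter
        target.append((chem, dis))
    return intra, inter


def merge_to_document(pairs: Iterable[Pair], is_positive: Classifier) -> Set[Tuple[str, str]]:
    """Assumed merge rule: a concept pair is related if any mention pair is positive."""
    return {(c.concept_id, d.concept_id) for c, d in pairs if is_positive((c, d))}


def extract(mentions: Iterable[Mention],
            intra_clf: Classifier,
            inter_clf: Classifier) -> Set[Tuple[str, str]]:
    """Run both levels and take the union of their document-level relations."""
    intra, inter = build_instances(mentions)
    return merge_to_document(intra, intra_clf) | merge_to_document(inter, inter_clf)


# Toy document: chemical C1 and disease D1 share sentence 0,
# chemical C1 and disease D2 share sentence 2.
mentions = [
    Mention("C1", "Chemical", 0),
    Mention("D1", "Disease", 0),
    Mention("D2", "Disease", 2),
    Mention("C1", "Chemical", 2),
]

# Stand-in classifiers: accept every intra-sentence pair, reject every inter-sentence pair.
relations = extract(mentions, intra_clf=lambda pair: True, inter_clf=lambda pair: False)
print(sorted(relations))
```

In a real system the two callables would wrap trained models over linguistic features, and the MeSH hypernym filtering mentioned in the description would be applied as an additional step that this sketch does not cover.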