
An Unsupervised Text Mining Method for Relation Extraction from Biomedical Literature

Bibliographic Details
Main authors: Quan, Changqin; Wang, Meng; Ren, Fuji
Format: Online article, text
Language: English
Published: Public Library of Science, 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103846/
https://www.ncbi.nlm.nih.gov/pubmed/25036529
http://dx.doi.org/10.1371/journal.pone.0102039
collection PubMed
description The wealth of interaction information provided in biomedical articles has motivated the development of text mining approaches that automatically extract biomedical relations. This paper presents an unsupervised method based on pattern clustering and sentence parsing for biomedical relation extraction. The pattern clustering algorithm, which is based on the polynomial kernel method, identifies interaction words from unlabeled data; these interaction words are then used to extract relations between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Building on the semi-supervised KNN algorithm, we extend the proposed unsupervised approach to a semi-supervised approach by combining pattern clustering, dependency parsing and phrase structure parsing rules. We evaluated the approaches on two different tasks: (1) protein–protein interaction extraction and (2) gene–suicide association extraction. The evaluation of task (1) on the benchmark dataset (AImed corpus) showed that our proposed unsupervised approach outperformed three supervised methods: a rule-based method, an SVM-based method, and a kernel-based method. The proposed semi-supervised approach also outperformed the existing semi-supervised methods. The evaluation of gene–suicide association extraction on a smaller dataset from the Genetic Association Database and a larger dataset from the publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than a co-occurrence-based method.
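The description names a polynomial-kernel pattern clustering step but gives neither the kernel parameters nor the clustering procedure. As an illustration only, the Python sketch below computes a polynomial kernel K(x, y) = (x·y + c)^d between bag-of-words vectors of short interaction patterns and groups them with a simple greedy threshold rule. The greedy clustering rule, the example patterns, and the values c = 1, d = 2 and threshold = 0.5 are assumptions made for this sketch, not the authors' algorithm or settings.

```python
import math
from collections import Counter

# Illustrative sketch only: the greedy clustering rule and the parameters
# c, d and threshold are assumptions, not taken from the paper.


def poly_kernel(x: Counter, y: Counter, c: float = 1.0, d: int = 2) -> float:
    """Polynomial kernel K(x, y) = (x . y + c) ** d over sparse word-count vectors."""
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(count * y[word] for word, count in x.items())
    return (dot + c) ** d


def normalized_similarity(x: Counter, y: Counter, c: float = 1.0, d: int = 2) -> float:
    """Cosine-style normalization in kernel feature space: K(x,y) / sqrt(K(x,x) * K(y,y))."""
    return poly_kernel(x, y, c, d) / math.sqrt(
        poly_kernel(x, x, c, d) * poly_kernel(y, y, c, d)
    )


def cluster_patterns(patterns, threshold=0.5, c=1.0, d=2):
    """Greedy single pass: each pattern joins the first cluster whose seed
    (first member) is similar enough; otherwise it starts a new cluster."""
    vectors = [Counter(p.lower().split()) for p in patterns]
    clusters = []  # each cluster is a list of pattern indices; index 0 is the seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if normalized_similarity(vec, vectors[cluster[0]], c, d) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return [[patterns[i] for i in cluster] for cluster in clusters]


if __name__ == "__main__":
    # Hypothetical interaction patterns (words between two entity mentions).
    examples = ["binds", "binds to", "interacts with", "inhibits", "interacts directly with"]
    for group in cluster_patterns(examples):
        print(group)
```

With the defaults, the example groups "binds" with "binds to" and "interacts with" with "interacts directly with", leaving "inhibits" on its own. Dividing by K(x, x) and K(y, y) keeps scores in [0, 1] for these non-negative count vectors, so a single threshold can be applied regardless of pattern length.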
id pubmed-4103846
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4103846 2014-07-21 An Unsupervised Text Mining Method for Relation Extraction from Biomedical Literature Quan, Changqin Wang, Meng Ren, Fuji PLoS One Research Article
Public Library of Science 2014-07-18 /pmc/articles/PMC4103846/ /pubmed/25036529 http://dx.doi.org/10.1371/journal.pone.0102039 Text en © 2014 Quan et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article