
Wide-coverage relation extraction from MEDLINE using deep syntax

BACKGROUND: Relation extraction is a fundamental technology in biomedical text mining. Most of the previous studies on relation extraction from biomedical literature have focused on specific or predefined types of relations, which inherently limits the types of the extracted relations. With the aim...


Bibliographic Details
Main authors: Nguyen, Nhung TH, Miwa, Makoto, Tsuruoka, Yoshimasa, Chikayama, Takashi, Tojo, Satoshi
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4396593/
https://www.ncbi.nlm.nih.gov/pubmed/25887686
http://dx.doi.org/10.1186/s12859-015-0538-8
author Nguyen, Nhung TH
Miwa, Makoto
Tsuruoka, Yoshimasa
Chikayama, Takashi
Tojo, Satoshi
author_sort Nguyen, Nhung TH
collection PubMed
description BACKGROUND: Relation extraction is a fundamental technology in biomedical text mining. Most of the previous studies on relation extraction from biomedical literature have focused on specific or predefined types of relations, which inherently limits the types of the extracted relations. With the aim of fully leveraging the knowledge described in the literature, we address much broader types of semantic relations using a single extraction framework. RESULTS: Our system, which we name PASMED, extracts diverse types of binary relations from biomedical literature using deep syntactic patterns. Our experimental results demonstrate that it achieves a level of recall considerably higher than the state of the art, while maintaining reasonable precision. We have then applied PASMED to the whole MEDLINE corpus and extracted more than 137 million semantic relations. The extracted relations provide a quantitative understanding of what kinds of semantic relations are actually described in MEDLINE and can be ultimately extracted by (possibly type-specific) relation extraction systems. CONCLUSION: PASMED extracts a large number of relations that have previously been missed by existing text mining systems. The entire collection of the relations extracted from MEDLINE is publicly available in machine-readable form, so that it can serve as a potential knowledge base for high-level text-mining applications. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0538-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4396593
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4396593 2015-04-15 Wide-coverage relation extraction from MEDLINE using deep syntax Nguyen, Nhung TH Miwa, Makoto Tsuruoka, Yoshimasa Chikayama, Takashi Tojo, Satoshi BMC Bioinformatics Research Article BioMed Central 2015-04-01 /pmc/articles/PMC4396593/ /pubmed/25887686 http://dx.doi.org/10.1186/s12859-015-0538-8 Text en © Nguyen et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Wide-coverage relation extraction from MEDLINE using deep syntax
topic Research Article