Wide-coverage relation extraction from MEDLINE using deep syntax
BACKGROUND: Relation extraction is a fundamental technology in biomedical text mining. Most of the previous studies on relation extraction from biomedical literature have focused on specific or predefined types of relations, which inherently limits the types of the extracted relations. With the aim...
Main Authors: | Nguyen, Nhung TH; Miwa, Makoto; Tsuruoka, Yoshimasa; Chikayama, Takashi; Tojo, Satoshi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4396593/ https://www.ncbi.nlm.nih.gov/pubmed/25887686 http://dx.doi.org/10.1186/s12859-015-0538-8 |
_version_ | 1782366608377774080 |
---|---|
author | Nguyen, Nhung TH; Miwa, Makoto; Tsuruoka, Yoshimasa; Chikayama, Takashi; Tojo, Satoshi |
author_facet | Nguyen, Nhung TH; Miwa, Makoto; Tsuruoka, Yoshimasa; Chikayama, Takashi; Tojo, Satoshi |
author_sort | Nguyen, Nhung TH |
collection | PubMed |
description | BACKGROUND: Relation extraction is a fundamental technology in biomedical text mining. Most of the previous studies on relation extraction from biomedical literature have focused on specific or predefined types of relations, which inherently limits the types of the extracted relations. With the aim of fully leveraging the knowledge described in the literature, we address much broader types of semantic relations using a single extraction framework. RESULTS: Our system, which we name PASMED, extracts diverse types of binary relations from biomedical literature using deep syntactic patterns. Our experimental results demonstrate that it achieves a level of recall considerably higher than the state of the art, while maintaining reasonable precision. We have then applied PASMED to the whole MEDLINE corpus and extracted more than 137 million semantic relations. The extracted relations provide a quantitative understanding of what kinds of semantic relations are actually described in MEDLINE and can be ultimately extracted by (possibly type-specific) relation extraction systems. CONCLUSION: PASMED extracts a large number of relations that have previously been missed by existing text mining systems. The entire collection of the relations extracted from MEDLINE is publicly available in machine-readable form, so that it can serve as a potential knowledge base for high-level text-mining applications. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0538-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4396593 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4396593 2015-04-15 Wide-coverage relation extraction from MEDLINE using deep syntax Nguyen, Nhung TH Miwa, Makoto Tsuruoka, Yoshimasa Chikayama, Takashi Tojo, Satoshi BMC Bioinformatics Research Article BACKGROUND: Relation extraction is a fundamental technology in biomedical text mining. Most of the previous studies on relation extraction from biomedical literature have focused on specific or predefined types of relations, which inherently limits the types of the extracted relations. With the aim of fully leveraging the knowledge described in the literature, we address much broader types of semantic relations using a single extraction framework. RESULTS: Our system, which we name PASMED, extracts diverse types of binary relations from biomedical literature using deep syntactic patterns. Our experimental results demonstrate that it achieves a level of recall considerably higher than the state of the art, while maintaining reasonable precision. We have then applied PASMED to the whole MEDLINE corpus and extracted more than 137 million semantic relations. The extracted relations provide a quantitative understanding of what kinds of semantic relations are actually described in MEDLINE and can be ultimately extracted by (possibly type-specific) relation extraction systems. CONCLUSION: PASMED extracts a large number of relations that have previously been missed by existing text mining systems. The entire collection of the relations extracted from MEDLINE is publicly available in machine-readable form, so that it can serve as a potential knowledge base for high-level text-mining applications. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0538-8) contains supplementary material, which is available to authorized users. BioMed Central 2015-04-01 /pmc/articles/PMC4396593/ /pubmed/25887686 http://dx.doi.org/10.1186/s12859-015-0538-8 Text en © Nguyen et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Nguyen, Nhung TH Miwa, Makoto Tsuruoka, Yoshimasa Chikayama, Takashi Tojo, Satoshi Wide-coverage relation extraction from MEDLINE using deep syntax |
title | Wide-coverage relation extraction from MEDLINE using deep syntax |
title_full | Wide-coverage relation extraction from MEDLINE using deep syntax |
title_fullStr | Wide-coverage relation extraction from MEDLINE using deep syntax |
title_full_unstemmed | Wide-coverage relation extraction from MEDLINE using deep syntax |
title_short | Wide-coverage relation extraction from MEDLINE using deep syntax |
title_sort | wide-coverage relation extraction from medline using deep syntax |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4396593/ https://www.ncbi.nlm.nih.gov/pubmed/25887686 http://dx.doi.org/10.1186/s12859-015-0538-8 |
work_keys_str_mv | AT nguyennhungth widecoveragerelationextractionfrommedlineusingdeepsyntax AT miwamakoto widecoveragerelationextractionfrommedlineusingdeepsyntax AT tsuruokayoshimasa widecoveragerelationextractionfrommedlineusingdeepsyntax AT chikayamatakashi widecoveragerelationextractionfrommedlineusingdeepsyntax AT tojosatoshi widecoveragerelationextractionfrommedlineusingdeepsyntax |