Sieve-based relation extraction of gene regulatory networks from biological literature
Main authors: | Žitnik, Slavko; Žitnik, Marinka; Zupan, Blaž; Bajec, Marko |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2015 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642041/ https://www.ncbi.nlm.nih.gov/pubmed/26551454 http://dx.doi.org/10.1186/1471-2105-16-S16-S1 |
author | Žitnik, Slavko; Žitnik, Marinka; Zupan, Blaž; Bajec, Marko |
---|---|
collection | PubMed |
description | BACKGROUND: Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. Biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions and results of related experiments. To extract them in an explicit, computer-readable format, these relations were at first extracted manually from databases. Manual curation was later replaced with automatic or semi-automatic tools with natural language processing capabilities. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks. RESULTS: We develop a computational approach for extraction of gene regulatory networks from textual data. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. With this method we successfully extracted the sporulation gene regulation network in the bacterium Bacillus subtilis for the information extraction challenge at the BioNLP 2013 conference. To enable extraction of distant relations using first-order models, we transform the data into skip-mention sequences. We infer multiple models, each of which is able to extract different relationship types. Following the shared task, we conducted additional analysis using different system settings, which reduced the reconstruction error of the bacterial sporulation network from 0.73 to 0.68, measured as the slot error rate between the predicted and the reference network. We observe that all relation extraction sieves contribute to the predictive performance of the proposed approach. Also, features constructed from mention words and their prefixes and suffixes are the most important features for extraction accuracy. 
Analysis of distances between different mention types in the text shows that our choice of transforming data into skip-mention sequences is appropriate for detecting relations between distant mentions. CONCLUSIONS: Linear-chain conditional random fields, along with appropriate data transformations, can be efficiently used to extract relations. The sieve-based architecture simplifies the system as new sieves can be easily added or removed and each sieve can utilize the results of previous ones. Furthermore, sieves with conditional random fields can be trained on arbitrary text data and hence are applicable to a broad range of relation extraction tasks and data domains. |
format | Online Article Text |
id | pubmed-4642041 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics (Research), BioMed Central, published 2015-10-30 |
rights | Copyright © 2015 Žitnik et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Sieve-based relation extraction of gene regulatory networks from biological literature |
topic | Research |
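The abstract describes a sieve-based architecture in which sieves run in sequence and each sieve can use the results of the previous ones. The following is a minimal sketch of that control flow only, under stated assumptions. `Document`, `run_sieves`, `inhibition_rule` and `regulation_fallback` are hypothetical names. Both sieves here are toy regular-expression rules, whereas the described system combines rules with linear-chain conditional random field models.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

# Hypothetical representation of one relation: (agent, target, relation type).
Relation = Tuple[str, str, str]


@dataclass
class Document:
    text: str
    relations: Set[Relation] = field(default_factory=set)


# A sieve reads the document (including relations found so far) and returns new ones.
Sieve = Callable[[Document], Set[Relation]]


def run_sieves(doc: Document, sieves: List[Sieve]) -> Document:
    """Apply sieves in a fixed order, accumulating their relations on the document."""
    for sieve in sieves:
        doc.relations |= sieve(doc)
    return doc


def inhibition_rule(doc: Document) -> Set[Relation]:
    # Hypothetical high-precision rule: "<agent> inhibits <target>".
    return {(a, t, "Inhibition") for a, t in re.findall(r"(\w+) inhibits (\w+)", doc.text)}


def regulation_fallback(doc: Document) -> Set[Relation]:
    # Hypothetical broad rule that only fires for pairs no earlier sieve has linked.
    linked = {(a, t) for a, t, _ in doc.relations}
    pairs = re.findall(r"(\w+) (?:inhibits|activates|regulates|controls) (\w+)", doc.text)
    return {(a, t, "Regulation") for a, t in pairs if (a, t) not in linked}
```

For the made-up text "geneA inhibits geneB and geneC regulates geneD", the order `[inhibition_rule, regulation_fallback]` yields one Inhibition and one Regulation relation. Reversing the order lets the broad rule fire first and also label the geneA/geneB pair as Regulation. This illustrates why sieve order is a design decision in such a pipeline.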
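The abstract transforms the data into skip-mention sequences so that first-order (linear-chain) models can connect distant mentions. The following is a minimal sketch, assuming that a skip distance s keeps every (s + 1)-th mention starting from each offset 0..s. Under that assumption, two mentions with exactly s other mentions between them become neighbours. The function name `skip_mention_sequences` is hypothetical, and the construction used in the paper may differ in detail.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def skip_mention_sequences(mentions: Sequence[T], skip: int) -> List[List[T]]:
    """Split document-ordered mentions into skip-mention sequences.

    Assumed construction: with skip distance ``skip`` every (skip + 1)-th
    mention is kept, starting from each offset 0..skip, so two mentions with
    exactly ``skip`` other mentions between them become adjacent.
    ``skip = 0`` returns the original sequence as a single list.
    """
    if skip < 0:
        raise ValueError("skip distance must be non-negative")
    step = skip + 1
    # Cap the number of offsets so short documents do not yield empty sequences.
    return [list(mentions[start::step]) for start in range(min(step, len(mentions)))]
```

With skip distance 1, the mentions a, b, c, d, e become the sequences [a, c, e] and [b, d]. A linear-chain model trained on these sequences sees a and c as adjacent even though b lies between them in the text.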
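The reconstruction errors reported in the abstract (0.73 and 0.68) are slot error rates, SER = (S + D + I) / N:

- S counts substitutions.
- D counts deletions.
- I counts insertions.
- N is the number of relations in the reference network.

Lower is better. The value can exceed 1 because insertions are not bounded by N.

The following sketch assumes a simplified matching scheme. Relations are (agent, target, type) triples. A predicted relation with the correct agent and target but the wrong type counts as a substitution. This matching is an assumption and not the official BioNLP 2013 evaluation procedure, and `slot_error_rate` is a hypothetical name.

```python
from typing import List, Set, Tuple

# Assumed representation of one regulatory relation: (agent, target, type).
Relation = Tuple[str, str, str]


def slot_error_rate(predicted: Set[Relation], reference: Set[Relation]) -> float:
    """Return SER = (S + D + I) / N, with N = number of reference relations.

    Simplified matching (an assumption): exact triples are correct; a leftover
    prediction sharing agent and target with a leftover reference relation is
    a substitution; other leftovers are deletions (reference side) or
    insertions (prediction side).
    """
    if not reference:
        raise ValueError("the reference network must contain at least one relation")
    correct = predicted & reference
    missing = sorted(reference - correct)
    unmatched: List[Relation] = sorted(predicted - correct)
    substitutions = 0
    for agent, target, _ in missing:
        # Greedily pair this reference relation with one wrong-type prediction.
        for i, (p_agent, p_target, _) in enumerate(unmatched):
            if (p_agent, p_target) == (agent, target):
                del unmatched[i]  # safe: iteration stops right after deleting
                substitutions += 1
                break
    deletions = len(missing) - substitutions
    insertions = len(unmatched)
    return (substitutions + deletions + insertions) / len(reference)
```

Consider a made-up three-relation reference network. A prediction that gets one relation right, gives one relation the wrong type and misses the third scores (1 + 1 + 0) / 3 ≈ 0.67.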