
Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research

Bibliographic Details
Main authors: Bravo, Àlex, Piñero, Janet, Queralt-Rosinach, Núria, Rautschka, Michael, Furlong, Laura I
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4466840/
https://www.ncbi.nlm.nih.gov/pubmed/25886734
http://dx.doi.org/10.1186/s12859-015-0472-9
author Bravo, Àlex
Piñero, Janet
Queralt-Rosinach, Núria
Rautschka, Michael
Furlong, Laura I
collection PubMed
description BACKGROUND: Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key to identifying actionable knowledge in free-text repositories. We present the BeFree system, which identifies relationships between biomedical entities with a special focus on genes and their associated diseases.
RESULTS: By exploiting morpho-syntactic information in the text, BeFree identifies gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-world scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated with depression, a major cause of morbidity worldwide, that are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and their integration with current biomedical knowledge, provided interesting insights into the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by BeFree is collected in expert-curated databases. Thus, there is a pressing need for alternatives to manual curation in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications.
CONCLUSIONS: BeFree is a novel text mining system that performs competitively in identifying gene-disease, drug-disease and drug-target associations. Our analyses show that mining only a small fraction of MEDLINE yields a large dataset of gene-disease associations, of which only a small proportion (2%) is actually recorded in curated resources, raising several issues regarding data prioritization and curation. We propose the joint analysis of text-mined data and expert-curated data as a suitable approach both to assess data quality and to highlight novel and interesting information.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0472-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4466840
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
journal BMC Bioinformatics
article_type Methodology Article
published BioMed Central 2015-02-21
rights © Bravo et al.; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research
topic Methodology Article