
An automated framework for hypotheses generation using literature

BACKGROUND: In bio-medicine, exploratory studies and hypothesis generation often begin with researching existing literature to identify a set of factors and their association with diseases, phenotypes, or biological processes. Many scientists are overwhelmed by the sheer volume of literature on a di...

Bibliographic Details
Main authors: Abedi, Vida, Zand, Ramin, Yeasin, Mohammed, Faisal, Fazle Elahi
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3497588/
https://www.ncbi.nlm.nih.gov/pubmed/22931688
http://dx.doi.org/10.1186/1756-0381-5-13
author Abedi, Vida
Zand, Ramin
Yeasin, Mohammed
Faisal, Fazle Elahi
collection PubMed
description BACKGROUND: In bio-medicine, exploratory studies and hypothesis generation often begin with researching existing literature to identify a set of factors and their association with diseases, phenotypes, or biological processes. Many scientists are overwhelmed by the sheer volume of literature on a disease when they plan to generate a new hypothesis or study a biological phenomenon. The situation is even worse for junior investigators, who often find it difficult to formulate new hypotheses or, more importantly, to corroborate whether their hypothesis is consistent with existing literature. It is a daunting task to keep abreast of so much published work and also to remember all combinations of direct and indirect associations. Fortunately, there is a growing trend of using literature mining and knowledge discovery tools in biomedical research. However, there is still a large gap between the huge amount of effort and resources invested in disease research and the little effort spent on harvesting the published knowledge. The proposed hypothesis generation framework (HGF) finds “crisp semantic associations” among entities of interest, which is a step towards bridging such gaps.
METHODOLOGY: The proposed HGF shares its end goals with SWAN but is more holistic in nature, and was designed and implemented using scalable and efficient computational models of disease-disease interaction. The integration of mapping ontologies with latent semantic analysis is critical in capturing domain-specific direct and indirect “crisp” associations, and in making assertions about entities (such as disease X is associated with a set of factors Z).
RESULTS: Pilot studies were performed using two diseases. A comparative analysis of the computed “associations” and “assertions” with curated expert knowledge was performed to validate the results. It was observed that the HGF is able to capture “crisp” direct and indirect associations, and to provide knowledge discovery on demand.
CONCLUSIONS: The proposed framework is fast, efficient, and robust in generating new hypotheses to identify factors associated with a disease. A fully integrated Web service application is being developed for wide dissemination of the HGF. A large-scale study by domain experts and associated researchers is underway to validate the associations and assertions computed by the HGF.
format Online
Article
Text
id pubmed-3497588
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3497588 2012-11-20 An automated framework for hypotheses generation using literature Abedi, Vida Zand, Ramin Yeasin, Mohammed Faisal, Fazle Elahi BioData Min Research BioMed Central 2012-08-29 /pmc/articles/PMC3497588/ /pubmed/22931688 http://dx.doi.org/10.1186/1756-0381-5-13 Text en Copyright ©2012 Abedi et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An automated framework for hypotheses generation using literature
topic Research