An automated framework for hypotheses generation using literature
BACKGROUND: In bio-medicine, exploratory studies and hypothesis generation often begin with researching existing literature to identify a set of factors and their association with diseases, phenotypes, or biological processes. Many scientists are overwhelmed by the sheer volume of literature on a di...
Main authors: | Abedi, Vida; Zand, Ramin; Yeasin, Mohammed; Faisal, Fazle Elahi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3497588/ https://www.ncbi.nlm.nih.gov/pubmed/22931688 http://dx.doi.org/10.1186/1756-0381-5-13 |
_version_ | 1782249751729668096 |
---|---|
author | Abedi, Vida Zand, Ramin Yeasin, Mohammed Faisal, Fazle Elahi |
author_facet | Abedi, Vida Zand, Ramin Yeasin, Mohammed Faisal, Fazle Elahi |
author_sort | Abedi, Vida |
collection | PubMed |
description | BACKGROUND: In bio-medicine, exploratory studies and hypothesis generation often begin with researching existing literature to identify a set of factors and their association with diseases, phenotypes, or biological processes. Many scientists are overwhelmed by the sheer volume of literature on a disease when they plan to generate a new hypothesis or study a biological phenomenon. The situation is even worse for junior investigators, who often find it difficult to formulate new hypotheses or, more importantly, to corroborate whether their hypothesis is consistent with existing literature. It is a daunting task to keep abreast of so much published work and also to remember all combinations of direct and indirect associations. Fortunately, there is a growing trend of using literature mining and knowledge discovery tools in biomedical research. However, there is still a large gap between the huge amount of effort and resources invested in disease research and the little effort invested in harvesting the published knowledge. The proposed hypothesis generation framework (HGF) finds “crisp semantic associations” among entities of interest, which is a step towards bridging this gap. METHODOLOGY: The proposed HGF shares similar end goals with the SWAN but is more holistic in nature, and it was designed and implemented using scalable and efficient computational models of disease-disease interaction. The integration of mapping ontologies with latent semantic analysis is critical in capturing domain-specific direct and indirect “crisp” associations, and in making assertions about entities (such as disease X is associated with a set of factors Z). RESULTS: Pilot studies were performed using two diseases. A comparative analysis of the computed “associations” and “assertions” with curated expert knowledge was performed to validate the results. It was observed that the HGF is able to capture “crisp” direct and indirect associations, and to provide knowledge discovery on demand. CONCLUSIONS: The proposed framework is fast, efficient, and robust in generating new hypotheses to identify factors associated with a disease. A fully integrated Web service application is being developed for wide dissemination of the HGF. A large-scale study by domain experts and associated researchers is underway to validate the associations and assertions computed by the HGF. |
format | Online Article Text |
id | pubmed-3497588 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3497588 2012-11-20 An automated framework for hypotheses generation using literature Abedi, Vida Zand, Ramin Yeasin, Mohammed Faisal, Fazle Elahi BioData Min Research BioMed Central 2012-08-29 /pmc/articles/PMC3497588/ /pubmed/22931688 http://dx.doi.org/10.1186/1756-0381-5-13 Text en Copyright ©2012 Abedi et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Abedi, Vida Zand, Ramin Yeasin, Mohammed Faisal, Fazle Elahi An automated framework for hypotheses generation using literature |
title | An automated framework for hypotheses generation using literature |
title_full | An automated framework for hypotheses generation using literature |
title_fullStr | An automated framework for hypotheses generation using literature |
title_full_unstemmed | An automated framework for hypotheses generation using literature |
title_short | An automated framework for hypotheses generation using literature |
title_sort | automated framework for hypotheses generation using literature |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3497588/ https://www.ncbi.nlm.nih.gov/pubmed/22931688 http://dx.doi.org/10.1186/1756-0381-5-13 |
work_keys_str_mv | AT abedivida anautomatedframeworkforhypothesesgenerationusingliterature AT zandramin anautomatedframeworkforhypothesesgenerationusingliterature AT yeasinmohammed anautomatedframeworkforhypothesesgenerationusingliterature AT faisalfazleelahi anautomatedframeworkforhypothesesgenerationusingliterature AT abedivida automatedframeworkforhypothesesgenerationusingliterature AT zandramin automatedframeworkforhypothesesgenerationusingliterature AT yeasinmohammed automatedframeworkforhypothesesgenerationusingliterature AT faisalfazleelahi automatedframeworkforhypothesesgenerationusingliterature |