Connecting the Dots between PubMed Abstracts

BACKGROUND: There are now a multitude of articles published in a diversity of journals providing information about genes, proteins, pathways, and diseases. Each article investigates subsets of a biological process, but to gain insight into the functioning of a system as a whole, we must integrate in...

Bibliographic Details
Main authors: Hossain, M. Shahriar; Gresock, Joseph; Edmonds, Yvette; Helm, Richard; Potts, Malcolm; Ramakrishnan, Naren
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3250456/
https://www.ncbi.nlm.nih.gov/pubmed/22235301
http://dx.doi.org/10.1371/journal.pone.0029509
author Hossain, M. Shahriar
Gresock, Joseph
Edmonds, Yvette
Helm, Richard
Potts, Malcolm
Ramakrishnan, Naren
collection PubMed
description BACKGROUND: There are now a multitude of articles published in a diversity of journals providing information about genes, proteins, pathways, and diseases. Each article investigates subsets of a biological process, but to gain insight into the functioning of a system as a whole, we must integrate information from multiple publications. Particularly, unraveling relationships between extra-cellular inputs and downstream molecular response mechanisms requires integrating conclusions from diverse publications.
METHODOLOGY: We present an automated approach to biological knowledge discovery from PubMed abstracts, suitable for “connecting the dots” across the literature. We describe a storytelling algorithm that, given a start and end publication, typically with little or no overlap in content, identifies a chain of intermediate publications from one to the other, such that neighboring publications have significant content similarity. The quality of discovered stories is measured using local criteria such as the size of supporting neighborhoods for each link and the strength of individual links connecting publications, as well as global metrics of dispersion. To ensure that the story stays coherent as it meanders from one publication to another, we demonstrate the design of novel coherence and overlap filters for use as post-processing steps.
CONCLUSIONS: We demonstrate the application of our storytelling algorithm to three case studies: i) a many-one study exploring relationships between multiple cellular inputs and a molecule responsible for cell-fate decisions, ii) a many-many study exploring the relationships between multiple cytokines and multiple downstream transcription factors, and iii) a one-to-one study to showcase the ability to recover a cancer related association, viz. the Warburg effect, from past literature. The storytelling pipeline helps narrow down a scientist's focus from several hundreds of thousands of relevant documents to only around a hundred stories. We argue that our approach can serve as a valuable discovery aid for hypothesis generation and connection exploration in large unstructured biological knowledge bases.
format Online
Article
Text
id pubmed-3250456
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3250456
2012-01-10
Connecting the Dots between PubMed Abstracts
Hossain, M. Shahriar
Gresock, Joseph
Edmonds, Yvette
Helm, Richard
Potts, Malcolm
Ramakrishnan, Naren
PLoS One
Research Article
Public Library of Science
2012-01-03
/pmc/articles/PMC3250456/
/pubmed/22235301
http://dx.doi.org/10.1371/journal.pone.0029509
Text
en
Hossain et al.
http://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Connecting the Dots between PubMed Abstracts
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3250456/
https://www.ncbi.nlm.nih.gov/pubmed/22235301
http://dx.doi.org/10.1371/journal.pone.0029509