
Clustering cliques for graph-based summarization of the biomedical research literature

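BACKGROUND: Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts). RESULTS: SemRep was used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified from frequently occurring predications whose arguments are highly connected, as measured by degree centrality. Themes in the summary were identified with a hierarchical clustering algorithm based on the arguments shared among cliques. The validity of the resulting clusters was compared with a Silhouette-generated baseline for cohesion, separation and overall validity, and the theme labels were compared with a reference standard produced from major MeSH headings. CONCLUSIONS: For 11 topics in the testing data set, the overall validity of clusters in the system summaries was 10 percentage points higher than the baseline (43% versus 33%). Compared with the MeSH-based reference standard, the theme labels achieved recall, precision and F-score of 0.64, 0.65 and 0.65, respectively.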
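The method summarized in the abstract has four steps. SemRep predications are turned into a graph. Frequent predications with highly connected arguments are kept, using degree centrality. Cliques are found in that graph. The cliques are then grouped into themes by hierarchical clustering on the arguments they share. The sketch below illustrates those steps on toy data using only the Python standard library. It is a minimal sketch, not the authors' implementation. Several parts are assumptions made for illustration:

- SemRep output is replaced by hypothetical (subject, predicate, object) triples.
- The helper names `build_graph`, `filter_by_degree_centrality`, `maximal_cliques`, `cluster_cliques` and `theme_label` are hypothetical.
- The parameters `min_count`, `min_centrality` and `min_sim` are hypothetical.
- Bron-Kerbosch clique enumeration, Jaccard similarity, average linkage and the "most shared argument" labeling rule are swapped-in techniques.

```python
from collections import Counter
from itertools import combinations

def build_graph(preds, min_count=1):
    """Undirected graph linking subject and object of every predication
    seen at least `min_count` times (assumed frequency filter)."""
    counts = Counter(preds)
    adj = {}
    for (subj, _pred, obj), n in counts.items():
        if n < min_count or subj == obj:
            continue
        adj.setdefault(subj, set()).add(obj)
        adj.setdefault(obj, set()).add(subj)
    return adj

def filter_by_degree_centrality(adj, min_centrality):
    """Keep nodes whose normalized degree deg(v) / (N - 1) >= min_centrality."""
    n = len(adj)
    if n < 2:
        return adj
    keep = {v for v, nbrs in adj.items() if len(nbrs) / (n - 1) >= min_centrality}
    return {v: adj[v] & keep for v in keep}

def maximal_cliques(adj, min_size=3):
    """Bron-Kerbosch with pivoting; returns maximal cliques of >= min_size nodes."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            if len(r) >= min_size:
                found.append(frozenset(r))
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found

def jaccard(a, b):
    return len(a & b) / len(a | b)

def cluster_cliques(cliques, min_sim=0.2):
    """Average-linkage agglomerative clustering of cliques by shared arguments;
    stops when no two clusters reach a mean Jaccard similarity of min_sim."""
    clusters = [[c] for c in cliques]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [jaccard(a, b) for a in clusters[i] for b in clusters[j]]
            avg = sum(sims) / len(sims)
            if avg > best:
                best, pair = avg, (i, j)
        if best < min_sim:
            break
        i, j = pair
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

def theme_label(cluster):
    """Hypothetical labeling rule: the argument that occurs in the most cliques
    of the cluster (ties broken alphabetically)."""
    counts = Counter(arg for clique in cluster for arg in clique)
    return max(sorted(counts), key=counts.__getitem__)

# Toy predications for illustration only (not data from the paper).
preds = [
    ("Metformin", "TREATS", "Diabetes"),
    ("Metformin", "ASSOCIATED_WITH", "Obesity"),
    ("Insulin", "TREATS", "Diabetes"),
    ("Insulin", "ASSOCIATED_WITH", "Obesity"),
    ("Obesity", "PREDISPOSES", "Diabetes"),
    ("Aspirin", "PREVENTS", "Stroke"),
    ("Warfarin", "PREVENTS", "Stroke"),
    ("Aspirin", "INTERACTS_WITH", "Warfarin"),
    ("Aspirin", "TREATS", "Headache"),
]

graph = filter_by_degree_centrality(build_graph(preds), min_centrality=0.25)
cliques = maximal_cliques(graph)
themes = cluster_cliques(cliques)
for cluster in themes:
    print(theme_label(cluster), [sorted(c) for c in cluster])
```

With `min_centrality=0.25`, the low-degree node `Headache` is dropped. The two overlapping cliques around `Diabetes` and `Obesity` merge into one theme, and the `Aspirin`/`Stroke`/`Warfarin` clique stays separate. A real system would tune these thresholds on data rather than fix them by hand.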

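The evaluation compares cluster validity (cohesion, separation and overall validity) against a Silhouette-generated baseline. It scores theme labels with recall, precision and F-score. As a hedged illustration, the sketch below computes two standard measures.

- The standard silhouette coefficient, where a(i) measures cohesion and b(i) measures separation. It uses an assumed distance of 1 minus the Jaccard similarity between clique argument sets.
- The balanced F-measure.

How the paper converts these quantities into its percentage validity scores is not given in this record. The function names `silhouette`, `jaccard_distance` and `f_score` and the choice of distance are therefore assumptions.

```python
from statistics import mean

def jaccard_distance(a, b):
    """Assumed distance between cliques: 1 - |a & b| / |a | b|."""
    return 1 - len(a & b) / len(a | b)

def silhouette(clusters, dist=jaccard_distance):
    """Mean silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all items.
    a(i): mean distance to the rest of its own cluster (cohesion).
    b(i): smallest mean distance to any other cluster (separation).
    Items in singleton clusters score 0 by convention."""
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for ci, cluster in enumerate(clusters):
        for k, item in enumerate(cluster):
            others = [o for m, o in enumerate(cluster) if m != k]
            if not others:
                scores.append(0.0)
                continue
            a = mean(dist(item, o) for o in others)
            b = min(mean(dist(item, o) for o in other)
                    for cj, other in enumerate(clusters) if cj != ci)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)

def f_score(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Toy clusters of cliques (illustrative only, not data from the paper).
themes = [
    [frozenset({"Metformin", "Diabetes", "Obesity"}),
     frozenset({"Insulin", "Diabetes", "Obesity"})],
    [frozenset({"Aspirin", "Stroke", "Warfarin"})],
]
print(round(silhouette(themes), 3))   # 0.333
print(round(f_score(0.65, 0.64), 3))  # 0.645
```

Assume the reported F-score is the balanced F1. Plugging the reported precision (0.65) and recall (0.64) into `f_score` then gives about 0.645. That is consistent with the reported 0.65 once the rounding of the inputs is taken into account.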

Bibliographic Details
Main authors: Zhang, Han; Fiszman, Marcelo; Shin, Dongwook; Wilkowski, Bartlomiej; Rindflesch, Thomas C
Format: Online, Article, Text
Language: English
Published: BioMed Central 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3682874/
https://www.ncbi.nlm.nih.gov/pubmed/23742159
http://dx.doi.org/10.1186/1471-2105-14-182
Collection: PubMed
Description: BACKGROUND: Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts). RESULTS: SemRep was used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified from frequently occurring predications whose arguments are highly connected, as measured by degree centrality. Themes in the summary were identified with a hierarchical clustering algorithm based on the arguments shared among cliques. The validity of the resulting clusters was compared with a Silhouette-generated baseline for cohesion, separation and overall validity, and the theme labels were compared with a reference standard produced from major MeSH headings. CONCLUSIONS: For 11 topics in the testing data set, the overall validity of clusters in the system summaries was 10 percentage points higher than the baseline (43% versus 33%). Compared with the MeSH-based reference standard, the theme labels achieved recall, precision and F-score of 0.64, 0.65 and 0.65, respectively.
ID: pubmed-3682874
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Source: BMC Bioinformatics, Research Article. BioMed Central, 2013-06-07 (record pubmed-3682874, 2013-06-25).
Copyright © 2013 Zhang et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Topic: Research Article