
Full Text Clustering and Relationship Network Analysis of Biomedical Publications

Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete...


Bibliographic Details
Main authors: Guan, Renchu, Yang, Chen, Marchese, Maurizio, Liang, Yanchun, Shi, Xiaohu
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4177555/
https://www.ncbi.nlm.nih.gov/pubmed/25250864
http://dx.doi.org/10.1371/journal.pone.0108847
author Guan, Renchu
Yang, Chen
Marchese, Maurizio
Liang, Yanchun
Shi, Xiaohu
author_sort Guan, Renchu
collection PubMed
description Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, the Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and an algorithm are introduced for Semi-supervised Affinity Propagation (SSAP) to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. Constructing a directed relationship network and distribution matrix for the clustering results shows that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.
format Online
Article
Text
id pubmed-4177555
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-41775552014-10-02 Full Text Clustering and Relationship Network Analysis of Biomedical Publications Guan, Renchu Yang, Chen Marchese, Maurizio Liang, Yanchun Shi, Xiaohu PLoS One Research Article Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm is introduced for Semi-supervised Affinity Propagation (SSAP) to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers. Public Library of Science 2014-09-24 /pmc/articles/PMC4177555/ /pubmed/25250864 http://dx.doi.org/10.1371/journal.pone.0108847 Text en © 2014 Guan et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Full Text Clustering and Relationship Network Analysis of Biomedical Publications
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4177555/
https://www.ncbi.nlm.nih.gov/pubmed/25250864
http://dx.doi.org/10.1371/journal.pone.0108847