
Hybrid self-optimized clustering model based on citation links and textual features to detect research topics

The challenge of detecting research topics in a specific research field has attracted attention from researchers in the bibliometrics community. In this study, to solve two problems of clustering papers, i.e., the influence of different distributions of citation links and involved textual features o...


Bibliographic Details
Main Authors:	Yu, Dejian; Wang, Wanru; Zhang, Shuai; Zhang, Wenyu; Liu, Rongyu
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5659815/ https://www.ncbi.nlm.nih.gov/pubmed/29077747 http://dx.doi.org/10.1371/journal.pone.0187164

_version_	1783274212346036224
author	Yu, Dejian; Wang, Wanru; Zhang, Shuai; Zhang, Wenyu; Liu, Rongyu
collection	PubMed
description	The challenge of detecting research topics in a specific research field has attracted attention from researchers in the bibliometrics community. In this study, to solve two problems of clustering papers, i.e., the influence of different distributions of citation links and involved textual features on similarity computation, the authors propose a hybrid self-optimized clustering model to detect research topics by extending the hybrid clustering model to identify “core documents”. First, the Amsler network, consisting of bibliographic coupling and co-citation links, is created to calculate the citation-based similarity based on the cosine angle of papers. Second, the cosine similarity is also used to compute the text-based similarity, which consists of the textual statistical and topological features. Then, the cosine angle of the linear combination of citation- and text-based similarity is considered as the hybrid similarity. Finally, the Louvain method is applied to cluster papers, and the terms based on term frequency are used to label clusters. To test the performance of the proposed model, a dataset related to the data envelopment analysis field is used for comparison and analysis of clustering results. Based on the benchmark built, different clustering methods with different citation links or textual features are compared according to evaluation measures. The results show that the proposed model can obtain reasonable and effective clustering results, and the research topics of data envelopment analysis field are also analyzed based on the proposed model. As different features are considered in the proposed model compared with previous hybrid clustering models, the proposed clustering model can provide inspiration for further studies on topic identification by other researchers.
format	Online Article Text
id	pubmed-5659815
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5659815 2017-11-09. PLoS One. Research Article. Public Library of Science, 2017-10-27. /pmc/articles/PMC5659815/ /pubmed/29077747 http://dx.doi.org/10.1371/journal.pone.0187164 Text en. © 2017 Yu et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Hybrid self-optimized clustering model based on citation links and textual features to detect research topics
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5659815/ https://www.ncbi.nlm.nih.gov/pubmed/29077747 http://dx.doi.org/10.1371/journal.pone.0187164
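
The similarity steps named in the description field can be illustrated with a small sketch. This is not the authors' implementation: the toy citation data, the term-frequency vectors, the weight `lam`, and the function names `cosine`, `link_vector`, and `hybrid_similarity` are all hypothetical. It also assumes one possible reading of the abstract, in which each similarity is converted to an angle, the angles are combined linearly, and the cosine of the result is taken. The Louvain clustering and cluster-labelling steps are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return 0.0
    # Clamp to [-1, 1] so rounding error cannot break math.acos later.
    return max(-1.0, min(1.0, dot / norm))


def link_vector(paper, nodes, cites):
    """Binary vector marking every node that `paper` cites or is cited by.

    Assumption: sharing a cited node stands in for bibliographic coupling and
    sharing a citing node stands in for co-citation, as a simplified
    Amsler-style link profile.
    """
    return [
        1 if node in cites.get(paper, set()) or paper in cites.get(node, set()) else 0
        for node in nodes
    ]


def hybrid_similarity(s_cit, s_text, lam=0.5):
    """Cosine of a weighted combination of the two similarity angles (assumed reading)."""
    angle = lam * math.acos(s_cit) + (1 - lam) * math.acos(s_text)
    return math.cos(angle)


# Hypothetical toy data: paper -> set of items it cites.
cites = {"A": {"R1", "R2"}, "B": {"R1", "R2"}, "C": {"R3"}, "X": {"A", "B"}}
nodes = sorted(set(cites) | {ref for refs in cites.values() for ref in refs})

# Hypothetical term-frequency vectors over a three-term vocabulary.
tf = {"A": [2, 1, 0], "B": [1, 1, 0], "C": [0, 0, 3]}

s_cit_ab = cosine(link_vector("A", nodes, cites), link_vector("B", nodes, cites))
s_text_ab = cosine(tf["A"], tf["B"])
s_cit_ac = cosine(link_vector("A", nodes, cites), link_vector("C", nodes, cites))
s_text_ac = cosine(tf["A"], tf["C"])

print("A-B hybrid similarity:", hybrid_similarity(s_cit_ab, s_text_ab))
print("A-C hybrid similarity:", hybrid_similarity(s_cit_ac, s_text_ac))
```

In this toy data, papers A and B share all of their citation links and most of their terms, so their hybrid similarity comes out high, while A and C share nothing and score close to zero. A pairwise matrix of such values could then be passed to a community detection routine such as the Louvain method mentioned in the description.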

