Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure
LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) was recently proposed to overcome the problems of polysemy and homonymy in traditional lexical matching. However, it is usually criticized for low discriminative power in representing documents, although it has been valida...
Main authors: | Zhang, Wen; Xiao, Fan; Li, Bin; Zhang, Siguang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi Publishing Corporation 2016 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4992544/ https://www.ncbi.nlm.nih.gov/pubmed/27579031 http://dx.doi.org/10.1155/2016/1096271 |
_version_ | 1782449026059206656 |
---|---|
author | Zhang, Wen Xiao, Fan Li, Bin Zhang, Siguang |
author_facet | Zhang, Wen Xiao, Fan Li, Bin Zhang, Siguang |
author_sort | Zhang, Wen |
collection | PubMed |
description | LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) was recently proposed to overcome the problems of polysemy and homonymy in traditional lexical matching. However, it is usually criticized for low discriminative power in representing documents, although it has been validated as having good representative quality. In this paper, SVD on clusters is proposed to improve the discriminative power of LSI. The contribution of this paper is threefold. First, we survey existing linear algebra methods for LSI, including both SVD-based and non-SVD-based methods. Second, we propose SVD on clusters for LSI and explain theoretically that dimension expansion of document vectors and dimension projection using SVD are the two manipulations involved in SVD on clusters. Moreover, we develop updating processes to fold new documents and terms into a matrix decomposed by SVD on clusters. Third, two corpora, one Chinese and one English, are used to evaluate the performance of the proposed methods. Experiments demonstrate that, to some extent, SVD on clusters can improve the precision of the interdocument similarity measure in comparison with other SVD-based LSI methods. |
format | Online Article Text |
id | pubmed-4992544 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Hindawi Publishing Corporation |
record_format | MEDLINE/PubMed |
spelling | pubmed-4992544 2016-08-30 Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure Zhang, Wen Xiao, Fan Li, Bin Zhang, Siguang Comput Intell Neurosci Research Article LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) was recently proposed to overcome the problems of polysemy and homonymy in traditional lexical matching. However, it is usually criticized for low discriminative power in representing documents, although it has been validated as having good representative quality. In this paper, SVD on clusters is proposed to improve the discriminative power of LSI. The contribution of this paper is threefold. First, we survey existing linear algebra methods for LSI, including both SVD-based and non-SVD-based methods. Second, we propose SVD on clusters for LSI and explain theoretically that dimension expansion of document vectors and dimension projection using SVD are the two manipulations involved in SVD on clusters. Moreover, we develop updating processes to fold new documents and terms into a matrix decomposed by SVD on clusters. Third, two corpora, one Chinese and one English, are used to evaluate the performance of the proposed methods. Experiments demonstrate that, to some extent, SVD on clusters can improve the precision of the interdocument similarity measure in comparison with other SVD-based LSI methods. Hindawi Publishing Corporation 2016 2016-08-07 /pmc/articles/PMC4992544/ /pubmed/27579031 http://dx.doi.org/10.1155/2016/1096271 Text en Copyright © 2016 Wen Zhang et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Zhang, Wen Xiao, Fan Li, Bin Zhang, Siguang Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure |
title | Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure |
title_full | Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure |
title_fullStr | Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure |
title_full_unstemmed | Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure |
title_short | Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure |
title_sort | using svd on clusters to improve precision of interdocument similarity measure |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4992544/ https://www.ncbi.nlm.nih.gov/pubmed/27579031 http://dx.doi.org/10.1155/2016/1096271 |
work_keys_str_mv | AT zhangwen usingsvdonclusterstoimproveprecisionofinterdocumentsimilaritymeasure AT xiaofan usingsvdonclusterstoimproveprecisionofinterdocumentsimilaritymeasure AT libin usingsvdonclusterstoimproveprecisionofinterdocumentsimilaritymeasure AT zhangsiguang usingsvdonclusterstoimproveprecisionofinterdocumentsimilaritymeasure |