Econo-ESA in semantic text similarity


Bibliographic Details
Main authors: Rahutomo, Faisal; Aritsugi, Masayoshi
Format: Online article, text
Language: English
Published: Springer International Publishing, 2014
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4003000/
https://www.ncbi.nlm.nih.gov/pubmed/24790807
http://dx.doi.org/10.1186/2193-1801-3-149

Abstract

Explicit semantic analysis (ESA) uses an immense Wikipedia index matrix in its interpreter component. The interpreter multiplies this large matrix by a term vector to produce a high-dimensional concept vector, and the similarity between two texts is then measured between two concept vectors with numerous dimensions. Both the interpretation step and the similarity-measurement step are therefore expensive. This paper proposes an economical scheme of ESA, named econo-ESA. We investigate two aspects of this proposal: dimensionality reduction and experiments with various data. We use eight recycled test collections in semantic text similarity. The experimental results show that both the dimensionality reduction and the characteristics of the test collection can influence the results. They also show that an appropriate concept reduction in econo-ESA can decrease the cost while producing results that differ only slightly from those of the original ESA.

Electronic supplementary material: The online version of this article (doi:10.1186/2193-1801-3-149) contains supplementary material, which is available to authorized users.
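The abstract describes ESA's interpreter as a product of the index matrix and a term vector, and similarity as a comparison between two concept vectors. The following is a minimal Python sketch of those two steps, under stated assumptions: the toy index (`TOY_INDEX`, its terms, concepts, and weights) is invented, and cosine similarity is assumed as the measure, since the abstract does not name one. The abstract also does not say how econo-ESA reduces concepts, so `keep_top_k` (keep only the k largest concept weights) is a hypothetical stand-in. It shows only why fewer dimensions make the similarity step cheaper; it is not the authors' reduction scheme.

```python
import math
from collections import defaultdict

# A sparse "index matrix": term -> {Wikipedia concept -> association weight}.
# HYPOTHETICAL toy data; real ESA derives these weights from a Wikipedia dump.
TOY_INDEX: dict[str, dict[str, float]] = {
    "bank": {"Bank": 0.9, "Finance": 0.6, "River": 0.3},
    "money": {"Finance": 0.8, "Currency": 0.7, "Bank": 0.5},
    "river": {"River": 0.9, "Water": 0.6},
}


def interpret(term_vector: dict[str, float],
              index: dict[str, dict[str, float]]) -> dict[str, float]:
    """ESA interpretation: multiply the term vector by the index matrix.

    Each concept's weight is the sum, over the text's terms, of
    (term weight x that term's weight for the concept).
    """
    concepts: defaultdict[str, float] = defaultdict(float)
    for term, t_weight in term_vector.items():
        for concept, c_weight in index.get(term, {}).items():
            concepts[concept] += t_weight * c_weight
    return dict(concepts)


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse concept vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(concept, 0.0) for concept, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def keep_top_k(vector: dict[str, float], k: int) -> dict[str, float]:
    """HYPOTHETICAL concept reduction: keep only the k largest concept weights.

    Illustrative stand-in only; not the econo-ESA reduction from the paper.
    """
    ranked = sorted(vector.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


if __name__ == "__main__":
    doc_a = interpret({"bank": 1.0, "money": 1.0}, TOY_INDEX)
    doc_b = interpret({"bank": 1.0, "river": 1.0}, TOY_INDEX)
    print("full concept vectors:", cosine(doc_a, doc_b))
    print("top-2 pruned vectors:", cosine(keep_top_k(doc_a, 2), keep_top_k(doc_b, 2)))
```

Pruning caps each concept vector at k entries, so the dot product and norms in `cosine` touch at most k concepts per vector. Dropping low-weight concepts also changes the similarity score. That cost-versus-fidelity trade-off is the one the abstract reports as depending on the reduction and on the test collection.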

Journal: SpringerPlus (Research article)
Publication date: 2014-03-19
Copyright: © Rahutomo and Aritsugi; licensee Springer. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.