#nowplaying Madonna: a large-scale evaluation on estimating similarities between music artists and between movies from microblogs

Different term weighting techniques such as TF·IDF or BM25 have been used intensely for manifold text-based information retrieval tasks. Their use for modeling term profiles for named entities and the subsequent calculation of similarities between these named entities has been studied to a...

Bibliographic Details
Main Author:	Schedl, Markus
Format:	Online Article Text
Language:	English
Published:	Springer Netherlands 2012
Subjects:	Information Retrieval for Social Media
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4008152/ https://www.ncbi.nlm.nih.gov/pubmed/24817824 http://dx.doi.org/10.1007/s10791-012-9187-y

_version_	1782314408394883072
author	Schedl, Markus
collection	PubMed
description	Different term weighting techniques such as TF·IDF or BM25 have been used intensely for manifold text-based information retrieval tasks. Their use for modeling term profiles for named entities and the subsequent calculation of similarities between these named entities has been studied to a much smaller extent. The recent trend of microblogging has made massive amounts of information available about almost every topic around the world. Therefore, microblogs represent a valuable source for text-based named entity modeling. In this paper, we present a systematic and comprehensive evaluation of different term weighting measures, normalization techniques, query schemes, index term sets, and similarity functions for the task of inferring similarities between named entities, based on data extracted from microblog posts. We analyze several thousand combinations of choices for the above-mentioned dimensions, which influence the similarity calculation process, and we investigate how they affect the quality of the similarity estimates. Evaluation is performed using three real-world data sets: two collections of microblogs related to music artists and one related to movies. For the music collections, we present results of genre classification experiments using genre information from allmusic.com as the benchmark. For the movie collection, we present results of multi-class classification experiments using categories from IMDb as the benchmark. We show that microblogs can indeed be exploited to model named entity similarity with remarkable accuracy, provided the correct settings for the analyzed aspects are used. We further compare the results to those obtained when using Web pages as the data source.
format	Online Article Text
id	pubmed-4008152
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	Springer Netherlands
record_format	MEDLINE/PubMed
spelling	pubmed-4008152 2014-05-07 Schedl, Markus Inf Retr Boston Information Retrieval for Social Media Springer Netherlands 2012-03-06 2012 /pmc/articles/PMC4008152/ /pubmed/24817824 http://dx.doi.org/10.1007/s10791-012-9187-y Text en © The Author(s) 2012 https://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
title	#nowplaying Madonna: a large-scale evaluation on estimating similarities between music artists and between movies from microblogs
topic	Information Retrieval for Social Media
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4008152/ https://www.ncbi.nlm.nih.gov/pubmed/24817824 http://dx.doi.org/10.1007/s10791-012-9187-y
