A selective approach to stemming for minimizing the risk of failure in information retrieval systems


Bibliographic Details
Main authors:	Göksel, Gökhan; Arslan, Ahmet; Dinçer, Bekir Taner
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2023
Subjects:	Data Mining and Machine Learning
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280253/ https://www.ncbi.nlm.nih.gov/pubmed/37346699 http://dx.doi.org/10.7717/peerj-cs.1175

author	Göksel, Gökhan; Arslan, Ahmet; Dinçer, Bekir Taner
collection	PubMed
description	Stemming is supposed to improve the average performance of an information retrieval system, but past experimental results show that in practice this is not always the case. In this article, we propose a selective approach to stemming that decides, on a per-query basis, whether stemming should be applied. Our method aims to minimize the risk of failure caused by stemming in retrieving semantically related documents. The main contribution of this work to the IR literature is an application of selective stemming together with a set of new features derived from the term frequency distributions of the systems under selection. The method combines a subset of query performance predictors and the derived features with a machine learning technique. It is comprehensively evaluated using three rule-based stemmers and eight query sets corresponding to four document collections from the standard TREC and NTCIR datasets. All but one of the document collections are Web collections, ranging from 25 million to 733 million documents. The experimental results show that the method makes accurate selections that increase the robustness of the system and minimize the risk of failure (i.e., per-query performance losses) across queries. The results also show that the method attains a systematically higher average retrieval performance than the single systems for most query sets.
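The description above states the approach only at a high level: for each query, a trained model decides whether to use the stemmed or the unstemmed system, and the outcome is judged by per-query losses. The following is a minimal sketch of that selection loop under stated assumptions, not the authors' implementation. The feature (log ratio of document frequencies after and before stemming), the logistic-regression learner, and the loss-only risk measure are illustrative stand-ins chosen here; the names `term_stat_features`, `train_logreg`, `choose_stemming`, `selective_run`, and `risk` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence


def term_stat_features(query: Sequence[str], df_raw: Dict[str, int],
                       df_stem: Dict[str, int],
                       stem: Callable[[str], str]) -> List[float]:
    """Hypothetical features: how much stemming inflates document frequency.

    Stand-in only; NOT the feature set proposed in the paper.
    Returns [mean, max] of log(df(stem(t)) / df(t)) over query terms.
    """
    ratios = []
    for term in query:
        raw = df_raw.get(term, 0) + 1            # +1 smoothing avoids log(0)
        stemmed = df_stem.get(stem(term), 0) + 1
        ratios.append(math.log(stemmed / raw))   # > 0: stem conflates more documents
    return [sum(ratios) / len(ratios), max(ratios)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000):
    """Batch gradient-descent logistic regression (assumed learner).

    Label 1 means the stemmed run scored higher on that training query.
    """
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def choose_stemming(x: List[float], w: List[float], b: float) -> bool:
    """True if the model predicts that stemming helps this query."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


def selective_run(decisions: List[bool], raw_scores: List[float],
                  stem_scores: List[float]) -> List[float]:
    """Per-query score of the system that the selector picked."""
    return [s if d else r for d, r, s in zip(decisions, raw_scores, stem_scores)]


def risk(system: List[float], baseline: List[float]) -> float:
    """Assumed risk measure: mean per-query loss vs. a baseline; gains ignored."""
    losses = [max(0.0, b - s) for s, b in zip(system, baseline)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    w, b = train_logreg([[-1.0], [-0.5], [0.5], [1.0]], [0, 0, 1, 1])
    picks = [choose_stemming([x], w, b) for x in (0.8, -0.7)]
    scores = selective_run(picks, raw_scores=[0.2, 0.4], stem_scores=[0.3, 0.1])
    print(picks, scores, risk(scores, [0.2, 0.4]))
```

A logistic model is used only because it is short and dependency-free; any binary classifier trained on per-query labels fits the same loop. Measuring risk against the unstemmed run as the baseline mirrors the abstract's definition of failure as per-query performance losses, but the exact risk measure used in the paper may differ.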
format	Online Article Text
id	pubmed-10280253
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-10280253 2023-06-21. A selective approach to stemming for minimizing the risk of failure in information retrieval systems. Göksel, Gökhan; Arslan, Ahmet; Dinçer, Bekir Taner. PeerJ Comput Sci. Data Mining and Machine Learning. PeerJ Inc. 2023-01-10. /pmc/articles/PMC10280253/ /pubmed/37346699 http://dx.doi.org/10.7717/peerj-cs.1175 Text en. ©2022 Göksel et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title	A selective approach to stemming for minimizing the risk of failure in information retrieval systems
topic	Data Mining and Machine Learning
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280253/ https://www.ncbi.nlm.nih.gov/pubmed/37346699 http://dx.doi.org/10.7717/peerj-cs.1175

Similar items