Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants
When researchers speak of BM25, it is not entirely clear which variant they mean, since many tweaks to Robertson et al.’s original formulation have been proposed. When practitioners speak of BM25, they most likely refer to the implementation in the Lucene open-source search library. Does this ambigu...
Main authors: | Kamphuis, Chris; de Vries, Arjen P.; Boytsov, Leonid; Lin, Jimmy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148026/ http://dx.doi.org/10.1007/978-3-030-45442-5_4 |
_version_ | 1783520514499674112 |
---|---|
author | Kamphuis, Chris de Vries, Arjen P. Boytsov, Leonid Lin, Jimmy |
author_facet | Kamphuis, Chris de Vries, Arjen P. Boytsov, Leonid Lin, Jimmy |
author_sort | Kamphuis, Chris |
collection | PubMed |
description | When researchers speak of BM25, it is not entirely clear which variant they mean, since many tweaks to Robertson et al.’s original formulation have been proposed. When practitioners speak of BM25, they most likely refer to the implementation in the Lucene open-source search library. Does this ambiguity “matter”? We attempt to answer this question with a large-scale reproducibility study of BM25, considering eight variants. Experiments on three newswire collections show that there are no significant effectiveness differences between them, including Lucene’s often maligned approximation of document length. As an added benefit, our empirical approach takes advantage of databases for rapid IR prototyping, which validates both the feasibility and methodological advantages claimed in previous work. |
format | Online Article Text |
id | pubmed-7148026 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-71480262020-04-13 Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants Kamphuis, Chris de Vries, Arjen P. Boytsov, Leonid Lin, Jimmy Advances in Information Retrieval Article When researchers speak of BM25, it is not entirely clear which variant they mean, since many tweaks to Robertson et al.’s original formulation have been proposed. When practitioners speak of BM25, they most likely refer to the implementation in the Lucene open-source search library. Does this ambiguity “matter”? We attempt to answer this question with a large-scale reproducibility study of BM25, considering eight variants. Experiments on three newswire collections show that there are no significant effectiveness differences between them, including Lucene’s often maligned approximation of document length. As an added benefit, our empirical approach takes advantage of databases for rapid IR prototyping, which validates both the feasibility and methodological advantages claimed in previous work. 2020-03-24 /pmc/articles/PMC7148026/ http://dx.doi.org/10.1007/978-3-030-45442-5_4 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Kamphuis, Chris de Vries, Arjen P. Boytsov, Leonid Lin, Jimmy Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants |
title | Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants |
title_full | Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants |
title_fullStr | Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants |
title_full_unstemmed | Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants |
title_short | Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants |
title_sort | which bm25 do you mean? a large-scale reproducibility study of scoring variants |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148026/ http://dx.doi.org/10.1007/978-3-030-45442-5_4 |
work_keys_str_mv | AT kamphuischris whichbm25doyoumeanalargescalereproducibilitystudyofscoringvariants AT devriesarjenp whichbm25doyoumeanalargescalereproducibilitystudyofscoringvariants AT boytsovleonid whichbm25doyoumeanalargescalereproducibilitystudyofscoringvariants AT linjimmy whichbm25doyoumeanalargescalereproducibilitystudyofscoringvariants |