Words by the tail: Assessing lexical diversity in scholarly titles using frequency-rank distribution tail fits

This research assesses the evolution of lexical diversity in scholarly titles using a new indicator based on zipfian frequency-rank distribution tail fits. At the operational level, while both head and tail fits of zipfian word distributions are more independent of corpus size than other lexical div...

Bibliographic Details
Main Authors: Bérubé, Nicolas; Sainte-Marie, Maxime; Mongeon, Philippe; Larivière, Vincent
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6037356/
https://www.ncbi.nlm.nih.gov/pubmed/29985920
http://dx.doi.org/10.1371/journal.pone.0197775
_version_ 1783338314979344384
author Bérubé, Nicolas
Sainte-Marie, Maxime
Mongeon, Philippe
Larivière, Vincent
author_facet Bérubé, Nicolas
Sainte-Marie, Maxime
Mongeon, Philippe
Larivière, Vincent
author_sort Bérubé, Nicolas
collection PubMed
description This research assesses the evolution of lexical diversity in scholarly titles using a new indicator based on zipfian frequency-rank distribution tail fits. At the operational level, while both head and tail fits of zipfian word distributions are more independent of corpus size than other lexical diversity indicators, the latter however neatly outperforms the former in that regard. This benchmark-setting performance of zipfian distribution tails proves extremely handy in distinguishing actual patterns in lexical diversity from the statistical noise generated by other indicators due to corpus size fluctuations. From an empirical perspective, analysis of Web of Science (WoS) article titles from 1975 to 2014 shows that the lexical concentration of scholarly titles in Natural Sciences & Engineering (NSE) and Social Sciences & Humanities (SSH) articles increases by a little less than 8% over the whole period. With the exception of the lexically concentrated Mathematics, Earth & Space, and Physics, NSE article titles all increased in lexical concentration, suggesting a probable convergence of concentration levels in the near future. As regards SSH disciplines, aggregation effects observed at the disciplinary group level suggest that, behind the stable concentration levels of SSH disciplines, a cross-disciplinary homogenization of the highest word frequency ranks may be at work. Overall, these trends suggest a progressive standardization of title wording in scientific article titles, as article titles get written using an increasingly restricted and cross-disciplinary set of words.
format Online
Article
Text
id pubmed-6037356
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-60373562018-07-19 Words by the tail: Assessing lexical diversity in scholarly titles using frequency-rank distribution tail fits Bérubé, Nicolas Sainte-Marie, Maxime Mongeon, Philippe Larivière, Vincent PLoS One Research Article This research assesses the evolution of lexical diversity in scholarly titles using a new indicator based on zipfian frequency-rank distribution tail fits. At the operational level, while both head and tail fits of zipfian word distributions are more independent of corpus size than other lexical diversity indicators, the latter however neatly outperforms the former in that regard. This benchmark-setting performance of zipfian distribution tails proves extremely handy in distinguishing actual patterns in lexical diversity from the statistical noise generated by other indicators due to corpus size fluctuations. From an empirical perspective, analysis of Web of Science (WoS) article titles from 1975 to 2014 shows that the lexical concentration of scholarly titles in Natural Sciences & Engineering (NSE) and Social Sciences & Humanities (SSH) articles increases by a little less than 8% over the whole period. With the exception of the lexically concentrated Mathematics, Earth & Space, and Physics, NSE article titles all increased in lexical concentration, suggesting a probable convergence of concentration levels in the near future. As regards to SSH disciplines, aggregation effects observed at the disciplinary group level suggests that, behind the stable concentration levels of SSH disciplines, a cross-disciplinary homogenization of the highest word frequency ranks may be at work. Overall, these trends suggest a progressive standardization of title wording in scientific article titles, as article titles get written using an increasingly restricted and cross-disciplinary set of words. 
Public Library of Science 2018-07-09 /pmc/articles/PMC6037356/ /pubmed/29985920 http://dx.doi.org/10.1371/journal.pone.0197775 Text en © 2018 Bérubé et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Bérubé, Nicolas
Sainte-Marie, Maxime
Mongeon, Philippe
Larivière, Vincent
Words by the tail: Assessing lexical diversity in scholarly titles using frequency-rank distribution tail fits
title Words by the tail: Assessing lexical diversity in scholarly titles using frequency-rank distribution tail fits
title_full Words by the tail: Assessing lexical diversity in scholarly titles using frequency-rank distribution tail fits
title_fullStr Words by the tail: Assessing lexical diversity in scholarly titles using frequency-rank distribution tail fits
title_full_unstemmed Words by the tail: Assessing lexical diversity in scholarly titles using frequency-rank distribution tail fits
title_short Words by the tail: Assessing lexical diversity in scholarly titles using frequency-rank distribution tail fits
title_sort words by the tail: assessing lexical diversity in scholarly titles using frequency-rank distribution tail fits
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6037356/
https://www.ncbi.nlm.nih.gov/pubmed/29985920
http://dx.doi.org/10.1371/journal.pone.0197775
work_keys_str_mv AT berubenicolas wordsbythetailassessinglexicaldiversityinscholarlytitlesusingfrequencyrankdistributiontailfits
AT saintemariemaxime wordsbythetailassessinglexicaldiversityinscholarlytitlesusingfrequencyrankdistributiontailfits
AT mongeonphilippe wordsbythetailassessinglexicaldiversityinscholarlytitlesusingfrequencyrankdistributiontailfits
AT larivierevincent wordsbythetailassessinglexicaldiversityinscholarlytitlesusingfrequencyrankdistributiontailfits