
The method behind the unprecedented production of indicators of the presence of languages in the Internet

Reliable and updated indicators of the presence of languages in the Internet are required to drive efficiently policies for languages, to forecast e-commerce market or to support further researches on the field of digital support of languages. This article presents a complete description of the meth...

Bibliographic Details
Main Authors: Pimienta, Daniel; Blanco, Álvaro; de Oliveira, Gilvan Müller
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10233101/
https://www.ncbi.nlm.nih.gov/pubmed/37273659
http://dx.doi.org/10.3389/frma.2023.1149347
author Pimienta, Daniel
Blanco, Álvaro
de Oliveira, Gilvan Müller
collection PubMed
description Reliable and updated indicators of the presence of languages in the Internet are required to drive efficiently policies for languages, to forecast e-commerce market or to support further researches on the field of digital support of languages. This article presents a complete description of the methodological elements involved in the production of an unprecedented set of indicators of the presence in the Internet of the 329 languages with more than 1 million L1 speakers. A special emphasis is given to the treatment of the comprehensive set of biases involved in the process, either from the method or the various sources used in the modeling process. The biases related to other sources providing similar data are also discussed, and in particular, it is shown how the lack of consideration of the high level of multilingualism of the Web leads to a huge overestimation of the presence of English. The detailed list of sources is presented in the various annexes. For the first time in the history of the Internet, the production of indicators about virtual presence of a large set of languages could allow progress in the fields of economy of languages, cyber-geography of languages and language policies for multilingualism.
format Online
Article
Text
id pubmed-10233101
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10233101 2023-06-02 Front Res Metr Anal Research Metrics and Analytics Frontiers Media S.A. 2023-05-18 /pmc/articles/PMC10233101/ /pubmed/37273659 http://dx.doi.org/10.3389/frma.2023.1149347 Text en
Copyright © 2023 Pimienta, Blanco and de Oliveira. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title The method behind the unprecedented production of indicators of the presence of languages in the Internet
topic Research Metrics and Analytics