Robust clustering of languages across Wikipedia growth

Wikipedia is the largest existing knowledge repository that is growing on a genuine crowdsourcing support. While the English Wikipedia is the most extensive and the most researched one with over 5 million articles, comparatively little is known about the behaviour and growth of the remaining 283 sma...

Bibliographic Details
Main Authors: Ban, Kristina, Perc, Matjaž, Levnajić, Zoran
Format: Online Article Text
Language: English
Published: The Royal Society Publishing 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5666289/
https://www.ncbi.nlm.nih.gov/pubmed/29134106
http://dx.doi.org/10.1098/rsos.171217
_version_ 1783275279225978880
author Ban, Kristina
Perc, Matjaž
Levnajić, Zoran
author_facet Ban, Kristina
Perc, Matjaž
Levnajić, Zoran
author_sort Ban, Kristina
collection PubMed
description Wikipedia is the largest existing knowledge repository that is growing on a genuine crowdsourcing support. While the English Wikipedia is the most extensive and the most researched one with over 5 million articles, comparatively little is known about the behaviour and growth of the remaining 283 smaller Wikipedias, the smallest of which, Afar, has only one article. Here, we use a subset of these data, consisting of 14 962 different articles, each of which exists in 26 different languages, from Arabic to Ukrainian. We study the growth of Wikipedias in these languages over a time span of 15 years. We show that, while an average article follows a random path from one language to another, there exist six well-defined clusters of Wikipedias that share common growth patterns. The make-up of these clusters is remarkably robust against the method used for their determination, as we verify via four different clustering methods. Interestingly, the identified Wikipedia clusters have little correlation with language families and groups. Rather, the growth of Wikipedia across different languages is governed by different factors, ranging from similarities in culture to information literacy.
format Online
Article
Text
id pubmed-5666289
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher The Royal Society Publishing
record_format MEDLINE/PubMed
spelling pubmed-56662892017-11-13 Robust clustering of languages across Wikipedia growth Ban, Kristina Perc, Matjaž Levnajić, Zoran R Soc Open Sci Engineering Wikipedia is the largest existing knowledge repository that is growing on a genuine crowdsourcing support. While the English Wikipedia is the most extensive and the most researched one with over 5 million articles, comparatively little is known about the behaviour and growth of the remaining 283 smaller Wikipedias, the smallest of which, Afar, has only one article. Here, we use a subset of these data, consisting of 14 962 different articles, each of which exists in 26 different languages, from Arabic to Ukrainian. We study the growth of Wikipedias in these languages over a time span of 15 years. We show that, while an average article follows a random path from one language to another, there exist six well-defined clusters of Wikipedias that share common growth patterns. The make-up of these clusters is remarkably robust against the method used for their determination, as we verify via four different clustering methods. Interestingly, the identified Wikipedia clusters have little correlation with language families and groups. Rather, the growth of Wikipedia across different languages is governed by different factors, ranging from similarities in culture to information literacy. The Royal Society Publishing 2017-10-18 /pmc/articles/PMC5666289/ /pubmed/29134106 http://dx.doi.org/10.1098/rsos.171217 Text en © 2017 The Authors. http://creativecommons.org/licenses/by/4.0/ Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.
spellingShingle Engineering
Ban, Kristina
Perc, Matjaž
Levnajić, Zoran
Robust clustering of languages across Wikipedia growth
title Robust clustering of languages across Wikipedia growth
title_full Robust clustering of languages across Wikipedia growth
title_fullStr Robust clustering of languages across Wikipedia growth
title_full_unstemmed Robust clustering of languages across Wikipedia growth
title_short Robust clustering of languages across Wikipedia growth
title_sort robust clustering of languages across wikipedia growth
topic Engineering
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5666289/
https://www.ncbi.nlm.nih.gov/pubmed/29134106
http://dx.doi.org/10.1098/rsos.171217
work_keys_str_mv AT bankristina robustclusteringoflanguagesacrosswikipediagrowth
AT percmatjaz robustclusteringoflanguagesacrosswikipediagrowth
AT levnajiczoran robustclusteringoflanguagesacrosswikipediagrowth