Robust clustering of languages across Wikipedia growth
Wikipedia is the largest existing knowledge repository that is growing on a genuine crowdsourcing support. While the English Wikipedia is the most extensive and the most researched one with over 5 million articles, comparatively little is known about the behaviour and growth of the remaining 283 sma...
Main authors: | Ban, Kristina; Perc, Matjaž; Levnajić, Zoran |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Royal Society Publishing, 2017 |
Subjects: | Engineering |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5666289/ https://www.ncbi.nlm.nih.gov/pubmed/29134106 http://dx.doi.org/10.1098/rsos.171217 |
_version_ | 1783275279225978880 |
---|---|
author | Ban, Kristina Perc, Matjaž Levnajić, Zoran |
author_facet | Ban, Kristina Perc, Matjaž Levnajić, Zoran |
author_sort | Ban, Kristina |
collection | PubMed |
description | Wikipedia is the largest existing knowledge repository that is growing on a genuine crowdsourcing support. While the English Wikipedia is the most extensive and the most researched one with over 5 million articles, comparatively little is known about the behaviour and growth of the remaining 283 smaller Wikipedias, the smallest of which, Afar, has only one article. Here, we use a subset of these data, consisting of 14 962 different articles, each of which exists in 26 different languages, from Arabic to Ukrainian. We study the growth of Wikipedias in these languages over a time span of 15 years. We show that, while an average article follows a random path from one language to another, there exist six well-defined clusters of Wikipedias that share common growth patterns. The make-up of these clusters is remarkably robust against the method used for their determination, as we verify via four different clustering methods. Interestingly, the identified Wikipedia clusters have little correlation with language families and groups. Rather, the growth of Wikipedia across different languages is governed by different factors, ranging from similarities in culture to information literacy. |
format | Online Article Text |
id | pubmed-5666289 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | The Royal Society Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-56662892017-11-13 Robust clustering of languages across Wikipedia growth Ban, Kristina Perc, Matjaž Levnajić, Zoran R Soc Open Sci Engineering Wikipedia is the largest existing knowledge repository that is growing on a genuine crowdsourcing support. While the English Wikipedia is the most extensive and the most researched one with over 5 million articles, comparatively little is known about the behaviour and growth of the remaining 283 smaller Wikipedias, the smallest of which, Afar, has only one article. Here, we use a subset of these data, consisting of 14 962 different articles, each of which exists in 26 different languages, from Arabic to Ukrainian. We study the growth of Wikipedias in these languages over a time span of 15 years. We show that, while an average article follows a random path from one language to another, there exist six well-defined clusters of Wikipedias that share common growth patterns. The make-up of these clusters is remarkably robust against the method used for their determination, as we verify via four different clustering methods. Interestingly, the identified Wikipedia clusters have little correlation with language families and groups. Rather, the growth of Wikipedia across different languages is governed by different factors, ranging from similarities in culture to information literacy. The Royal Society Publishing 2017-10-18 /pmc/articles/PMC5666289/ /pubmed/29134106 http://dx.doi.org/10.1098/rsos.171217 Text en © 2017 The Authors. http://creativecommons.org/licenses/by/4.0/ Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited. |
spellingShingle | Engineering Ban, Kristina Perc, Matjaž Levnajić, Zoran Robust clustering of languages across Wikipedia growth |
title | Robust clustering of languages across Wikipedia growth |
title_full | Robust clustering of languages across Wikipedia growth |
title_fullStr | Robust clustering of languages across Wikipedia growth |
title_full_unstemmed | Robust clustering of languages across Wikipedia growth |
title_short | Robust clustering of languages across Wikipedia growth |
title_sort | robust clustering of languages across wikipedia growth |
topic | Engineering |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5666289/ https://www.ncbi.nlm.nih.gov/pubmed/29134106 http://dx.doi.org/10.1098/rsos.171217 |
work_keys_str_mv | AT bankristina robustclusteringoflanguagesacrosswikipediagrowth AT percmatjaz robustclusteringoflanguagesacrosswikipediagrowth AT levnajiczoran robustclusteringoflanguagesacrosswikipediagrowth |