
A Practical Approach to Language Complexity: A Wikipedia Case Study

In this paper we present statistical analysis of English texts from Wikipedia. We try to address the issue of language complexity empirically by comparing the simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use a more simplified lan...


Bibliographic Details
Main authors: Yasseri, Taha; Kornai, András; Kertész, János
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3492358/
https://www.ncbi.nlm.nih.gov/pubmed/23189130
http://dx.doi.org/10.1371/journal.pone.0048386
author Yasseri, Taha
Kornai, András
Kertész, János
collection PubMed
description In this paper we present statistical analysis of English texts from Wikipedia. We try to address the issue of language complexity empirically by comparing the simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use a more simplified language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet in practice the vocabulary richness of both samples is at the same level. Detailed analysis of longer units (n-grams of words and part of speech tags) shows that the language of Simple is less complex than that of Main primarily due to the use of shorter sentences, as opposed to drastically simplified syntax or vocabulary. Comparing the two language varieties by the Gunning readability index supports this conclusion. We also report on the topical dependence of language complexity, that is, that the language is more advanced in conceptual articles compared to person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated with controversial and peacefully developing articles, concluding that controversy has the effect of reducing language complexity.
format Online
Article
Text
id pubmed-3492358
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3492358 2012-11-27 A Practical Approach to Language Complexity: A Wikipedia Case Study Yasseri, Taha Kornai, András Kertész, János PLoS One Research Article Public Library of Science 2012-11-07 /pmc/articles/PMC3492358/ /pubmed/23189130 http://dx.doi.org/10.1371/journal.pone.0048386 Text en © 2012 Yasseri et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A Practical Approach to Language Complexity: A Wikipedia Case Study
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3492358/
https://www.ncbi.nlm.nih.gov/pubmed/23189130
http://dx.doi.org/10.1371/journal.pone.0048386