Productivity and Predictability for Measuring Morphological Complexity
We propose a quantitative approach for quantifying morphological complexity of a language based on text. Several corpus-based methods have focused on measuring the different word forms that a language can produce. We take into account not only the productivity of morphological processes but also the...
Main authors: | Gutierrez-Vasques, Ximena; Mijangos, Victor |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516478/ https://www.ncbi.nlm.nih.gov/pubmed/33285823 http://dx.doi.org/10.3390/e22010048 |
_version_ | 1783587010767749120 |
---|---|
author | Gutierrez-Vasques, Ximena Mijangos, Victor |
author_facet | Gutierrez-Vasques, Ximena Mijangos, Victor |
author_sort | Gutierrez-Vasques, Ximena |
collection | PubMed |
description | We propose a quantitative approach for quantifying morphological complexity of a language based on text. Several corpus-based methods have focused on measuring the different word forms that a language can produce. We take into account not only the productivity of morphological processes but also the predictability of those morphological processes. We use a language model that predicts the probability of sub-word sequences within a word; we calculate the entropy rate of this model and use it as a measure of predictability of the internal structure of words. Our results show that it is important to integrate these two dimensions when measuring morphological complexity, since languages can be complex under one measure but simpler under another one. We calculated the complexity measures in two different parallel corpora for a typologically diverse set of languages. Our approach is corpus-based and it does not require the use of linguistic annotated data. |
format | Online Article Text |
id | pubmed-7516478 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-75164782020-11-09 Productivity and Predictability for Measuring Morphological Complexity Gutierrez-Vasques, Ximena Mijangos, Victor Entropy (Basel) Article We propose a quantitative approach for quantifying morphological complexity of a language based on text. Several corpus-based methods have focused on measuring the different word forms that a language can produce. We take into account not only the productivity of morphological processes but also the predictability of those morphological processes. We use a language model that predicts the probability of sub-word sequences within a word; we calculate the entropy rate of this model and use it as a measure of predictability of the internal structure of words. Our results show that it is important to integrate these two dimensions when measuring morphological complexity, since languages can be complex under one measure but simpler under another one. We calculated the complexity measures in two different parallel corpora for a typologically diverse set of languages. Our approach is corpus-based and it does not require the use of linguistic annotated data. MDPI 2019-12-30 /pmc/articles/PMC7516478/ /pubmed/33285823 http://dx.doi.org/10.3390/e22010048 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Gutierrez-Vasques, Ximena Mijangos, Victor Productivity and Predictability for Measuring Morphological Complexity |
title | Productivity and Predictability for Measuring Morphological Complexity |
title_full | Productivity and Predictability for Measuring Morphological Complexity |
title_fullStr | Productivity and Predictability for Measuring Morphological Complexity |
title_full_unstemmed | Productivity and Predictability for Measuring Morphological Complexity |
title_short | Productivity and Predictability for Measuring Morphological Complexity |
title_sort | productivity and predictability for measuring morphological complexity |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516478/ https://www.ncbi.nlm.nih.gov/pubmed/33285823 http://dx.doi.org/10.3390/e22010048 |
work_keys_str_mv | AT gutierrezvasquesximena productivityandpredictabilityformeasuringmorphologicalcomplexity AT mijangosvictor productivityandpredictabilityformeasuringmorphologicalcomplexity |
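
The description field above names two dimensions of morphological complexity: productivity (how many different word forms a text contains) and predictability (the entropy rate of a model over sub-word sequences within a word). The following is a minimal sketch of that idea, not the authors' implementation. It assumes a type-token ratio as the productivity proxy and substitutes a simple count-based character-bigram model for the language model mentioned in the record. The function names and the `<` and `>` boundary markers are hypothetical.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Distinct word forms divided by total tokens.

    Assumption: used here as a simple productivity proxy.
    Expects a non-empty list of word strings.
    """
    return len(set(tokens)) / len(tokens)


def char_bigram_entropy_rate(tokens):
    """Average bits per character transition inside words.

    Assumption: a maximum-likelihood character-bigram model estimated
    from the same tokens stands in for the sub-word language model.
    Words are assumed not to contain the marker characters '<' or '>'.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for word in tokens:
        symbols = ["<"] + list(word) + [">"]  # hypothetical boundary markers
        for prev, cur in zip(symbols, symbols[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1

    total = sum(pair_counts.values())
    entropy = 0.0
    for (prev, cur), n in pair_counts.items():
        p_joint = n / total               # P(prev, cur)
        p_cond = n / context_counts[prev]  # P(cur | prev)
        entropy -= p_joint * math.log2(p_cond)
    return entropy


if __name__ == "__main__":
    sample = "the cats walked and the dogs walked".split()
    print("type-token ratio:", type_token_ratio(sample))
    print("entropy rate (bits):", char_bigram_entropy_rate(sample))
```

Under this sketch, a text whose words all follow a single deterministic character pattern (for example `["ab", "ab"]`) yields an entropy rate of 0, while a high type-token ratio combined with a low entropy rate would indicate many word forms built in a predictable way.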