On the fractal patterns of language structures

Bibliographic Details
Main authors:	Ribeiro, Leonardo Costa; Bernardes, Américo Tristão; Mello, Heliana
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2023
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10194960/ https://www.ncbi.nlm.nih.gov/pubmed/37200318 http://dx.doi.org/10.1371/journal.pone.0285630

author	Ribeiro, Leonardo Costa; Bernardes, Américo Tristão; Mello, Heliana
author_facet	Ribeiro, Leonardo Costa; Bernardes, Américo Tristão; Mello, Heliana
author_sort	Ribeiro, Leonardo Costa
collection	PubMed
description	Natural Language Processing (NLP) uses Artificial Intelligence algorithms to extract meaningful information from unstructured texts, i.e., content that lacks metadata and cannot easily be indexed or mapped onto standard database fields. It has several applications, from sentiment analysis and text summarization to automatic language translation. In this work, we use NLP to identify similar structural linguistic patterns among several different languages. We apply the word2vec algorithm, which creates a vector representation of words in a multidimensional space that preserves the meaning relationships between the words. From a large corpus we built this vector representation in a 100-dimensional space for English, Portuguese, German, Spanish, Russian, French, Chinese, Japanese, Korean, Italian, Arabic, Hebrew, Basque, Dutch, Swedish, Finnish, and Estonian. Then, we calculated the fractal dimensions of the structure that represents each language. The structures are multi-fractals with two different dimensions that we use, together with the token-dictionary size rate of each language, to represent the languages in a three-dimensional space. Finally, analyzing the distances among languages in this space, we conclude that closeness there tends to be related to the distance in the phylogenetic tree that depicts the lines of evolutionary descent of the languages from a common ancestor.
format	Online Article Text
id	pubmed-10194960
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-10194960 2023-05-19 On the fractal patterns of language structures Ribeiro, Leonardo Costa; Bernardes, Américo Tristão; Mello, Heliana PLoS One Research Article Public Library of Science 2023-05-18 /pmc/articles/PMC10194960/ /pubmed/37200318 http://dx.doi.org/10.1371/journal.pone.0285630 Text en © 2023 Ribeiro et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	On the fractal patterns of language structures
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10194960/ https://www.ncbi.nlm.nih.gov/pubmed/37200318 http://dx.doi.org/10.1371/journal.pone.0285630
