
On the Accuracy of Language Trees

Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different...


Bibliographic Details
Main Authors: Pompei, Simone, Loreto, Vittorio, Tria, Francesca
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3108590/
https://www.ncbi.nlm.nih.gov/pubmed/21674034
http://dx.doi.org/10.1371/journal.pone.0020109
author Pompei, Simone
Loreto, Vittorio
Tria, Francesca
collection PubMed
description Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different languages: a set of parallel corpora whose compilation represents a paramount achievement in linguistics. From this perspective, the reconstruction of language trees is an example of an inverse problem: starting from present, incomplete, and often noisy information, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models, so as to have full access to the evolutionary process one is trying to infer. This procedure has an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here, conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases. In order to assess the accuracy of the inferred trees, we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores, we quantify the relative performance of the distance-based algorithms considered. Further, we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally, we draw some conclusions about where the accuracy of reconstructions in historical linguistics stands and about the leading directions for improving it.
format Online
Article
Text
id pubmed-3108590
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3108590 2011-06-13 On the Accuracy of Language Trees Pompei, Simone Loreto, Vittorio Tria, Francesca PLoS One Research Article Public Library of Science 2011-06-03 /pmc/articles/PMC3108590/ /pubmed/21674034 http://dx.doi.org/10.1371/journal.pone.0020109 Text en Pompei et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title On the Accuracy of Language Trees
topic Research Article