On the Development of Speech Resources for the Mixtec Language

The Mixtec language is one of the main native languages in Mexico. In general, due to urbanization, discrimination, and limited attempts to promote the culture, the native languages are disappearing. Most of the information available about the Mixtec language is in written form as in dictionaries wh...

Bibliographic Details
Main author: Caballero-Morales, Santiago-Omar
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3654258/
https://www.ncbi.nlm.nih.gov/pubmed/23710134
http://dx.doi.org/10.1155/2013/170649
_version_ 1782269517941964800
author Caballero-Morales, Santiago-Omar
collection PubMed
description The Mixtec language is one of the main native languages of Mexico. In general, because of urbanization, discrimination, and limited efforts to promote the culture, the native languages are disappearing. Most of the available information about the Mixtec language is in written form, such as dictionaries, which include examples of how to pronounce Mixtec words but are not as reliable as hearing the correct pronunciation from a native speaker. Formal acoustic resources, such as speech corpora, are almost non-existent for Mixtec, and no speech technologies are known to have been developed for it. This paper presents the development of the following resources for the Mixtec language: (1) a speech database of traditional narratives of the Mixtec culture spoken by a native speaker (labelled at the phonetic and orthographic levels by means of spectral analysis) and (2) a native speaker-adaptive automatic speech recognition (ASR) system (trained with the speech database) integrated with a Mixtec-to-Spanish/Spanish-to-Mixtec text translator. The speech database, although small and limited to a single variant, was reliable enough to build the multiuser speech application, which achieved a mean recognition/translation performance of up to 94.36% in experiments with non-native speakers (the target users).
format Online
Article
Text
id pubmed-3654258
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-3654258 2013-05-24 On the Development of Speech Resources for the Mixtec Language Caballero-Morales, Santiago-Omar ScientificWorldJournal Research Article Hindawi Publishing Corporation 2013-04-16 /pmc/articles/PMC3654258/ /pubmed/23710134 http://dx.doi.org/10.1155/2013/170649 Text en Copyright © 2013 Santiago-Omar Caballero-Morales. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title On the Development of Speech Resources for the Mixtec Language
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3654258/
https://www.ncbi.nlm.nih.gov/pubmed/23710134
http://dx.doi.org/10.1155/2013/170649