
Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit

Sound units play a pivotal role in cognitive models of auditory comprehension. The general consensus is that during perception listeners break down speech into auditory words and subsequently phones. Indeed, cognitive speech recognition is typically taken to be computationally intractable without ph...

Bibliographic Details
Main Authors: Arnold, Denis; Tomaschek, Fabian; Sering, Konstantin; Lopez, Florence; Baayen, R. Harald
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5386243/
https://www.ncbi.nlm.nih.gov/pubmed/28394938
http://dx.doi.org/10.1371/journal.pone.0174623
_version_ 1782520732347006976
author Arnold, Denis
Tomaschek, Fabian
Sering, Konstantin
Lopez, Florence
Baayen, R. Harald
author_facet Arnold, Denis
Tomaschek, Fabian
Sering, Konstantin
Lopez, Florence
Baayen, R. Harald
author_sort Arnold, Denis
collection PubMed
description Sound units play a pivotal role in cognitive models of auditory comprehension. The general consensus is that during perception listeners break down speech into auditory words and subsequently phones. Indeed, cognitive speech recognition is typically taken to be computationally intractable without phones. Here we present a computational model trained on 20 hours of conversational speech that recognizes word meanings within the range of human performance (model 25%, native speakers 20–44%), without making use of phone or word form representations. Our model also successfully generates predictions about the speed and accuracy of human auditory comprehension. At the heart of the model is a ‘wide’ yet sparse two-layer artificial neural network with some hundred thousand input units representing summaries of changes in acoustic frequency bands, and proxies for lexical meanings as output units. We believe that our model holds promise for resolving longstanding theoretical problems surrounding the notion of the phone in linguistic theory.
format Online
Article
Text
id pubmed-5386243
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-53862432017-05-03 Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit Arnold, Denis Tomaschek, Fabian Sering, Konstantin Lopez, Florence Baayen, R. Harald PLoS One Research Article Sound units play a pivotal role in cognitive models of auditory comprehension. The general consensus is that during perception listeners break down speech into auditory words and subsequently phones. Indeed, cognitive speech recognition is typically taken to be computationally intractable without phones. Here we present a computational model trained on 20 hours of conversational speech that recognizes word meanings within the range of human performance (model 25%, native speakers 20–44%), without making use of phone or word form representations. Our model also generates successfully predictions about the speed and accuracy of human auditory comprehension. At the heart of the model is a ‘wide’ yet sparse two-layer artificial neural network with some hundred thousand input units representing summaries of changes in acoustic frequency bands, and proxies for lexical meanings as output units. We believe that our model holds promise for resolving longstanding theoretical problems surrounding the notion of the phone in linguistic theory. Public Library of Science 2017-04-10 /pmc/articles/PMC5386243/ /pubmed/28394938 http://dx.doi.org/10.1371/journal.pone.0174623 Text en © 2017 Arnold et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Arnold, Denis
Tomaschek, Fabian
Sering, Konstantin
Lopez, Florence
Baayen, R. Harald
Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit
title Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit
title_sort words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5386243/
https://www.ncbi.nlm.nih.gov/pubmed/28394938
http://dx.doi.org/10.1371/journal.pone.0174623