
Deep Learning-Based Detection of Articulatory Features in Arabic and English Speech

This study proposes using object detection techniques to recognize sequences of articulatory features (AFs) from speech utterances by treating the AFs of phonemes as multi-label objects in a speech spectrogram. The proposed system, called AFD-Obj, recognizes sequences of multi-label AFs in the speech signal an...


Bibliographic Details
Main authors: Algabri, Mohammed; Mathkour, Hassan; Alsulaiman, Mansour M.; Bencherif, Mohamed A.
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7914998/
https://www.ncbi.nlm.nih.gov/pubmed/33572169
http://dx.doi.org/10.3390/s21041205
author Algabri, Mohammed
Mathkour, Hassan
Alsulaiman, Mansour M.
Bencherif, Mohamed A.
collection PubMed
description This study proposes using object detection techniques to recognize sequences of articulatory features (AFs) from speech utterances by treating the AFs of phonemes as multi-label objects in a speech spectrogram. The proposed system, called AFD-Obj, recognizes sequences of multi-label AFs in a speech signal and localizes them. AFD-Obj consists of two main stages. First, we formulate AF detection as an object detection problem and prepare the data to meet the requirements of object detectors by generating a spectral three-channel image from the speech signal and creating the corresponding annotation for each utterance. Second, we use the annotated images to train the proposed system to detect sequences of AFs and their boundaries. We test the system by feeding it spectrogram images, in which it recognizes and localizes multi-label AFs. We also investigate using these AFs to detect the phonemes of the utterance. The YOLOv3-tiny detector is selected because of its real-time performance and its support for multi-label detection. We test our AFD-Obj system on Arabic and English using the KAPD and TIMIT corpora, respectively. Additionally, we propose using YOLOv3-tiny as an Arabic phoneme detection system (i.e., PD-Obj) to recognize and localize a sequence of Arabic phonemes from whole speech utterances. The proposed AFD-Obj and PD-Obj systems achieve excellent results on the Arabic corpus and results comparable to the state-of-the-art method on the English corpus. Moreover, we show that using only one detection scale is suitable for AF detection or phoneme recognition.
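The annotation step described in the abstract can be illustrated with a minimal sketch. It assumes the Darknet/YOLO label format (`class x_center y_center width height`, all normalized to the image size), assumes each box spans the full frequency axis, and expresses multi-label AFs by repeating the same box once per AF class. The AF inventory `AF_CLASSES`, the mapping `PHONEME_AFS`, and the helper `yolo_lines` are hypothetical names introduced for illustration; the abstract does not specify the authors' label set or annotation layout.

```python
from dataclasses import dataclass

# Hypothetical AF inventory (assumption; not taken from the paper).
AF_CLASSES = ["voiced", "unvoiced", "stop", "fricative", "nasal", "vowel"]

# Hypothetical phoneme-to-AF mapping, for illustration only.
PHONEME_AFS = {
    "b": ["voiced", "stop"],
    "s": ["unvoiced", "fricative"],
    "m": ["voiced", "nasal"],
    "a": ["voiced", "vowel"],
}


@dataclass
class Segment:
    phoneme: str
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds


def yolo_lines(segments, utterance_duration):
    """Return one YOLO-style label line per (segment, AF) pair.

    Time maps to the horizontal image axis. The box height is fixed to the
    full image (assumption), so only x_center and width vary per segment.
    """
    lines = []
    for seg in segments:
        width = (seg.end - seg.start) / utterance_duration
        x_center = (seg.start + seg.end) / 2 / utterance_duration
        for af in PHONEME_AFS[seg.phoneme]:
            class_id = AF_CLASSES.index(af)
            lines.append(
                f"{class_id} {x_center:.6f} 0.500000 {width:.6f} 1.000000"
            )
    return lines


if __name__ == "__main__":
    # A two-phoneme utterance of 0.4 s with made-up boundaries.
    example = [Segment("b", 0.0, 0.1), Segment("a", 0.1, 0.4)]
    for line in yolo_lines(example, utterance_duration=0.4):
        print(line)
```

Each phoneme segment yields as many label lines as it has AFs, all sharing one box, which is one common way to feed multi-label targets to a detector with independent per-class outputs such as YOLOv3.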
format Online
Article
Text
id pubmed-7914998
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7914998 2021-03-01 Deep Learning-Based Detection of Articulatory Features in Arabic and English Speech Algabri, Mohammed Mathkour, Hassan Alsulaiman, Mansour M. Bencherif, Mohamed A. Sensors (Basel) Article MDPI 2021-02-09 /pmc/articles/PMC7914998/ /pubmed/33572169 http://dx.doi.org/10.3390/s21041205 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Deep Learning-Based Detection of Articulatory Features in Arabic and English Speech
topic Article