Deep Learning-Based Detection of Articulatory Features in Arabic and English Speech
This study proposes using object detection techniques to recognize sequences of articulatory features (AFs) from speech utterances by treating the AFs of phonemes as multi-label objects in a speech spectrogram. The proposed system, called AFD-Obj, recognizes sequences of multi-label AFs in a speech signal an...
Main authors: | Algabri, Mohammed; Mathkour, Hassan; Alsulaiman, Mansour M.; Bencherif, Mohamed A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7914998/ https://www.ncbi.nlm.nih.gov/pubmed/33572169 http://dx.doi.org/10.3390/s21041205 |
_version_ | 1783657134859223040 |
---|---|
author | Algabri, Mohammed; Mathkour, Hassan; Alsulaiman, Mansour M.; Bencherif, Mohamed A. |
author_facet | Algabri, Mohammed; Mathkour, Hassan; Alsulaiman, Mansour M.; Bencherif, Mohamed A. |
author_sort | Algabri, Mohammed |
collection | PubMed |
description | This study proposes using object detection techniques to recognize sequences of articulatory features (AFs) from speech utterances by treating the AFs of phonemes as multi-label objects in a speech spectrogram. The proposed system, called AFD-Obj, recognizes sequences of multi-label AFs in a speech signal and localizes them. AFD-Obj consists of two main stages. First, we formulate AF detection as an object detection problem and prepare the data to meet the requirements of object detectors by generating a three-channel spectral image from the speech signal and creating the corresponding annotation for each utterance. Second, we use the annotated images to train the proposed system to detect sequences of AFs and their boundaries. We test the system by feeding it spectrogram images, in which it recognizes and localizes multi-label AFs. We also investigate using these AFs to detect the phonemes of the utterance. The YOLOv3-tiny detector is selected because it runs in real time and supports multi-label detection. We test AFD-Obj on Arabic and English using the KAPD and TIMIT corpora, respectively. Additionally, we propose using YOLOv3-tiny as an Arabic phoneme detection system (PD-Obj) to recognize and localize a sequence of Arabic phonemes from whole speech utterances. The proposed AFD-Obj and PD-Obj systems achieve excellent results on the Arabic corpus and results comparable to the state-of-the-art method on the English corpus. Moreover, we show that using only one-scale detection is suitable for AF detection or phoneme recognition. |
format | Online Article Text |
id | pubmed-7914998 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7914998 2021-03-01 Deep Learning-Based Detection of Articulatory Features in Arabic and English Speech Algabri, Mohammed; Mathkour, Hassan; Alsulaiman, Mansour M.; Bencherif, Mohamed A. Sensors (Basel) Article This study proposes using object detection techniques to recognize sequences of articulatory features (AFs) from speech utterances by treating the AFs of phonemes as multi-label objects in a speech spectrogram. The proposed system, called AFD-Obj, recognizes sequences of multi-label AFs in a speech signal and localizes them. AFD-Obj consists of two main stages. First, we formulate AF detection as an object detection problem and prepare the data to meet the requirements of object detectors by generating a three-channel spectral image from the speech signal and creating the corresponding annotation for each utterance. Second, we use the annotated images to train the proposed system to detect sequences of AFs and their boundaries. We test the system by feeding it spectrogram images, in which it recognizes and localizes multi-label AFs. We also investigate using these AFs to detect the phonemes of the utterance. The YOLOv3-tiny detector is selected because it runs in real time and supports multi-label detection. We test AFD-Obj on Arabic and English using the KAPD and TIMIT corpora, respectively. Additionally, we propose using YOLOv3-tiny as an Arabic phoneme detection system (PD-Obj) to recognize and localize a sequence of Arabic phonemes from whole speech utterances. The proposed AFD-Obj and PD-Obj systems achieve excellent results on the Arabic corpus and results comparable to the state-of-the-art method on the English corpus. Moreover, we show that using only one-scale detection is suitable for AF detection or phoneme recognition. MDPI 2021-02-09 /pmc/articles/PMC7914998/ /pubmed/33572169 http://dx.doi.org/10.3390/s21041205 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Algabri, Mohammed Mathkour, Hassan Alsulaiman, Mansour M. Bencherif, Mohamed A. Deep Learning-Based Detection of Articulatory Features in Arabic and English Speech |
title | Deep Learning-Based Detection of Articulatory Features in Arabic and English Speech |
title_full | Deep Learning-Based Detection of Articulatory Features in Arabic and English Speech |
title_fullStr | Deep Learning-Based Detection of Articulatory Features in Arabic and English Speech |
title_full_unstemmed | Deep Learning-Based Detection of Articulatory Features in Arabic and English Speech |
title_short | Deep Learning-Based Detection of Articulatory Features in Arabic and English Speech |
title_sort | deep learning-based detection of articulatory features in arabic and english speech |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7914998/ https://www.ncbi.nlm.nih.gov/pubmed/33572169 http://dx.doi.org/10.3390/s21041205 |
work_keys_str_mv | AT algabrimohammed deeplearningbaseddetectionofarticulatoryfeaturesinarabicandenglishspeech AT mathkourhassan deeplearningbaseddetectionofarticulatoryfeaturesinarabicandenglishspeech AT alsulaimanmansourm deeplearningbaseddetectionofarticulatoryfeaturesinarabicandenglishspeech AT bencherifmohameda deeplearningbaseddetectionofarticulatoryfeaturesinarabicandenglishspeech |
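
The abstract's data-preparation stage (creating an object-detector annotation for each utterance) can be illustrated with a minimal sketch. It is not the authors' code. It assumes the common Darknet/YOLO label layout (`class x_center y_center width height`, all normalized to the image size), assumes that each phoneme segment becomes a box spanning the full height of the spectrogram image, and assumes that a multi-label object is written as one line per label sharing the same box. The function name `segments_to_yolo`, the AF label names, and the class-id mapping are hypothetical.

```python
from typing import Dict, List, Tuple

# (start_sec, end_sec, AF labels for the phoneme in that interval)
Segment = Tuple[float, float, List[str]]


def segments_to_yolo(
    segments: List[Segment],
    duration: float,
    class_ids: Dict[str, int],
) -> List[str]:
    """Convert time-aligned, multi-label segments into YOLO-style label lines.

    Assumption: the spectrogram image covers the whole utterance, so time maps
    linearly onto the horizontal axis, and each box spans the full image height.
    """
    lines: List[str] = []
    for start, end, labels in segments:
        if not 0.0 <= start < end <= duration:
            raise ValueError(f"bad segment bounds: {start}, {end}")
        x_center = (start + end) / 2.0 / duration  # normalized box center
        width = (end - start) / duration           # normalized box width
        for label in labels:
            # One line per label: same box, different class id (multi-label).
            lines.append(
                f"{class_ids[label]} {x_center:.6f} 0.500000 {width:.6f} 1.000000"
            )
    return lines


# Hypothetical AF inventory and a two-phoneme, one-second utterance.
CLASS_IDS = {"voiced": 0, "unvoiced": 1, "nasal": 2, "fricative": 3}
EXAMPLE = [
    (0.0, 0.25, ["voiced", "nasal"]),
    (0.25, 1.0, ["unvoiced", "fricative"]),
]

if __name__ == "__main__":
    for line in segments_to_yolo(EXAMPLE, duration=1.0, class_ids=CLASS_IDS):
        print(line)
```

Writing each AF as its own line keeps the label file compatible with detectors that score classes independently; whether the published system stores its annotations this way is not stated in the record.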