
Isolated single sound lip-reading using a frame-based camera and event-based camera

Unlike a conventional frame-based camera, an event-based camera detects changes in brightness at each pixel over time. This research explores lip-reading as a new application of the event-based camera. This paper proposes event camera-based lip-reading for isolated single sound recognition. The proposed method consists of image generation from event data, face and facial feature point detection, and recognition using a Temporal Convolutional Network. Furthermore, this paper proposes a method that combines the two modalities of a frame-based camera and an event-based camera. To evaluate the proposed method, utterance scenes of 15 Japanese consonants from 20 speakers were collected with an event-based camera and a video camera, and an original dataset was constructed. Several experiments were conducted in which images were generated from the event data at multiple frame rates. The highest recognition accuracy was obtained with the event-based camera images at 60 fps. Moreover, combining the two modalities was confirmed to yield higher recognition accuracy than either single modality.
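The first stage of the pipeline, "image generation from event data", can be illustrated with a short sketch. This is a minimal, hypothetical example rather than the paper's actual method: the event tuple layout (t, x, y, polarity) and the signed-polarity accumulation scheme are assumptions, with only the fixed-rate binning idea and the 60 fps target taken from the abstract.

import numpy as np

def events_to_frames(events, width, height, duration_s, fps=60):
    """Bin events into `fps` frames per second over `duration_s` seconds.

    events: array of shape (N, 4) with columns (t_seconds, x, y, polarity),
            polarity in {-1, +1}.
    Returns an array of shape (num_frames, height, width) where each pixel
    holds the signed sum of event polarities in that frame's time window.
    """
    num_frames = int(round(duration_s * fps))
    frames = np.zeros((num_frames, height, width), dtype=np.float32)
    # Map each event's timestamp to a frame index.
    idx = np.clip((events[:, 0] * fps).astype(int), 0, num_frames - 1)
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    # Accumulate signed polarities; np.add.at handles repeated indices.
    np.add.at(frames, (idx, y, x), events[:, 3])
    return frames

A companion sketch of the recognition and two-modality fusion stages appears after the record fields at the end of this page.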

Bibliographic Details
Main Authors: Kanamaru, Tatsuya, Arakane, Taiki, Saitoh, Takeshi
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9874941/
https://www.ncbi.nlm.nih.gov/pubmed/36714203
http://dx.doi.org/10.3389/frai.2022.1070964
_version_ 1784877850011107328
author Kanamaru, Tatsuya
Arakane, Taiki
Saitoh, Takeshi
author_facet Kanamaru, Tatsuya
Arakane, Taiki
Saitoh, Takeshi
author_sort Kanamaru, Tatsuya
collection PubMed
description Unlike a conventional frame-based camera, an event-based camera detects changes in brightness at each pixel over time. This research explores lip-reading as a new application of the event-based camera. This paper proposes event camera-based lip-reading for isolated single sound recognition. The proposed method consists of image generation from event data, face and facial feature point detection, and recognition using a Temporal Convolutional Network. Furthermore, this paper proposes a method that combines the two modalities of a frame-based camera and an event-based camera. To evaluate the proposed method, utterance scenes of 15 Japanese consonants from 20 speakers were collected with an event-based camera and a video camera, and an original dataset was constructed. Several experiments were conducted in which images were generated from the event data at multiple frame rates. The highest recognition accuracy was obtained with the event-based camera images at 60 fps. Moreover, combining the two modalities was confirmed to yield higher recognition accuracy than either single modality.
format Online
Article
Text
id pubmed-9874941
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9874941 2023-01-26 Isolated single sound lip-reading using a frame-based camera and event-based camera Kanamaru, Tatsuya Arakane, Taiki Saitoh, Takeshi Front Artif Intell Artificial Intelligence Unlike a conventional frame-based camera, an event-based camera detects changes in brightness at each pixel over time. This research explores lip-reading as a new application of the event-based camera. This paper proposes event camera-based lip-reading for isolated single sound recognition. The proposed method consists of image generation from event data, face and facial feature point detection, and recognition using a Temporal Convolutional Network. Furthermore, this paper proposes a method that combines the two modalities of a frame-based camera and an event-based camera. To evaluate the proposed method, utterance scenes of 15 Japanese consonants from 20 speakers were collected with an event-based camera and a video camera, and an original dataset was constructed. Several experiments were conducted in which images were generated from the event data at multiple frame rates. The highest recognition accuracy was obtained with the event-based camera images at 60 fps. Moreover, combining the two modalities was confirmed to yield higher recognition accuracy than either single modality. Frontiers Media S.A. 2023-01-11 /pmc/articles/PMC9874941/ /pubmed/36714203 http://dx.doi.org/10.3389/frai.2022.1070964 Text en Copyright © 2023 Kanamaru, Arakane and Saitoh. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Artificial Intelligence
Kanamaru, Tatsuya
Arakane, Taiki
Saitoh, Takeshi
Isolated single sound lip-reading using a frame-based camera and event-based camera
title Isolated single sound lip-reading using a frame-based camera and event-based camera
title_full Isolated single sound lip-reading using a frame-based camera and event-based camera
title_fullStr Isolated single sound lip-reading using a frame-based camera and event-based camera
title_full_unstemmed Isolated single sound lip-reading using a frame-based camera and event-based camera
title_short Isolated single sound lip-reading using a frame-based camera and event-based camera
title_sort isolated single sound lip-reading using a frame-based camera and event-based camera
topic Artificial Intelligence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9874941/
https://www.ncbi.nlm.nih.gov/pubmed/36714203
http://dx.doi.org/10.3389/frai.2022.1070964
work_keys_str_mv AT kanamarutatsuya isolatedsinglesoundlipreadingusingaframebasedcameraandeventbasedcamera
AT arakanetaiki isolatedsinglesoundlipreadingusingaframebasedcameraandeventbasedcamera
AT saitohtakeshi isolatedsinglesoundlipreadingusingaframebasedcameraandeventbasedcamera
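As a companion to the event-imaging sketch above, the recognition stage described in the abstract (a Temporal Convolutional Network over per-frame lip features, plus the two-modality combination) might look roughly like the following. The layer sizes, the 136-dimensional feature vector (68 facial landmarks × 2 coordinates), and logit averaging for fusion are all assumptions; the record states only that a TCN is used and that combining the frame-based and event-based modalities improves accuracy over either alone.

import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    def __init__(self, channels, dilation):
        super().__init__()
        # Causal padding so the output length matches the input length.
        self.pad = (3 - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=self.pad, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):                      # x: (batch, channels, time)
        out = self.conv(x)[..., :x.size(-1)]   # trim the causal padding
        return self.relu(out) + x              # residual connection

class LipTCN(nn.Module):
    def __init__(self, feat_dim=136, channels=64, num_classes=15):
        super().__init__()
        self.inp = nn.Conv1d(feat_dim, channels, kernel_size=1)
        self.blocks = nn.Sequential(*[TCNBlock(channels, d)
                                      for d in (1, 2, 4, 8)])
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x):                      # x: (batch, time, feat_dim)
        h = self.blocks(self.inp(x.transpose(1, 2)))
        return self.head(h.mean(dim=-1))       # pool over time -> logits

# Hypothetical late fusion of the two modalities: train one model per
# camera and average their logits at inference time.
def fused_logits(frame_model, event_model, frame_feats, event_feats):
    return 0.5 * (frame_model(frame_feats) + event_model(event_feats))

Dilated convolutions let the receptive field grow exponentially with depth, which suits short isolated-utterance clips; num_classes=15 reflects the 15 Japanese consonants in the dataset described above.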