Isolated single sound lip-reading using a frame-based camera and event-based camera
Unlike the conventional frame-based camera, the event-based camera detects changes in the brightness value of each pixel over time. This research treats lip-reading as a new application of the event-based camera. This paper proposes an event camera-based lip-reading method for isolated single sound recognition.
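The abstract describes generating images at multiple frame rates from the event stream before landmark detection and recognition with a Temporal Convolutional Network. The paper's own code is not part of this record, so the following is only a minimal sketch of a common event-to-frame accumulation step, assuming events arrive as (timestamp, x, y, polarity) tuples; the function name and parameters are illustrative, not the authors' implementation.

```python
import numpy as np

def events_to_frames(events, width, height, duration_s, fps=60):
    """Accumulate an event stream into fixed-rate frames.

    Illustrative only: each event is assumed to be a
    (timestamp_s, x, y, polarity) tuple, with polarity +1 for a
    brightness increase and -1 for a decrease. Events falling in the
    same 1/fps window are summed per pixel, giving one frame per window.
    """
    n_frames = max(1, int(np.ceil(duration_s * fps)))
    frames = np.zeros((n_frames, height, width), dtype=np.float32)
    for t, x, y, p in events:
        idx = min(int(t * fps), n_frames - 1)  # 1/fps bin for this event
        frames[idx, int(y), int(x)] += p
    return frames

# Usage: a few synthetic events over a 0.05 s clip, rendered at 60 fps
# (the rate the abstract reports as giving the highest accuracy).
events = [(0.001, 10, 12, +1), (0.010, 10, 13, -1), (0.033, 11, 12, +1)]
frames = events_to_frames(events, width=64, height=64, duration_s=0.05, fps=60)
print(frames.shape)  # (3, 64, 64)
```

Frames produced this way can be fed to the same landmark-detection and sequence-classification pipeline as ordinary video, which is one reason the frame-based and event-based modalities can share a recognition pipeline and be combined.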
| Main Authors: | Kanamaru, Tatsuya; Arakane, Taiki; Saitoh, Takeshi |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | Frontiers Media S.A., 2023 |
| Subjects: | Artificial Intelligence |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9874941/ https://www.ncbi.nlm.nih.gov/pubmed/36714203 http://dx.doi.org/10.3389/frai.2022.1070964 |
| _version_ | 1784877850011107328 |
|---|---|
| author | Kanamaru, Tatsuya; Arakane, Taiki; Saitoh, Takeshi |
| author_sort | Kanamaru, Tatsuya |
| collection | PubMed |
| description | Unlike the conventional frame-based camera, the event-based camera detects changes in the brightness value of each pixel over time. This research treats lip-reading as a new application of the event-based camera. This paper proposes an event camera-based lip-reading method for isolated single sound recognition. The proposed method consists of imaging from event data, face and facial feature point detection, and recognition using a Temporal Convolutional Network. Furthermore, this paper proposes a method that combines the two modalities of the frame-based camera and the event-based camera. To evaluate the proposed method, utterance scenes of 15 Japanese consonants from 20 speakers were collected with an event-based camera and a video camera to construct an original dataset. Several experiments were conducted by generating images at multiple frame rates from the event-based camera. As a result, the highest recognition accuracy was obtained with the event-based camera images at 60 fps. Moreover, it was confirmed that combining the two modalities yields higher recognition accuracy than a single modality. |
| format | Online Article Text |
| id | pubmed-9874941 |
| institution | National Center for Biotechnology Information |
| language | English |
| publishDate | 2023 |
| publisher | Frontiers Media S.A. |
| record_format | MEDLINE/PubMed |
| spelling | pubmed-9874941 2023-01-26. Front Artif Intell, Artificial Intelligence section. Frontiers Media S.A., published online 2023-01-11. Copyright © 2023 Kanamaru, Arakane and Saitoh. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
| title | Isolated single sound lip-reading using a frame-based camera and event-based camera |
| topic | Artificial Intelligence |
| url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9874941/ https://www.ncbi.nlm.nih.gov/pubmed/36714203 http://dx.doi.org/10.3389/frai.2022.1070964 |