Isolated single sound lip-reading using a frame-based camera and event-based camera
Unlike the conventional frame-based camera, the event-based camera detects changes in the brightness value of each pixel over time. This research treats lip-reading as a new application of the event-based camera. This paper proposes an event camera-based lip-reading method for isolated single sound recognition.
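The abstract describes generating images at multiple frame rates from the event stream before landmark detection and recognition with a Temporal Convolutional Network. The paper's own code is not part of this record, so the following is only a minimal sketch of a common event-to-frame accumulation step, assuming events arrive as (timestamp, x, y, polarity) tuples; the function name and parameters are illustrative, not the authors' implementation.

```python
import numpy as np

def events_to_frames(events, width, height, duration_s, fps=60):
    """Accumulate an event stream into fixed-rate frames.

    Illustrative only: each event is assumed to be a
    (timestamp_s, x, y, polarity) tuple, with polarity +1 for a
    brightness increase and -1 for a decrease. Events falling in the
    same 1/fps window are summed per pixel, giving one frame per window.
    """
    n_frames = max(1, int(np.ceil(duration_s * fps)))
    frames = np.zeros((n_frames, height, width), dtype=np.float32)
    for t, x, y, p in events:
        idx = min(int(t * fps), n_frames - 1)  # 1/fps bin for this event
        frames[idx, int(y), int(x)] += p
    return frames

# Usage: a few synthetic events over a 0.05 s clip, rendered at 60 fps
# (the rate the abstract reports as giving the highest accuracy).
events = [(0.001, 10, 12, +1), (0.010, 10, 13, -1), (0.033, 11, 12, +1)]
frames = events_to_frames(events, width=64, height=64, duration_s=0.05, fps=60)
print(frames.shape)  # (3, 64, 64)
```

Frames produced this way can be fed to the same landmark-detection and sequence-classification pipeline as ordinary video, which is one reason the frame-based and event-based modalities can share a recognition pipeline and be combined.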
| Main Authors: | Kanamaru, Tatsuya; Arakane, Taiki; Saitoh, Takeshi |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | Frontiers Media S.A., 2023 |
| Subjects: | Artificial Intelligence |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9874941/ https://www.ncbi.nlm.nih.gov/pubmed/36714203 http://dx.doi.org/10.3389/frai.2022.1070964 |
| _version_ | 1784877850011107328 |
|---|---|
| author | Kanamaru, Tatsuya; Arakane, Taiki; Saitoh, Takeshi |
| author_sort | Kanamaru, Tatsuya |
| collection | PubMed |
| description | Unlike the conventional frame-based camera, the event-based camera detects changes in the brightness value of each pixel over time. This research treats lip-reading as a new application of the event-based camera. This paper proposes an event camera-based lip-reading method for isolated single sound recognition. The proposed method consists of imaging from event data, face and facial feature point detection, and recognition using a Temporal Convolutional Network. Furthermore, this paper proposes a method that combines the two modalities of the frame-based camera and the event-based camera. To evaluate the proposed method, utterance scenes of 15 Japanese consonants from 20 speakers were collected with an event-based camera and a video camera to construct an original dataset. Several experiments were conducted by generating images at multiple frame rates from the event-based camera. As a result, the highest recognition accuracy was obtained with the event-based camera images at 60 fps. Moreover, it was confirmed that combining the two modalities yields higher recognition accuracy than a single modality. |
| format | Online Article Text |
| id | pubmed-9874941 |
| institution | National Center for Biotechnology Information |
| language | English |
| publishDate | 2023 |
| publisher | Frontiers Media S.A. |
| record_format | MEDLINE/PubMed |
| spelling | pubmed-9874941 2023-01-26. Front Artif Intell, Artificial Intelligence section. Frontiers Media S.A., published online 2023-01-11. Copyright © 2023 Kanamaru, Arakane and Saitoh. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
| title | Isolated single sound lip-reading using a frame-based camera and event-based camera |
| topic | Artificial Intelligence |
| url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9874941/ https://www.ncbi.nlm.nih.gov/pubmed/36714203 http://dx.doi.org/10.3389/frai.2022.1070964 |