
SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams

We present SpeakingFaces as a publicly-available large-scale multimodal dataset developed to support machine learning research in contexts that utilize a combination of thermal, visual, and audio data streams; examples include human–computer interaction, biometric authentication, recognition systems...


Bibliographic Details
Main authors: Abdrakhmanova, Madina; Kuzdeuov, Askat; Jarju, Sheikh; Khassanov, Yerbolat; Lewis, Michael; Varol, Huseyin Atakan
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8156799/
https://www.ncbi.nlm.nih.gov/pubmed/34065700
http://dx.doi.org/10.3390/s21103465
author Abdrakhmanova, Madina
Kuzdeuov, Askat
Jarju, Sheikh
Khassanov, Yerbolat
Lewis, Michael
Varol, Huseyin Atakan
collection PubMed
description We present SpeakingFaces as a publicly-available large-scale multimodal dataset developed to support machine learning research in contexts that utilize a combination of thermal, visual, and audio data streams; examples include human–computer interaction, biometric authentication, recognition systems, domain transfer, and speech recognition. SpeakingFaces is comprised of aligned high-resolution thermal and visual spectra image streams of fully-framed faces synchronized with audio recordings of each subject speaking approximately 100 imperative phrases. Data were collected from 142 subjects, yielding over 13,000 instances of synchronized data (∼3.8 TB). For technical validation, we demonstrate two baseline examples. The first baseline shows classification by gender, utilizing different combinations of the three data streams in both clean and noisy environments. The second example consists of thermal-to-visual facial image translation, as an instance of domain transfer.
format Online
Article
Text
id pubmed-8156799
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8156799 2021-05-28 SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams Abdrakhmanova, Madina; Kuzdeuov, Askat; Jarju, Sheikh; Khassanov, Yerbolat; Lewis, Michael; Varol, Huseyin Atakan Sensors (Basel) Article We present SpeakingFaces as a publicly-available large-scale multimodal dataset developed to support machine learning research in contexts that utilize a combination of thermal, visual, and audio data streams; examples include human–computer interaction, biometric authentication, recognition systems, domain transfer, and speech recognition. SpeakingFaces is comprised of aligned high-resolution thermal and visual spectra image streams of fully-framed faces synchronized with audio recordings of each subject speaking approximately 100 imperative phrases. Data were collected from 142 subjects, yielding over 13,000 instances of synchronized data (∼3.8 TB). For technical validation, we demonstrate two baseline examples. The first baseline shows classification by gender, utilizing different combinations of the three data streams in both clean and noisy environments. The second example consists of thermal-to-visual facial image translation, as an instance of domain transfer. MDPI 2021-05-16 /pmc/articles/PMC8156799/ /pubmed/34065700 http://dx.doi.org/10.3390/s21103465 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams
topic Article