
Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images


Bibliographic Details
Main Authors: Huang, Kengda; Zhou, Wujie; Fang, Meixin
Format: Online Article Text
Language: English
Published: Hindawi 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8116150/
https://www.ncbi.nlm.nih.gov/pubmed/34035801
http://dx.doi.org/10.1155/2021/6610997
_version_ 1783691331786244096
author Huang, Kengda
Zhou, Wujie
Fang, Meixin
author_facet Huang, Kengda
Zhou, Wujie
Fang, Meixin
author_sort Huang, Kengda
collection PubMed
description In recent years, the prediction of salient regions in RGB-D images has become a focus of research. Compared to its RGB counterpart, the saliency prediction of RGB-D images is more challenging. In this study, we propose a novel deep multimodal fusion autoencoder for the saliency prediction of RGB-D images. The core trainable autoencoder of the RGB-D saliency prediction model employs two raw modalities (RGB and depth/disparity information) as inputs and their corresponding eye-fixation attributes as labels. The autoencoder comprises four main networks: a color channel network, a disparity channel network, a feature concatenation network, and a feature learning network. The autoencoder can mine the complex relationship between the color and disparity cues and make the most of their complementary characteristics. Finally, the saliency map is predicted via a feature combination subnetwork, which combines the deep features extracted by the prior-learning and convolutional feature-learning subnetworks. We compare the proposed autoencoder with other saliency prediction models on two publicly available benchmark datasets. The results demonstrate that the proposed autoencoder outperforms these models by a significant margin.
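The two-stream fusion idea summarized in the abstract (separate color and disparity branches, feature concatenation, and a final combination step yielding a single-channel saliency map) can be sketched roughly as follows. This is a minimal illustrative sketch only: the layer widths, the single linear layer per branch, and the sigmoid combination are assumptions for demonstration, not the paper's actual network architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, w):
    """One channel network: a linear map followed by a ReLU, per pixel."""
    return np.maximum(x @ w, 0.0)

H, W = 8, 8
rgb = rng.random((H * W, 3))      # RGB values, one row per pixel
disp = rng.random((H * W, 1))     # disparity/depth value per pixel

w_rgb = rng.standard_normal((3, 16))   # color channel network weights
w_disp = rng.standard_normal((1, 16))  # disparity channel network weights

f_rgb = branch(rgb, w_rgb)        # color-stream features
f_disp = branch(disp, w_disp)     # disparity-stream features

# Feature concatenation network: stack the two modality features.
fused = np.concatenate([f_rgb, f_disp], axis=1)   # shape (H*W, 32)

# Feature combination step: project fused features to one value per pixel
# and squash into [0, 1] to form the saliency map.
w_out = rng.standard_normal((32, 1))
saliency = 1.0 / (1.0 + np.exp(-(fused @ w_out)))
saliency_map = saliency.reshape(H, W)
print(saliency_map.shape)  # (8, 8)
```

In the actual model these branches would be trained end-to-end against eye-fixation labels; here the weights are random and serve only to show how the two modalities are kept separate until the concatenation step.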
format Online
Article
Text
id pubmed-8116150
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-8116150 2021-05-24 Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images Huang, Kengda; Zhou, Wujie; Fang, Meixin. Comput Intell Neurosci, Research Article. Hindawi 2021-05-05 /pmc/articles/PMC8116150/ /pubmed/34035801 http://dx.doi.org/10.1155/2021/6610997 Text en Copyright © 2021 Kengda Huang et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Huang, Kengda
Zhou, Wujie
Fang, Meixin
Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images
title Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images
title_full Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images
title_fullStr Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images
title_full_unstemmed Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images
title_short Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images
title_sort deep multimodal fusion autoencoder for saliency prediction of rgb-d images
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8116150/
https://www.ncbi.nlm.nih.gov/pubmed/34035801
http://dx.doi.org/10.1155/2021/6610997
work_keys_str_mv AT huangkengda deepmultimodalfusionautoencoderforsaliencypredictionofrgbdimages
AT zhouwujie deepmultimodalfusionautoencoderforsaliencypredictionofrgbdimages
AT fangmeixin deepmultimodalfusionautoencoderforsaliencypredictionofrgbdimages