Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images
In recent years, the prediction of salient regions in RGB-D images has become a focus of research. Compared with its RGB counterpart, saliency prediction for RGB-D images is more challenging. In this study, we propose a novel deep multimodal fusion autoencoder for the saliency prediction of RGB-D images. The core trainable autoencoder of the RGB-D saliency prediction model takes two raw modalities (RGB and depth/disparity information) as inputs and their corresponding eye-fixation attributes as labels. The autoencoder comprises four main networks: a color channel network, a disparity channel network, a feature concatenation network, and a feature learning network. The autoencoder can mine the complex relationship between color and disparity cues and make the most of their complementary characteristics. Finally, the saliency map is predicted via a feature combination subnetwork, which combines the deep features extracted from the prior learning and convolutional feature learning subnetworks. We compare the proposed autoencoder with other saliency prediction models on two publicly available benchmark datasets. The results demonstrate that the proposed autoencoder outperforms these models by a significant margin.
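The two-stream architecture described in the abstract (a color branch and a disparity branch whose features are concatenated, passed through a feature learning network, and decoded into a saliency map) could be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the class name `FusionSaliencyAE`, the layer counts, and the channel widths are assumptions.

```python
import torch
import torch.nn as nn

class FusionSaliencyAE(nn.Module):
    """Illustrative sketch of a two-stream fusion autoencoder:
    modality-specific color and disparity branches, feature
    concatenation, a feature learning stage, and a decoder that
    emits a single-channel saliency map."""

    def __init__(self):
        super().__init__()
        # Color channel network (3-channel RGB input).
        self.color_net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        # Disparity channel network (1-channel depth/disparity input).
        self.disparity_net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        # Feature learning network over the concatenated features.
        self.feature_net = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        # Decoder producing a saliency map with values in (0, 1).
        self.decoder = nn.Sequential(
            nn.Conv2d(64, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, rgb, disparity):
        # Concatenate the two modality streams along the channel axis.
        fused = torch.cat(
            [self.color_net(rgb), self.disparity_net(disparity)], dim=1
        )
        return self.decoder(self.feature_net(fused))
```

In this sketch, training against the eye-fixation labels mentioned in the abstract would amount to regressing the decoder output toward ground-truth fixation maps (e.g., with a pixelwise loss); that training loop is omitted here.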
Main Authors: | Huang, Kengda; Zhou, Wujie; Fang, Meixin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi, 2021 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8116150/ https://www.ncbi.nlm.nih.gov/pubmed/34035801 http://dx.doi.org/10.1155/2021/6610997 |
_version_ | 1783691331786244096
author | Huang, Kengda Zhou, Wujie Fang, Meixin |
author_facet | Huang, Kengda Zhou, Wujie Fang, Meixin |
author_sort | Huang, Kengda |
collection | PubMed |
description | In recent years, the prediction of salient regions in RGB-D images has become a focus of research. Compared with its RGB counterpart, saliency prediction for RGB-D images is more challenging. In this study, we propose a novel deep multimodal fusion autoencoder for the saliency prediction of RGB-D images. The core trainable autoencoder of the RGB-D saliency prediction model takes two raw modalities (RGB and depth/disparity information) as inputs and their corresponding eye-fixation attributes as labels. The autoencoder comprises four main networks: a color channel network, a disparity channel network, a feature concatenation network, and a feature learning network. The autoencoder can mine the complex relationship between color and disparity cues and make the most of their complementary characteristics. Finally, the saliency map is predicted via a feature combination subnetwork, which combines the deep features extracted from the prior learning and convolutional feature learning subnetworks. We compare the proposed autoencoder with other saliency prediction models on two publicly available benchmark datasets. The results demonstrate that the proposed autoencoder outperforms these models by a significant margin.
format | Online Article Text |
id | pubmed-8116150 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-8116150 2021-05-24 Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images Huang, Kengda Zhou, Wujie Fang, Meixin Comput Intell Neurosci Research Article Hindawi 2021-05-05 /pmc/articles/PMC8116150/ /pubmed/34035801 http://dx.doi.org/10.1155/2021/6610997 Text en Copyright © 2021 Kengda Huang et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle | Research Article Huang, Kengda Zhou, Wujie Fang, Meixin Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images |
title | Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images |
title_full | Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images |
title_fullStr | Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images |
title_full_unstemmed | Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images |
title_short | Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images |
title_sort | deep multimodal fusion autoencoder for saliency prediction of rgb-d images |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8116150/ https://www.ncbi.nlm.nih.gov/pubmed/34035801 http://dx.doi.org/10.1155/2021/6610997 |
work_keys_str_mv | AT huangkengda deepmultimodalfusionautoencoderforsaliencypredictionofrgbdimages AT zhouwujie deepmultimodalfusionautoencoderforsaliencypredictionofrgbdimages AT fangmeixin deepmultimodalfusionautoencoderforsaliencypredictionofrgbdimages |