Deep Monocular Depth Estimation Based on Content and Contextual Features

Bibliographic Details
Main authors: Abdulwahab, Saddam; Rashwan, Hatem A.; Sharaf, Najwa; Khalid, Saif; Puig, Domenec
Format: Online Article Text
Language: English
Published: MDPI 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055838/
https://www.ncbi.nlm.nih.gov/pubmed/36991629
http://dx.doi.org/10.3390/s23062919
Description
Summary: Recently, significant progress has been achieved in developing deep learning-based approaches for estimating depth maps from monocular images. However, many existing methods rely on content and structure information extracted from RGB photographs, which often results in inaccurate depth estimation, particularly for regions with low texture or occlusions. To overcome these limitations, we propose a novel method that exploits contextual semantic information to predict precise depth maps from monocular images. Our approach leverages a deep autoencoder network incorporating high-quality semantic features from the state-of-the-art HRNet-v2 semantic segmentation model. By feeding the autoencoder network with these features, our method can effectively preserve the discontinuities of the depth images and enhance monocular depth estimation. Specifically, we exploit the semantic features related to the localization and boundaries of the objects in the image to improve the accuracy and robustness of the depth estimation. To validate the effectiveness of our approach, we tested our model on two publicly available datasets, NYU Depth v2 and SUN RGB-D. Our method outperformed several state-of-the-art monocular depth estimation techniques, achieving an accuracy of [Formula: see text], while minimizing the error [Formula: see text] by [Formula: see text], [Formula: see text] by [Formula: see text], and [Formula: see text] by [Formula: see text]. Our approach also demonstrated exceptional performance in preserving object boundaries and faithfully detecting small object structures in the scene.
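
Note on the metrics: this record elides the accuracy and error measures reported in the summary, so the article's actual metrics and values cannot be recovered from it. For orientation only, the sketch below computes four measures that are commonly reported for monocular depth estimation on datasets such as NYU Depth v2: threshold accuracy, absolute relative error, root mean squared error, and mean log10 error. The function name `depth_metrics`, its parameters, and the choice of metrics are illustrative assumptions, not taken from the article.

```python
import math


def depth_metrics(pred, gt, threshold=1.25):
    """Illustrative depth metrics over paired, strictly positive depth values.

    Hypothetical helper: the metric selection is an assumption, since the
    catalog record does not state which measures the article reports.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    if any(p <= 0 or g <= 0 for p, g in zip(pred, gt)):
        raise ValueError("depth values must be strictly positive")

    n = len(pred)
    pairs = list(zip(pred, gt))

    # Threshold accuracy: fraction of pixels where max(p/g, g/p) < threshold.
    delta = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n
    # Absolute relative error: mean of |p - g| / g.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root mean squared error in the depth unit of the inputs.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Mean absolute error between log10 depths.
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n

    return {"delta": delta, "abs_rel": abs_rel, "rmse": rmse, "log10": log10}
```

In practice the predicted and ground-truth depth maps would be flattened to one-dimensional sequences, with invalid ground-truth pixels masked out, before calling a function like this.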