High quality monocular depth estimation with parallel decoder
Monocular depth estimation aims to recover depth information in three-dimensional (3D) space from a single image efficiently, but it is an ill-posed problem. Recently, Transformer-based architectures have achieved excellent accuracy in monocular depth estimation. However, due to the characteristics of Transformers, such models have a huge number of parameters and slow inference speed. In traditional convolutional neural network–based architectures, many encoder–decoders serially fuse the multi-scale features from each stage of the encoder and then output predictions. However, these approaches may struggle to recover the spatial information lost by the encoder during pooling and convolution. To improve on this serial structure, we propose a structure designed from the decoder's perspective, which first predicts global and local depth information in parallel and then fuses them. Results show that this structure is an effective improvement over traditional methods and achieves accuracy comparable with state-of-the-art methods in both indoor and outdoor scenes, with fewer parameters and computations. Moreover, ablation studies verify the effectiveness of the proposed decoder.
Main Authors: | Liu, Jiatao; Zhang, Yaping |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9534839/ https://www.ncbi.nlm.nih.gov/pubmed/36198741 http://dx.doi.org/10.1038/s41598-022-20909-x |
_version_ | 1784802636592054272 |
---|---|
author | Liu, Jiatao Zhang, Yaping |
author_facet | Liu, Jiatao Zhang, Yaping |
author_sort | Liu, Jiatao |
collection | PubMed |
description | Monocular depth estimation aims to recover depth information in three-dimensional (3D) space from a single image efficiently, but it is an ill-posed problem. Recently, Transformer-based architectures have achieved excellent accuracy in monocular depth estimation. However, due to the characteristics of Transformers, such models have a huge number of parameters and slow inference speed. In traditional convolutional neural network–based architectures, many encoder–decoders serially fuse the multi-scale features from each stage of the encoder and then output predictions. However, these approaches may struggle to recover the spatial information lost by the encoder during pooling and convolution. To improve on this serial structure, we propose a structure designed from the decoder's perspective, which first predicts global and local depth information in parallel and then fuses them. Results show that this structure is an effective improvement over traditional methods and achieves accuracy comparable with state-of-the-art methods in both indoor and outdoor scenes, with fewer parameters and computations. Moreover, ablation studies verify the effectiveness of the proposed decoder. |
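The abstract's central idea — two decoder branches predicting global (coarse, scene-level) and local (fine, per-pixel) depth in parallel, then fusing the two — can be illustrated with a minimal numpy sketch. This is not the paper's implementation; all function names (`global_decoder`, `local_decoder`, `fuse`) and the simple mean/residual split are hypothetical stand-ins chosen only to show the parallel-then-fuse structure, as opposed to a serial encoder–decoder pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8))  # stand-in for an encoder feature map

def global_decoder(f):
    """Coarse branch: one scene-level depth estimate spread over the map."""
    return np.full_like(f, f.mean())

def local_decoder(f):
    """Fine branch: per-pixel detail expressed relative to the coarse level."""
    return f - f.mean()

def fuse(g, l, alpha=0.5):
    """Parallel fusion: a weighted combination of the two branch outputs.

    In a trained network alpha would be learned (e.g. a per-pixel gate);
    here it is a fixed scalar for illustration.
    """
    return alpha * g + (1 - alpha) * l

depth = fuse(global_decoder(features), local_decoder(features))
print(depth.shape)  # (8, 8)
```

Because both branches run on the same encoder features, neither depends on the other's output — unlike serial fusion, where each decoder stage must first recover spatial detail from the previous stage before refining it.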
format | Online Article Text |
id | pubmed-9534839 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-95348392022-10-07 High quality monocular depth estimation with parallel decoder Liu, Jiatao Zhang, Yaping Sci Rep Article Monocular depth estimation aims to recover the depth information in three-dimensional (3D) space from a single image efficiently, but it is an ill-posed problem. Recently, Transformer-based architectures have achieved excellent accuracy in monocular depth estimation. However, due to the characteristics of Transformer, the model parameters are huge and the inference speed is slow. In traditional convolutional neural network–based architectures, many encoder-decoders perform serial fusion of the multi-scale features of each stage of the encoder and then output predictions. However, in these approaches it may be difficult to recover the spatial information lost by the encoder during pooling and convolution. To enhance this serial structure, we propose a structure from the decoder perspective, which first predicts global and local depth information in parallel and then fuses them. Results show that this structure is an effective improvement over traditional methods and has accuracy comparable with that of state-of-the-art methods in both indoor and outdoor scenes, but with fewer parameters and computations. Moreover, results of ablation studies verify the effectiveness of the proposed decoder. Nature Publishing Group UK 2022-10-05 /pmc/articles/PMC9534839/ /pubmed/36198741 http://dx.doi.org/10.1038/s41598-022-20909-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. 
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . |
spellingShingle | Article Liu, Jiatao Zhang, Yaping High quality monocular depth estimation with parallel decoder |
title | High quality monocular depth estimation with parallel decoder |
title_full | High quality monocular depth estimation with parallel decoder |
title_fullStr | High quality monocular depth estimation with parallel decoder |
title_full_unstemmed | High quality monocular depth estimation with parallel decoder |
title_short | High quality monocular depth estimation with parallel decoder |
title_sort | high quality monocular depth estimation with parallel decoder |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9534839/ https://www.ncbi.nlm.nih.gov/pubmed/36198741 http://dx.doi.org/10.1038/s41598-022-20909-x |
work_keys_str_mv | AT liujiatao highqualitymonoculardepthestimationwithparalleldecoder AT zhangyaping highqualitymonoculardepthestimationwithparalleldecoder |