
Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation

Depth estimation is an inverse projection problem that estimates pixel-level distances from a single image. Although supervised methods have shown promising results, they have an intrinsic limitation in requiring ground-truth depth from an external sensor. On the other hand, self-supervised depth estimation relieves the burden of collecting calibrated training data, although there is still a large performance gap between supervised and self-supervised methods. The objective of this study is to reduce the performance gap between the supervised and self-supervised approaches. The loss function of previous self-supervised methods is mainly based on a photometric error, which is computed indirectly from synthesized images using depth and pose estimates. In this paper, we argue that a direct depth cue is more effective for training a depth estimation network. To obtain the direct depth cue, we employed a knowledge distillation technique, which is a teacher-student learning framework. The teacher network was trained in a self-supervised manner based on a photometric error, and its predictions were utilized to train a student network. We constructed a multi-scale dense prediction transformer with Monte Carlo dropout, and a multi-scale distillation loss was proposed to train the student network based on the ensemble of stochastic estimates. Experiments were conducted on the KITTI and Make3D datasets, and our proposed method achieved state-of-the-art accuracy in self-supervised depth estimation. Our code is publicly available at https://github.com/ji-min-song/KD-of-MS-DPT.
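
The abstract describes the distillation recipe only at a high level. As a rough sketch of the idea (not the authors' implementation; the official code is at the GitHub link above), the following PyTorch snippet illustrates how an ensemble of Monte Carlo dropout predictions from a frozen, self-supervised teacher could serve as a pseudo ground truth for a student supervised at multiple output scales. The function names, sample count, and per-scale weights are illustrative assumptions, not details taken from the paper.

    # Illustrative sketch only: teacher and student are assumed to be depth
    # networks returning (B, 1, H, W) tensors; the student returns one tensor
    # per output scale.
    import torch
    import torch.nn.functional as F

    def mc_dropout_ensemble(teacher, image, n_samples=8):
        """Average several stochastic teacher predictions (dropout kept active)."""
        teacher.train()  # keep dropout layers stochastic at inference time
        with torch.no_grad():
            samples = [teacher(image) for _ in range(n_samples)]
        return torch.stack(samples).mean(dim=0)  # pseudo ground-truth depth

    def multi_scale_distillation_loss(student_outputs, pseudo_depth, weights=None):
        """Sum of per-scale L1 losses against the (resized) teacher ensemble."""
        weights = weights or [1.0] * len(student_outputs)
        loss = 0.0
        for w, pred in zip(weights, student_outputs):
            # Resize the pseudo label to match each student output scale.
            target = F.interpolate(pseudo_depth, size=pred.shape[-2:],
                                   mode="bilinear", align_corners=False)
            loss = loss + w * F.l1_loss(pred, target)
        return loss

Leaving dropout active at inference time is what turns a single teacher into a stochastic ensemble; averaging its samples yields a smoother, less noisy distillation target than any single forward pass would.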


Bibliographic Details
Main Authors: Song, Jimin; Lee, Sang Jun
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK, 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10622578/
https://www.ncbi.nlm.nih.gov/pubmed/37919392
http://dx.doi.org/10.1038/s41598-023-46178-w
_version_ 1785130570651533312
author Song, Jimin
Lee, Sang Jun
author_facet Song, Jimin
Lee, Sang Jun
author_sort Song, Jimin
collection PubMed
description Depth estimation is an inverse projection problem that estimates pixel-level distances from a single image. Although supervised methods have shown promising results, they have an intrinsic limitation in requiring ground-truth depth from an external sensor. On the other hand, self-supervised depth estimation relieves the burden of collecting calibrated training data, although there is still a large performance gap between supervised and self-supervised methods. The objective of this study is to reduce the performance gap between the supervised and self-supervised approaches. The loss function of previous self-supervised methods is mainly based on a photometric error, which is computed indirectly from synthesized images using depth and pose estimates. In this paper, we argue that a direct depth cue is more effective for training a depth estimation network. To obtain the direct depth cue, we employed a knowledge distillation technique, which is a teacher-student learning framework. The teacher network was trained in a self-supervised manner based on a photometric error, and its predictions were utilized to train a student network. We constructed a multi-scale dense prediction transformer with Monte Carlo dropout, and a multi-scale distillation loss was proposed to train the student network based on the ensemble of stochastic estimates. Experiments were conducted on the KITTI and Make3D datasets, and our proposed method achieved state-of-the-art accuracy in self-supervised depth estimation. Our code is publicly available at https://github.com/ji-min-song/KD-of-MS-DPT.
format Online
Article
Text
id pubmed-10622578
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10622578 2023-11-04 Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation Song, Jimin Lee, Sang Jun Sci Rep Article Depth estimation is an inverse projection problem that estimates pixel-level distances from a single image. Although supervised methods have shown promising results, they have an intrinsic limitation in requiring ground-truth depth from an external sensor. On the other hand, self-supervised depth estimation relieves the burden of collecting calibrated training data, although there is still a large performance gap between supervised and self-supervised methods. The objective of this study is to reduce the performance gap between the supervised and self-supervised approaches. The loss function of previous self-supervised methods is mainly based on a photometric error, which is computed indirectly from synthesized images using depth and pose estimates. In this paper, we argue that a direct depth cue is more effective for training a depth estimation network. To obtain the direct depth cue, we employed a knowledge distillation technique, which is a teacher-student learning framework. The teacher network was trained in a self-supervised manner based on a photometric error, and its predictions were utilized to train a student network. We constructed a multi-scale dense prediction transformer with Monte Carlo dropout, and a multi-scale distillation loss was proposed to train the student network based on the ensemble of stochastic estimates. Experiments were conducted on the KITTI and Make3D datasets, and our proposed method achieved state-of-the-art accuracy in self-supervised depth estimation. Our code is publicly available at https://github.com/ji-min-song/KD-of-MS-DPT. Nature Publishing Group UK 2023-11-02 /pmc/articles/PMC10622578/ /pubmed/37919392 http://dx.doi.org/10.1038/s41598-023-46178-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Song, Jimin
Lee, Sang Jun
Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation
title Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation
title_full Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation
title_fullStr Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation
title_full_unstemmed Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation
title_short Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation
title_sort knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10622578/
https://www.ncbi.nlm.nih.gov/pubmed/37919392
http://dx.doi.org/10.1038/s41598-023-46178-w
work_keys_str_mv AT songjimin knowledgedistillationofmultiscaledensepredictiontransformerforselfsuperviseddepthestimation
AT leesangjun knowledgedistillationofmultiscaledensepredictiontransformerforselfsuperviseddepthestimation