Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation
Depth estimation is an inverse projection problem that estimates pixel-level distances from a single image. Although supervised methods have shown promising results, they have an intrinsic limitation in requiring ground-truth depth from an external sensor. Self-supervised depth estimation, on the other hand, relieves the burden of collecting calibrated training data, although a large performance gap remains between supervised and self-supervised methods. The objective of this study is to reduce the performance gap between the supervised and self-supervised approaches. The loss function of previous self-supervised methods is mainly based on a photometric error, which is computed indirectly from synthesized images using depth and pose estimates. In this paper, we argue that a direct depth cue is more effective for training a depth estimation network. To obtain the direct depth cue, we employed a knowledge distillation technique, i.e., a teacher-student learning framework. The teacher network was trained in a self-supervised manner based on a photometric error, and its predictions were utilized to train a student network. We constructed a multi-scale dense prediction transformer with Monte Carlo dropout, and a multi-scale distillation loss was proposed to train the student network based on the ensemble of stochastic estimates. Experiments were conducted on the KITTI and Make3D datasets, and our proposed method achieved state-of-the-art accuracy in self-supervised depth estimation. Our code is publicly available at https://github.com/ji-min-song/KD-of-MS-DPT.
Main Authors: | Song, Jimin; Lee, Sang Jun
Format: | Online Article Text
Language: | English
Published: | Nature Publishing Group UK, 2023
Subjects: | Article
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10622578/ https://www.ncbi.nlm.nih.gov/pubmed/37919392 http://dx.doi.org/10.1038/s41598-023-46178-w
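The abstract notes that the teacher network is trained from a photometric error computed on views synthesized from depth and pose estimates, rather than from ground-truth depth. The sketch below illustrates that indirect training signal; the SSIM + L1 mix and the weight `alpha` follow common practice in self-supervised depth estimation (e.g. Monodepth2) and are assumptions here, not the authors' exact implementation.

```python
# Minimal sketch of the photometric error used as the self-supervised
# training signal for the teacher. Assumes `synthesized` is the target
# frame reconstructed by warping a source frame with the predicted depth
# and camera pose; alpha = 0.85 is a conventional choice, not the paper's.
import torch
import torch.nn.functional as F

def dssim(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """SSIM-based dissimilarity, (1 - SSIM) / 2, over a 3x3 window."""
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return torch.clamp((1 - num / den) / 2, 0, 1)

def photometric_error(target: torch.Tensor,
                      synthesized: torch.Tensor,
                      alpha: float = 0.85) -> torch.Tensor:
    """Per-pixel weighted SSIM + L1 error between the target frame and the
    frame synthesized from depth and pose estimates (the indirect signal
    the abstract contrasts with direct depth cues)."""
    l1 = (target - synthesized).abs().mean(1, keepdim=True)
    return alpha * dssim(target, synthesized).mean(1, keepdim=True) + (1 - alpha) * l1
```

In full pipelines this error is typically evaluated against each available source frame, often taking the per-pixel minimum over frames to reduce the impact of occlusions.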
_version_ | 1785130570651533312 |
author | Song, Jimin Lee, Sang Jun |
author_facet | Song, Jimin Lee, Sang Jun |
author_sort | Song, Jimin |
collection | PubMed |
description | Depth estimation is an inverse projection problem that estimates pixel-level distances from a single image. Although supervised methods have shown promising results, they have an intrinsic limitation in requiring ground-truth depth from an external sensor. Self-supervised depth estimation, on the other hand, relieves the burden of collecting calibrated training data, although a large performance gap remains between supervised and self-supervised methods. The objective of this study is to reduce the performance gap between the supervised and self-supervised approaches. The loss function of previous self-supervised methods is mainly based on a photometric error, which is computed indirectly from synthesized images using depth and pose estimates. In this paper, we argue that a direct depth cue is more effective for training a depth estimation network. To obtain the direct depth cue, we employed a knowledge distillation technique, i.e., a teacher-student learning framework. The teacher network was trained in a self-supervised manner based on a photometric error, and its predictions were utilized to train a student network. We constructed a multi-scale dense prediction transformer with Monte Carlo dropout, and a multi-scale distillation loss was proposed to train the student network based on the ensemble of stochastic estimates. Experiments were conducted on the KITTI and Make3D datasets, and our proposed method achieved state-of-the-art accuracy in self-supervised depth estimation. Our code is publicly available at https://github.com/ji-min-song/KD-of-MS-DPT.
format | Online Article Text |
id | pubmed-10622578 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10622578 2023-11-04 Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation Song, Jimin Lee, Sang Jun Sci Rep Article Depth estimation is an inverse projection problem that estimates pixel-level distances from a single image. Although supervised methods have shown promising results, they have an intrinsic limitation in requiring ground-truth depth from an external sensor. Self-supervised depth estimation, on the other hand, relieves the burden of collecting calibrated training data, although a large performance gap remains between supervised and self-supervised methods. The objective of this study is to reduce the performance gap between the supervised and self-supervised approaches. The loss function of previous self-supervised methods is mainly based on a photometric error, which is computed indirectly from synthesized images using depth and pose estimates. In this paper, we argue that a direct depth cue is more effective for training a depth estimation network. To obtain the direct depth cue, we employed a knowledge distillation technique, i.e., a teacher-student learning framework. The teacher network was trained in a self-supervised manner based on a photometric error, and its predictions were utilized to train a student network. We constructed a multi-scale dense prediction transformer with Monte Carlo dropout, and a multi-scale distillation loss was proposed to train the student network based on the ensemble of stochastic estimates. Experiments were conducted on the KITTI and Make3D datasets, and our proposed method achieved state-of-the-art accuracy in self-supervised depth estimation. Our code is publicly available at https://github.com/ji-min-song/KD-of-MS-DPT. Nature Publishing Group UK 2023-11-02 /pmc/articles/PMC10622578/ /pubmed/37919392 http://dx.doi.org/10.1038/s41598-023-46178-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle | Article Song, Jimin Lee, Sang Jun Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation |
title | Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation |
title_full | Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation |
title_fullStr | Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation |
title_full_unstemmed | Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation |
title_short | Knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation |
title_sort | knowledge distillation of multi-scale dense prediction transformer for self-supervised depth estimation |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10622578/ https://www.ncbi.nlm.nih.gov/pubmed/37919392 http://dx.doi.org/10.1038/s41598-023-46178-w |
work_keys_str_mv | AT songjimin knowledgedistillationofmultiscaledensepredictiontransformerforselfsuperviseddepthestimation AT leesangjun knowledgedistillationofmultiscaledensepredictiontransformerforselfsuperviseddepthestimation |
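The core method in the description field, an ensemble of Monte Carlo dropout teacher predictions distilled into a multi-scale student, can be summarized with a short sketch. The L1 form of the loss, the uniform weighting across scales, and all function names below are illustrative assumptions; the authors' actual implementation is at the linked GitHub repository.

```python
# Hedged sketch of the teacher-student distillation described in the
# abstract: the teacher runs several stochastic forward passes with
# dropout kept active, the averaged depth serves as a direct depth cue,
# and the student's multi-scale outputs are regressed toward it.
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def mc_dropout_depth(teacher: nn.Module, image: torch.Tensor,
                     n_samples: int = 8) -> torch.Tensor:
    """Ensemble of stochastic estimates: keep dropout layers active at
    inference time and average the teacher's depth predictions."""
    teacher.train()  # train mode leaves dropout stochastic
    samples = torch.stack([teacher(image) for _ in range(n_samples)], dim=0)
    return samples.mean(dim=0)  # pseudo ground-truth depth, shape (B, 1, H, W)

def multi_scale_distillation_loss(student_depths: list[torch.Tensor],
                                  pseudo_depth: torch.Tensor) -> torch.Tensor:
    """L1 distillation between each student prediction scale and the
    teacher's ensemble estimate, resized to match that scale."""
    loss = torch.zeros((), device=pseudo_depth.device)
    for pred in student_depths:  # e.g. the scales of a dense prediction head
        target = F.interpolate(pseudo_depth, size=pred.shape[-2:],
                               mode="bilinear", align_corners=False)
        loss = loss + F.l1_loss(pred, target)
    return loss / len(student_depths)
```

Averaging stochastic forward passes both denoises the teacher's pseudo-labels and gives the student a dense, direct regression target at every output scale, which is the "direct depth cue" the abstract contrasts with the indirect photometric error.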