
RAUM-VO: Rotational Adjusted Unsupervised Monocular Visual Odometry

Unsupervised learning for monocular camera motion and 3D scene understanding has gained popularity over traditional methods, which rely on epipolar geometry or non-linear optimization. Notably, deep learning can overcome many issues of monocular vision, such as perceptual aliasing, low-textured areas, scale drift, and degenerate motions. In addition, compared with supervised learning, we can fully leverage video stream data without the need for depth or motion labels. However, in this work, we note that rotational motion can limit the accuracy of unsupervised pose networks more than the translational component. Therefore, we present RAUM-VO, an approach based on a model-free epipolar constraint for frame-to-frame (F2F) motion estimation that adjusts the rotation during training and online inference. To this end, we match 2D keypoints between consecutive frames using the pre-trained deep networks SuperPoint and SuperGlue, while training a network for depth and pose estimation with an unsupervised training protocol. Then, we adjust the predicted rotation with the motion estimated by F2F from the 2D matches, initializing the solver with the pose network prediction. Ultimately, RAUM-VO shows a considerable accuracy improvement over other unsupervised pose networks on the KITTI dataset, while being less complex than hybrid or traditional approaches and achieving comparable state-of-the-art results.
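The pipeline in the abstract (match 2D keypoints across consecutive frames, estimate frame-to-frame motion from the epipolar constraint, then swap the estimated rotation into the network's pose) can be illustrated with a short sketch. This is a minimal illustration, not the paper's implementation: OpenCV's five-point RANSAC solver stands in for the paper's model-free F2F solver (which is additionally initialized with the pose network's prediction, a step omitted here), and the matched keypoints are assumed to come from a SuperPoint + SuperGlue front end that is not shown.

```python
import cv2
import numpy as np

def adjust_rotation(pts1, pts2, K, R_net, t_net):
    """Rotation-adjustment sketch: keep the pose network's translation,
    but replace its rotation with one estimated frame-to-frame (F2F)
    from 2D-2D matches via the epipolar constraint.

    pts1, pts2 -- (N, 2) float arrays of matched keypoints in two
                  consecutive frames (e.g., SuperPoint + SuperGlue output)
    K          -- (3, 3) camera intrinsics matrix
    R_net      -- (3, 3) rotation predicted by the pose network
    t_net      -- (3,) translation predicted by the pose network
    """
    # Epipolar constraint: fit an essential matrix to the matches with
    # RANSAC. (The paper initializes its own solver with R_net; OpenCV's
    # solver takes no initial guess, so R_net is unused in this sketch.)
    E, inliers = cv2.findEssentialMat(pts1, pts2, K,
                                      method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    # Decompose E into (R, t), resolving the fourfold ambiguity by
    # cheirality (triangulated points must lie in front of both cameras).
    _, R_f2f, t_f2f, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

    # The recovered translation is unit-norm (monocular scale is
    # unobservable from two views), so keep the network's translation,
    # which carries the learned scale, and adopt only the F2F rotation.
    return R_f2f, t_net
```

Keeping the network's translation while adopting the geometric rotation matches the abstract's premise that rotation, more than translation, limits the accuracy of unsupervised pose networks.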


Bibliographic Details
Main Authors: Cimarelli, Claudio; Bavle, Hriday; Sanchez-Lopez, Jose Luis; Voos, Holger
Format: Online Article Text
Language: English
Published: Sensors (Basel), MDPI, 2022-03-30
Subjects: Article
Online Access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9003133/
https://www.ncbi.nlm.nih.gov/pubmed/35408264
http://dx.doi.org/10.3390/s22072651
License: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).