
Monocular Depth Estimation with Self-Supervised Learning for Vineyard Unmanned Agricultural Vehicle

To find an economical solution for inferring the depth of the surrounding environment of unmanned agricultural vehicles (UAVs), a lightweight depth estimation model called MonoDA, based on a convolutional neural network, is proposed. A series of sequential frames from monocular videos is used to train the model. The model is composed of two subnetworks: a depth estimation subnetwork and a pose estimation subnetwork. The former is a modified version of U-Net with a reduced number of bridges, while the latter uses EfficientNet-B0 as its backbone to extract features from the sequential frames and predict the pose transformations between them. A self-supervised strategy is adopted during training, so no depth labels are needed for the frames. Instead, adjacent frames in the image sequence and the reprojection relation defined by the predicted pose are used to train the model. The subnetworks' outputs (depth map and pose) are used to reconstruct the input frame, and a self-supervised loss between the reconstructed input and the original input is calculated. Finally, this loss is used to update the parameters of both subnetworks through the backward pass. Several experiments were conducted to evaluate the model's performance, and the results show that MonoDA achieves competitive accuracy on the KITTI raw dataset as well as on our vineyard dataset. In addition, the method is insensitive to color. On the computing platform of our UAV's environment perception system, an NVIDIA Jetson TX2, the model runs at 18.92 FPS. In summary, our approach provides an economical solution for depth estimation using monocular cameras, achieves a good trade-off between accuracy and speed, and can serve as a novel auxiliary depth detection paradigm for UAVs.
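The pipeline described in the abstract (a depth subnetwork and a pose subnetwork whose outputs are combined to reproject an adjacent frame into the current view, with the reconstruction error serving as the training signal) can be illustrated with a short sketch. The code below is a minimal, assumption-laden PyTorch illustration of that training scheme: the class names, layer sizes, warping details, and the plain L1 photometric loss are stand-ins chosen for brevity, not the authors' MonoDA implementation, which uses a modified U-Net depth subnetwork and an EfficientNet-B0 pose backbone.

```python
# Minimal PyTorch sketch of the self-supervised training scheme described above.
# Everything here (class names, layer sizes, the plain L1 loss) is a simplified
# stand-in, not the authors' MonoDA code: the real depth subnetwork is a modified
# U-Net and the real pose subnetwork uses an EfficientNet-B0 backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthNet(nn.Module):
    """Toy encoder-decoder standing in for the modified U-Net depth subnetwork."""

    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        disp = self.dec(self.enc(x))        # inverse depth in (0, 1)
        return 1.0 / (0.01 + 0.99 * disp)   # depth roughly in [1, 100]


class PoseNet(nn.Module):
    """Toy CNN standing in for the EfficientNet-B0-based pose subnetwork."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, 6)          # axis-angle rotation + translation

    def forward(self, target, source):
        x = self.features(torch.cat([target, source], dim=1)).flatten(1)
        return 0.01 * self.fc(x)            # keep predicted motions small


def warp(source, depth, pose, K, K_inv):
    """Reproject the source frame into the target view using depth and pose.

    K / K_inv: 3x3 camera intrinsics matrix and its inverse (same device as source).
    """
    b, _, h, w = source.shape
    device = source.device
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32, device=device),
        torch.arange(w, dtype=torch.float32, device=device),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)
    cam = (K_inv @ pix).unsqueeze(0) * depth.reshape(b, 1, -1)     # back-project to 3D
    rx, ry, rz = pose[:, 0], pose[:, 1], pose[:, 2]
    zeros = torch.zeros_like(rx)
    skew = torch.stack(
        [zeros, -rz, ry, rz, zeros, -rx, -ry, rx, zeros], dim=1
    ).reshape(b, 3, 3)
    R = torch.eye(3, device=device) + skew                         # small-angle rotation
    cam = R @ cam + pose[:, 3:].unsqueeze(-1)                      # rigid transform
    proj = K @ cam                                                 # project to pixels
    px = proj[:, 0] / (proj[:, 2] + 1e-7)
    py = proj[:, 1] / (proj[:, 2] + 1e-7)
    grid = torch.stack(
        [2.0 * px / (w - 1) - 1.0, 2.0 * py / (h - 1) - 1.0], dim=-1
    ).reshape(b, h, w, 2)
    return F.grid_sample(source, grid, padding_mode="border", align_corners=False)


def train_step(depth_net, pose_net, optimizer, target, source, K, K_inv):
    """One update: reconstruct the target frame from a neighbour and backpropagate."""
    depth = depth_net(target)                       # (B, 1, H, W)
    pose = pose_net(target, source)                 # (B, 6)
    reconstruction = warp(source, depth, pose, K, K_inv)
    loss = (reconstruction - target).abs().mean()   # photometric L1 reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A full implementation would typically add structural-similarity and smoothness terms, handle occlusions, and use both neighbouring frames, but the abstract only specifies a reconstruction loss between the reprojected and original frames, so the sketch keeps the bare pipeline; `optimizer` would be, for example, `torch.optim.Adam` over the parameters of both subnetworks.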


Bibliographic Details
Main Authors: Cui, Xue-Zhi, Feng, Quan, Wang, Shu-Zhi, Zhang, Jian-Hua
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8838921/
https://www.ncbi.nlm.nih.gov/pubmed/35161463
http://dx.doi.org/10.3390/s22030721
Collection: PubMed
Record ID: pubmed-8838921
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Sensors (Basel)
Published Online: 18 January 2022
License: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).