
Visual Robot Relocalization Based on Multi-Task CNN and Image-Similarity Strategy

The traditional CNN for 6D robot relocalization, which outputs pose estimations, does not indicate whether the model is making sensible predictions or just guessing at random. We found that convnet representations trained on classification problems generalize well to other tasks. Thus, we propose a m...


Bibliographic Details
Main Authors:	Xie, Tao; Wang, Ke; Li, Ruifeng; Tang, Xinyue
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7730972/ https://www.ncbi.nlm.nih.gov/pubmed/33291774 http://dx.doi.org/10.3390/s20236943

_version_	1783621808240459776
author	Xie, Tao; Wang, Ke; Li, Ruifeng; Tang, Xinyue
author_facet	Xie, Tao; Wang, Ke; Li, Ruifeng; Tang, Xinyue
author_sort	Xie, Tao
collection	PubMed
description	The traditional CNN for 6D robot relocalization, which outputs pose estimations, does not indicate whether the model is making sensible predictions or just guessing at random. We found that convnet representations trained on classification problems generalize well to other tasks. Thus, we propose a multi-task CNN for robot relocalization, which can simultaneously perform pose regression and scene recognition. Scene recognition determines whether the input image belongs to the current scene in which the robot is located, which not only reduces the relocalization error but also indicates how much confidence we can place in the prediction. Meanwhile, we found that when there is a large visual difference between testing images and training images, the pose precision becomes low. Based on this, we present the dual-level image-similarity strategy (DLISS), which consists of two levels: the initial level and the iteration level. The initial level performs feature vector clustering in the training set and feature vector acquisition in testing images. The iteration level, namely the PSO-based image-block selection algorithm, selects the testing images that are most similar to the training images based on the initial level, enabling higher pose accuracy on the testing set. Our method considers both the accuracy and the robustness of relocalization, and it can operate indoors and outdoors in real time, taking at most 27 ms per frame to compute. Finally, we used the Microsoft 7Scenes dataset and the Cambridge Landmarks dataset to evaluate our method. It obtains approximately 0.33 m and 7.51° accuracy on the 7Scenes dataset, and approximately 1.44 m and 4.83° accuracy on the Cambridge Landmarks dataset. Compared with PoseNet, our CNN reduced the average positional error by 25% and the average angular error by 27.79% on the 7Scenes dataset, and reduced the average positional error by 40% and the average angular error by 28.55% on the Cambridge Landmarks dataset. We show that our multi-task CNN can localize from high-level features and is robust to images that are not in the current scene. Furthermore, we show that our multi-task CNN achieves higher relocalization accuracy when using testing images obtained by DLISS.
format	Online Article Text
id	pubmed-7730972
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7730972 2020-12-12 Visual Robot Relocalization Based on Multi-Task CNN and Image-Similarity Strategy Xie, Tao; Wang, Ke; Li, Ruifeng; Tang, Xinyue Sensors (Basel) Article MDPI 2020-12-04 /pmc/articles/PMC7730972/ /pubmed/33291774 http://dx.doi.org/10.3390/s20236943 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Xie, Tao Wang, Ke Li, Ruifeng Tang, Xinyue Visual Robot Relocalization Based on Multi-Task CNN and Image-Similarity Strategy
title	Visual Robot Relocalization Based on Multi-Task CNN and Image-Similarity Strategy
title_full	Visual Robot Relocalization Based on Multi-Task CNN and Image-Similarity Strategy
title_fullStr	Visual Robot Relocalization Based on Multi-Task CNN and Image-Similarity Strategy
title_full_unstemmed	Visual Robot Relocalization Based on Multi-Task CNN and Image-Similarity Strategy
title_short	Visual Robot Relocalization Based on Multi-Task CNN and Image-Similarity Strategy
title_sort	visual robot relocalization based on multi-task cnn and image-similarity strategy
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7730972/ https://www.ncbi.nlm.nih.gov/pubmed/33291774 http://dx.doi.org/10.3390/s20236943
work_keys_str_mv	AT xietao visualrobotrelocalizationbasedonmultitaskcnnandimagesimilaritystrategy AT wangke visualrobotrelocalizationbasedonmultitaskcnnandimagesimilaritystrategy AT liruifeng visualrobotrelocalizationbasedonmultitaskcnnandimagesimilaritystrategy AT tangxinyue visualrobotrelocalizationbasedonmultitaskcnnandimagesimilaritystrategy


Similar Items