Transformer Based Binocular Disparity Prediction with Occlusion Predict and Novel Full Connection Layers

Depth estimation algorithms based on convolutional neural networks that compute disparity by constructing a matching cost volume have several limitations: because they use a limited disparity range, true disparities beyond the predetermined range cannot be recovered; the matchin...

Bibliographic Details
Main authors:	Liu, Yi; Xu, Xintao; Xiang, Bajian; Chen, Gang; Gong, Guoliang; Lu, Huaxiang
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9570544/ https://www.ncbi.nlm.nih.gov/pubmed/36236675 http://dx.doi.org/10.3390/s22197577

_version_	1784810137316229120
author	Liu, Yi Xu, Xintao Xiang, Bajian Chen, Gang Gong, Guoliang Lu, Huaxiang
author_facet	Liu, Yi Xu, Xintao Xiang, Bajian Chen, Gang Gong, Guoliang Lu, Huaxiang
author_sort	Liu, Yi
collection	PubMed
description	Depth estimation algorithms based on convolutional neural networks that compute disparity by constructing a matching cost volume have several limitations: because they use a limited disparity range, true disparities beyond the predetermined range cannot be recovered; the matching process lacks constraints on occlusion and matching uniqueness; and, as a local feature extractor, a convolutional neural network lacks the ability to perceive global context information. To address these problems with cost-volume matching, we propose a disparity prediction algorithm based on Transformer, which comprises the Swin-SPP module for feature extraction based on Swin Transformer, a Transformer disparity matching network based on self-attention and cross-attention mechanisms, and an occlusion prediction sub-network. In addition, we propose a double skip connection fully connected layer to solve the problems of gradient vanishing and explosion during training of the Transformer model, thus further enhancing inference accuracy. The proposed model achieved an EPE (Absolute error) of 0.57 and 0.61, and a 3PE (Percentage error greater than 3 px) of 1.74% and 1.56% on the KITTI 2012 and KITTI 2015 datasets, respectively, with an inference time of 0.46 s and only 2.6 M parameters, showing great advantages over other algorithms in various evaluation metrics.
format	Online Article Text
id	pubmed-9570544
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9570544 2022-10-17 Transformer Based Binocular Disparity Prediction with Occlusion Predict and Novel Full Connection Layers Liu, Yi Xu, Xintao Xiang, Bajian Chen, Gang Gong, Guoliang Lu, Huaxiang Sensors (Basel) Article Depth estimation algorithms based on convolutional neural networks that compute disparity by constructing a matching cost volume have several limitations: because they use a limited disparity range, true disparities beyond the predetermined range cannot be recovered; the matching process lacks constraints on occlusion and matching uniqueness; and, as a local feature extractor, a convolutional neural network lacks the ability to perceive global context information. To address these problems with cost-volume matching, we propose a disparity prediction algorithm based on Transformer, which comprises the Swin-SPP module for feature extraction based on Swin Transformer, a Transformer disparity matching network based on self-attention and cross-attention mechanisms, and an occlusion prediction sub-network. In addition, we propose a double skip connection fully connected layer to solve the problems of gradient vanishing and explosion during training of the Transformer model, thus further enhancing inference accuracy. The proposed model achieved an EPE (Absolute error) of 0.57 and 0.61, and a 3PE (Percentage error greater than 3 px) of 1.74% and 1.56% on the KITTI 2012 and KITTI 2015 datasets, respectively, with an inference time of 0.46 s and only 2.6 M parameters, showing great advantages over other algorithms in various evaluation metrics. MDPI 2022-10-06 /pmc/articles/PMC9570544/ /pubmed/36236675 http://dx.doi.org/10.3390/s22197577 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Liu, Yi Xu, Xintao Xiang, Bajian Chen, Gang Gong, Guoliang Lu, Huaxiang Transformer Based Binocular Disparity Prediction with Occlusion Predict and Novel Full Connection Layers
title	Transformer Based Binocular Disparity Prediction with Occlusion Predict and Novel Full Connection Layers
title_full	Transformer Based Binocular Disparity Prediction with Occlusion Predict and Novel Full Connection Layers
title_fullStr	Transformer Based Binocular Disparity Prediction with Occlusion Predict and Novel Full Connection Layers
title_full_unstemmed	Transformer Based Binocular Disparity Prediction with Occlusion Predict and Novel Full Connection Layers
title_short	Transformer Based Binocular Disparity Prediction with Occlusion Predict and Novel Full Connection Layers
title_sort	transformer based binocular disparity prediction with occlusion predict and novel full connection layers
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9570544/ https://www.ncbi.nlm.nih.gov/pubmed/36236675 http://dx.doi.org/10.3390/s22197577
work_keys_str_mv	AT liuyi transformerbasedbinoculardisparitypredictionwithocclusionpredictandnovelfullconnectionlayers AT xuxintao transformerbasedbinoculardisparitypredictionwithocclusionpredictandnovelfullconnectionlayers AT xiangbajian transformerbasedbinoculardisparitypredictionwithocclusionpredictandnovelfullconnectionlayers AT chengang transformerbasedbinoculardisparitypredictionwithocclusionpredictandnovelfullconnectionlayers AT gongguoliang transformerbasedbinoculardisparitypredictionwithocclusionpredictandnovelfullconnectionlayers AT luhuaxiang transformerbasedbinoculardisparitypredictionwithocclusionpredictandnovelfullconnectionlayers
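
The description field reports two evaluation metrics, glossed there as EPE (absolute error) and 3PE (percentage of errors greater than 3 px). As a rough illustration only, the sketch below computes both from per-pixel disparities under those glosses. The function name `disparity_metrics`, the use of `None` to mark pixels without ground truth, and the sample values are assumptions made for this example; they are not taken from the article, and the official KITTI evaluation may apply additional criteria.

```python
def disparity_metrics(pred, truth, threshold=3.0):
    """Return (EPE, 3PE) for two equal-length sequences of disparities in pixels.

    Assumption: a ground-truth value of None marks an invalid pixel, which is skipped.
    """
    # Absolute disparity error for every pixel that has ground truth.
    errors = [abs(p - t) for p, t in zip(pred, truth) if t is not None]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    # EPE: mean absolute error over valid pixels.
    epe = sum(errors) / len(errors)
    # 3PE: percentage of valid pixels whose error exceeds the threshold.
    three_pe = 100.0 * sum(e > threshold for e in errors) / len(errors)
    return epe, three_pe


# Hypothetical values: three valid pixels and one without ground truth.
pred = [10.0, 12.5, 30.0, 8.0]
truth = [10.5, 12.0, 26.0, None]
epe, three_pe = disparity_metrics(pred, truth)
print(f"EPE = {epe:.2f} px, 3PE = {three_pe:.2f}%")
```

With these sample values the absolute errors are 0.5, 0.5 and 4.0 px, so only one of the three valid pixels exceeds the 3 px threshold.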

Similar items