Cargando…

End-to-End Monocular Range Estimation for Forward Collision Warning

Estimating range to the closest object in front is the core component of the forward collision warning (FCW) system. Previous monocular range estimation methods mostly involve two sequential steps of object detection and range estimation. As a result, they are only effective for objects from specifi...

Descripción completa

Detalles Bibliográficos
Autores principales: Tang, Jie, Li, Jian
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7589406/
https://www.ncbi.nlm.nih.gov/pubmed/33096656
http://dx.doi.org/10.3390/s20205941
Descripción
Sumario:Estimating range to the closest object in front is the core component of the forward collision warning (FCW) system. Previous monocular range estimation methods mostly involve two sequential steps of object detection and range estimation. As a result, they are only effective for objects from specific categories relying on expensive object-level annotation for training, but not for unseen categories. In this paper, we present an end-to-end deep learning architecture to solve the above problems. Specifically, we represent the target range as a weighted sum of a set of potential distances. These potential distances are generated by inverse perspective projection based on intrinsic and extrinsic camera parameters, while a deep neural network predicts the corresponding weights of these distances. The whole architecture is optimized towards the range estimation task directly in an end-to-end manner with only the target range as supervision. As object category is not restricted in the training stage, the proposed method can generalize to objects with unseen categories. Furthermore, camera parameters are explicitly considered in the proposed method, making it able to generalize to images taken with different cameras and novel views. Additionally, the proposed method is not a pure black box, but provides partial interpretability by visualizing the produced weights to see which part of the image dominates the final result. We conduct experiments to verify the above properties of the proposed method on synthetic and real-world collected data.