About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Aditya Prakash
- sl:arxiv_num : 2104.09224
- sl:arxiv_published : 2021-04-19T11:48:13Z
- sl:arxiv_summary : How should representations from complementary sensors be integrated for
autonomous driving? Geometry-based sensor fusion has shown great promise for
perception tasks such as object detection and motion forecasting. However, for
the actual driving task, the global context of the 3D scene is key, e.g. a
change in traffic light state can affect the behavior of a vehicle
geometrically distant from that traffic light. Geometry alone may therefore be
insufficient for effectively fusing representations in end-to-end driving
models. In this work, we demonstrate that imitation learning policies based on
existing sensor fusion methods underperform in the presence of a high density
of dynamic agents and complex scenarios, which require global contextual
reasoning, such as handling traffic oncoming from multiple directions at
uncontrolled intersections. Therefore, we propose TransFuser, a novel
Multi-Modal Fusion Transformer, to integrate image and LiDAR representations
using attention. We experimentally validate the efficacy of our approach in
urban settings involving complex scenarios using the CARLA urban driving
simulator. Our approach achieves state-of-the-art driving performance while
reducing collisions by 76% compared to geometry-based fusion.@en
- sl:arxiv_title : Multi-Modal Fusion Transformer for End-to-End Autonomous Driving@en
- sl:arxiv_updated : 2021-04-19T11:48:13Z
- sl:bookmarkOf : https://arxiv.org/abs/2104.09224
- sl:creationDate : 2022-09-16
- sl:creationTime : 2022-09-16T19:03:51Z
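
The abstract above describes fusing image and LiDAR representations with attention rather than geometry. The following is a minimal sketch of that general idea, assuming PyTorch: features from each modality are flattened into tokens, concatenated, and passed through a standard transformer encoder so every location can attend to the full scene. It is not the authors' TransFuser implementation; the single fusion stage, channel sizes, and omission of positional embeddings are illustrative assumptions.

```python
# Minimal sketch of attention-based image/LiDAR feature fusion (assumes PyTorch).
# Illustrative only: not the TransFuser architecture from the paper.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Fuse an image feature map and a LiDAR BEV feature map via self-attention
    over their concatenated tokens (one token per spatial location)."""

    def __init__(self, channels: int = 64, num_heads: int = 4, num_layers: int = 1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads,
            dim_feedforward=4 * channels, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, img_feat: torch.Tensor, lidar_feat: torch.Tensor):
        # img_feat:   (B, C, Hi, Wi)  image-branch features
        # lidar_feat: (B, C, Hl, Wl)  LiDAR bird's-eye-view branch features
        b, c, hi, wi = img_feat.shape
        _, _, hl, wl = lidar_feat.shape
        img_tokens = img_feat.flatten(2).transpose(1, 2)        # (B, Hi*Wi, C)
        lidar_tokens = lidar_feat.flatten(2).transpose(1, 2)    # (B, Hl*Wl, C)
        tokens = torch.cat([img_tokens, lidar_tokens], dim=1)   # joint token set
        fused = self.encoder(tokens)                             # global attention
        # Split back and restore each modality's spatial layout.
        img_out = fused[:, : hi * wi].transpose(1, 2).reshape(b, c, hi, wi)
        lidar_out = fused[:, hi * wi :].transpose(1, 2).reshape(b, c, hl, wl)
        return img_out, lidar_out


if __name__ == "__main__":
    fusion = AttentionFusion(channels=64)
    img = torch.randn(2, 64, 8, 8)    # downsampled image features
    lidar = torch.randn(2, 64, 8, 8)  # downsampled BEV LiDAR features
    img_f, lidar_f = fusion(img, lidar)
    print(img_f.shape, lidar_f.shape)
```

Because every image token can attend to every LiDAR token (and vice versa), a cue such as a distant traffic light in the camera view can influence the representation of a geometrically distant BEV location, which is the kind of global contextual reasoning the abstract argues geometry-only fusion lacks.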