Simon Bultmann, Jan Quenzel, and Sven Behnke:
Real-Time Multi-Modal Semantic Fusion on Unmanned Aerial Vehicles
In Proceedings of the 10th European Conference on Mobile Robots (ECMR), Bonn, Germany, September 2021.


Unmanned aerial vehicles (UAVs) equipped with multiple complementary sensors have tremendous potential for fast autonomous or remote-controlled semantic scene analysis, e.g., for disaster examination. In this work, we propose a UAV system for real-time semantic inference and fusion of multiple sensor modalities. Semantic segmentation of LiDAR scans and RGB images, as well as object detection on RGB and thermal images, run online onboard the UAV computer using lightweight CNN architectures and embedded inference accelerators. We follow a late fusion approach where semantic information from multiple modalities augments 3D point clouds and image segmentation masks while also generating an allocentric semantic map. Our system provides augmented semantic images and point clouds at ≈ 9 Hz. We evaluate the integrated system in real-world experiments in an urban environment.
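
To illustrate what late fusion of image semantics into a point cloud can look like, the following is a minimal sketch and not the authors' implementation. It assumes the LiDAR points are already expressed in the camera frame, a pinhole camera with intrinsic matrix K, and per-class probabilities from both the point cloud and the image segmentation. The fusion rule (normalized product of the two probability vectors, which treats the modalities as independent) and the names project_points and fuse_point_labels are assumptions chosen for illustration.

```python
import numpy as np


def project_points(points_cam, K):
    """Project (N, 3) points in the camera frame to integer pixel coordinates.

    Assumes a pinhole model without lens distortion (an illustrative simplification).
    Returns pixel coordinates (N, 2) as (u, v) and a mask of points in front of the camera.
    """
    in_front = points_cam[:, 2] > 1e-6
    uv = np.full((points_cam.shape[0], 2), -1, dtype=np.int64)
    proj = (K @ points_cam[in_front].T).T
    uv[in_front] = np.floor(proj[:, :2] / proj[:, 2:3]).astype(np.int64)
    return uv, in_front


def fuse_point_labels(points_cam, point_probs, image_probs, K):
    """Fuse per-point class probabilities with image segmentation probabilities.

    points_cam:  (N, 3) points in the camera frame
    point_probs: (N, C) class probabilities from the point cloud segmentation
    image_probs: (H, W, C) class probabilities from the image segmentation
    Points outside the image keep their original probabilities.
    """
    h, w, _ = image_probs.shape
    uv, in_front = project_points(points_cam, K)
    visible = (
        in_front
        & (uv[:, 0] >= 0) & (uv[:, 0] < w)
        & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    )
    fused = point_probs.copy()
    pixel_probs = image_probs[uv[visible, 1], uv[visible, 0]]
    # Assumed fusion rule: element-wise product, renormalized per point.
    product = point_probs[visible] * pixel_probs
    norm = product.sum(axis=1, keepdims=True)
    fused[visible] = np.where(
        norm > 0, product / np.maximum(norm, 1e-12), point_probs[visible]
    )
    return fused, visible
```

Calling fuse_point_labels returns the fused (N, C) probabilities and a boolean mask of the points that fell inside the image; taking the argmax over the class axis gives one label per point. A real system would additionally need extrinsic calibration between the sensors, time synchronization, and occlusion handling, none of which are covered here.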

Video Teaser


Presentation Video


