The paper introduces eNavi, a new dataset for event-based indoor robot navigation, comprising synchronized event streams, RGB frames, and expert control actions collected under both normal and low-light conditions. The authors develop a multimodal preprocessing pipeline for temporal alignment and action reconstruction, which supports high-quality imitation learning. The proposed RGB-Event fusion navigation policy, trained via behavioral cloning, demonstrates improved robustness and lower action prediction error, particularly in low-light scenarios where RGB-only models fail.
Event cameras can rescue robot navigation in low-light environments where RGB fails, as demonstrated by a new multimodal policy that leverages event data for robust imitation learning.
Event cameras provide high dynamic range and microsecond-level temporal resolution, making them well-suited for indoor robot navigation, where conventional RGB cameras degrade under fast motion or low-light conditions. Despite advances in event-based perception spanning detection, SLAM, and pose estimation, there remains limited research on end-to-end control policies that exploit the asynchronous nature of event streams. To address this gap, we introduce a real-world indoor person-following dataset collected using a TurtleBot 2 robot, featuring synchronized raw event streams, RGB frames, and expert control actions across multiple indoor maps and trajectories under both normal and low-light conditions. We further build a multimodal data preprocessing pipeline that temporally aligns event and RGB observations while reconstructing ground-truth actions from odometry to support high-quality imitation learning. Building on this dataset, we propose a late-fusion RGB-Event navigation policy that combines dual MobileNet encoders with a transformer-based fusion module trained via behavioral cloning. A systematic evaluation of RGB-only, Event-only, and RGB-Event fusion models across 12 training variations ranging from single-path imitation to general multi-path imitation shows that policies incorporating event data, particularly the fusion model, achieve improved robustness and lower action prediction error, especially in unseen low-light conditions where RGB-only models fail. We release the dataset, synchronization pipeline, and trained models at https://eventbasedvision.github.io/eNavi/
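The abstract does not specify how the preprocessing pipeline aligns modalities or reconstructs actions, so the following is only a minimal sketch of one plausible approach, not the released implementation. It assumes sorted timestamps on a shared clock, a fixed event window ending at each RGB frame, and a differential-drive action (linear velocity v, angular velocity omega) estimated by finite differences between the two odometry samples that bracket the frame. All function names, the window length, and the `(t, x, y, theta)` odometry layout are hypothetical.

```python
import bisect
import math


def events_in_window(event_ts, t_frame, window):
    """Return (start, end) indices of events with t_frame - window <= t < t_frame.

    event_ts must be sorted in ascending order (assumed shared clock with RGB).
    """
    start = bisect.bisect_left(event_ts, t_frame - window)
    end = bisect.bisect_left(event_ts, t_frame)
    return start, end


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def action_from_odometry(odom, t_frame):
    """Estimate (v, omega) at t_frame from the bracketing odometry samples.

    odom is a list of (t, x, y, theta) tuples sorted by t (hypothetical layout).
    Frames outside the odometry range fall back to the nearest sample pair.
    """
    if len(odom) < 2:
        raise ValueError("need at least two odometry samples")
    ts = [sample[0] for sample in odom]
    i = bisect.bisect_right(ts, t_frame)
    i = min(max(i, 1), len(odom) - 1)  # clamp so odom[i - 1] and odom[i] exist
    t0, x0, y0, th0 = odom[i - 1]
    t1, x1, y1, th1 = odom[i]
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("odometry timestamps must be strictly increasing")
    # Signed forward speed: project the displacement onto the starting heading.
    v = ((x1 - x0) * math.cos(th0) + (y1 - y0) * math.sin(th0)) / dt
    # Angular speed from the wrapped heading difference.
    omega = wrap_angle(th1 - th0) / dt
    return v, omega


if __name__ == "__main__":
    event_ts = [100, 200, 300, 400, 500]          # e.g. microseconds
    print(events_in_window(event_ts, 400, 200))   # event slice for one frame
    odom = [(0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 0.0, 0.5), (2.0, 2.0, 0.5, 0.5)]
    print(action_from_odometry(odom, 0.5))        # (v, omega) label for one frame
```

Each RGB frame would then be paired with the event slice returned by `events_in_window` and the `(v, omega)` label from `action_from_odometry`, giving one behavioral-cloning sample per frame. Other designs, such as fixed event counts per sample or interpolating odometry to the exact frame time, are equally compatible with the description in the abstract.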