The paper introduces HiSync, a framework for Command Source Identification (CSI) in long-range Human-Robot Interaction (HRI) that fuses robot-mounted camera optical flow with hand-worn IMU signals. HiSync extracts frequency-domain hand motion features from both modalities, uses a learned CSINet to denoise IMU readings and temporally align modalities, and performs distance-aware multi-window fusion to compute cross-modal similarity. Evaluated in three-person scenes up to 34m, HiSync achieves 92.32% CSI accuracy, significantly outperforming the state-of-the-art.
Identifies who is commanding a robot from 34 meters away with 92% accuracy by fusing IMU and camera data, a 48% leap over prior art.
Long-range Human-Robot Interaction (HRI) remains underexplored. Within it, Command Source Identification (CSI), i.e., determining who issued a command, is especially challenging due to multi-user and distance-induced sensor ambiguity. We introduce HiSync, an optical-inertial fusion framework that treats hand motion as a binding cue by aligning robot-mounted camera optical flow with hand-worn IMU signals. We first elicit a user-defined (N=12) gesture set and collect a multimodal command gesture dataset (N=38) in long-range multi-user HRI scenarios. HiSync then extracts frequency-domain hand motion features from both camera and IMU data, and a learned CSINet denoises IMU readings, temporally aligns the modalities, and performs distance-aware multi-window fusion to compute cross-modal similarity of subtle, natural gestures, enabling robust CSI. In three-person scenes up to 34m, HiSync achieves 92.32% CSI accuracy, outperforming the prior SOTA by 48.44%. HiSync is also validated in a real-robot deployment. By making CSI reliable and natural, HiSync provides a practical primitive and design guidance for public-space HRI.
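The core idea, matching the IMU trace of each candidate's hand against the optical-flow trace observed by the robot, can be sketched in a few lines. The abstract does not specify the exact features or fusion weights, so the snippet below is a minimal illustration under stated assumptions: frequency features are approximated as normalized band energies of the FFT magnitude spectrum, CSINet's learned denoising and alignment are omitted, and the "distance-aware multi-window fusion" is stood in for by a weighted average of per-window cosine similarities. All names (`band_energy_features`, `multiwindow_similarity`) and parameters are hypothetical, not from the paper.

```python
import numpy as np

def band_energy_features(signal, fs, n_bands=8):
    # Illustrative frequency-domain features: total energy in log-spaced
    # bands of the FFT magnitude spectrum (not the paper's exact features).
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    edges = np.logspace(np.log10(0.5), np.log10(fs / 2), n_bands + 1)
    feats = np.array([spec[(freqs >= lo) & (freqs < hi)].sum()
                      for lo, hi in zip(edges[:-1], edges[1:])])
    # Normalize so camera-flow and IMU scale differences cancel out.
    return feats / (feats.sum() + 1e-8)

def multiwindow_similarity(flow_sig, imu_sig, fs,
                           windows=(64, 128, 256), weights=None):
    # Stand-in for distance-aware multi-window fusion: average cosine
    # similarity of per-window frequency features at several window sizes,
    # combined with (here uniform, in the paper presumably distance-dependent)
    # weights.
    weights = weights if weights is not None else [1.0 / len(windows)] * len(windows)
    sims = []
    for w in windows:
        n = min(len(flow_sig), len(imu_sig)) // w
        if n == 0:
            sims.append(0.0)
            continue
        per_win = []
        for i in range(n):
            a = band_energy_features(flow_sig[i * w:(i + 1) * w], fs)
            b = band_energy_features(imu_sig[i * w:(i + 1) * w], fs)
            per_win.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
        sims.append(float(np.mean(per_win)))
    return float(np.dot(weights, sims))
```

In use, the robot would score the observed optical-flow trace against each candidate's IMU stream and attribute the command to the highest-scoring user; a hand whose worn IMU oscillates at the same frequencies as the observed flow scores higher than one moving differently.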