AnyUser is a robotic instruction system that allows users to control domestic robots using free-form sketches on camera images, optionally combined with language. It fuses multimodal inputs into spatial-semantic primitives to generate robot actions without requiring prior maps or models, using a novel multimodal fusion module and a hierarchical policy. Evaluations on a large dataset, real-world robotic platforms, and a user study demonstrate the system's accuracy, reliability, and usability across diverse demographics.
Control your robot with sketches: AnyUser lets anyone command robots in the real world with intuitive free-form drawings, no coding required.
We introduce AnyUser, a unified robotic instruction system for intuitive domestic task instruction via free-form sketches on camera images, optionally with language. AnyUser interprets multimodal inputs (sketch, vision, language) as spatial-semantic primitives and generates executable robot actions without requiring prior maps or models. Novel components include a multimodal fusion module for understanding and a hierarchical policy for robust action generation. Efficacy is shown via extensive evaluations: (1) quantitative benchmarks on a large-scale dataset, showing high accuracy in interpreting diverse sketch-based commands across various simulated domestic scenes; (2) real-world validation on two distinct robotic platforms, a statically mounted 7-DoF assistive arm (KUKA LBR iiwa) and a dual-arm mobile manipulator (Realman RMC-AIDAL), performing representative tasks such as targeted wiping and area cleaning, confirming the system's ability to ground instructions and execute them reliably in physical environments; and (3) a comprehensive user study involving diverse demographics (elderly, simulated non-verbal, low technical literacy), demonstrating significant improvements in usability and task specification efficiency, with high task completion rates (85.7%-96.4%) and user satisfaction. AnyUser bridges the gap between advanced robotic capabilities and the need for accessible non-expert interaction, laying the foundation for practical assistive robots adaptable to real-world human environments.
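To make the idea of turning a sketch into a spatial primitive concrete, here is a minimal, purely illustrative sketch in Python. It is not AnyUser's method: the abstract does not describe the fusion module or the policy at this level, so the names `Primitive` and `sketch_to_primitive`, the `close_ratio` threshold, and the closed-versus-open heuristic are all assumptions made for illustration. The example treats a stroke whose endpoints nearly meet as a region (as one might draw for area cleaning) and any other stroke as a path (as one might draw for targeted wiping).

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates on the camera image


@dataclass
class Primitive:
    """Hypothetical spatial primitive; not a type defined by AnyUser."""
    kind: str            # "region" for a closed stroke, "path" for an open one
    points: List[Point]  # the stroke vertices, unchanged
    area_px: float       # enclosed area in square pixels (0.0 for a path)


def stroke_length(points: List[Point]) -> float:
    """Total polyline length of the stroke in pixels."""
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def polygon_area(points: List[Point]) -> float:
    """Area of the polygon formed by closing the stroke (shoelace formula)."""
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1]
        - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def sketch_to_primitive(points: List[Point],
                        close_ratio: float = 0.1) -> Primitive:
    """Classify a stroke as a region or a path.

    Assumed heuristic: the stroke counts as closed when the gap between
    its first and last point is at most `close_ratio` of its length.
    """
    if len(points) < 2:
        raise ValueError("a stroke needs at least two points")
    length = stroke_length(points)
    gap = hypot(points[-1][0] - points[0][0], points[-1][1] - points[0][1])
    if len(points) >= 3 and length > 0 and gap <= close_ratio * length:
        return Primitive("region", list(points), polygon_area(points))
    return Primitive("path", list(points), 0.0)


# A nearly closed square stroke and a straight line stroke.
square = sketch_to_primitive([(0, 0), (10, 0), (10, 10), (0, 10), (0, 1)])
line = sketch_to_primitive([(0, 0), (5, 0), (10, 0)])
print(square.kind, square.area_px)
print(line.kind)
```

A real system would also need to ground these pixel-space shapes in the scene (for example, with depth and the optional language input), which this sketch deliberately leaves out.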