This paper introduces a calibration-free framework for detecting task progression in human-robot interaction (HRI) using a robot's built-in RGB camera and state-of-the-art gaze estimation. The framework monitors user attention patterns by defining three Areas of Interest (AOI): tablet, robot face, and elsewhere, and uses shifts in gaze between these AOIs to infer task completion. Validated in a "First Day at Work" scenario, the system achieves 77.6% task completion detection accuracy and improves user comfort, social presence, and perceived naturalness compared to button-based interaction.
Ditch the eye-tracking hardware: This RGB-camera-based gaze detection system lets robots intuitively understand task progression by simply watching where you look.
In human-robot interaction (HRI), detecting a human's gaze helps robots interpret user attention and intent. However, most gaze detection approaches rely on specialized eye-tracking hardware, limiting deployment in everyday settings. Appearance-based gaze estimation methods remove this dependency by using standard RGB cameras, but their practicality in HRI remains underexplored. We present a calibration-free framework for detecting task progression when information is conveyed via integrated display interfaces. The framework uses only the robot's built-in monocular RGB camera (640×480 resolution) and state-of-the-art gaze estimation to monitor attention patterns. It leverages natural behavior, where users shift focus from task interfaces to the robot's face to signal task completion, formalized through three Areas of Interest (AOI): tablet, robot face, and elsewhere. Systematic parameter optimization identifies configurations that balance detection accuracy and interaction latency. We validate our framework in a "First Day at Work" scenario, comparing it to button-based interaction. Results show a task completion detection accuracy of 77.6%. Compared to button-based interaction, the proposed system exhibits slightly higher response latency but preserves information retention and significantly improves comfort, social presence, and perceived naturalness. Notably, most participants reported that they did not consciously use eye movements to guide the interaction, underscoring the intuitive role of gaze as a communicative cue. This work demonstrates the feasibility of intuitive, low-cost, RGB-only gaze-based HRI for natural and engaging interactions.
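The core mechanism the abstract describes — classifying estimated gaze into three AOIs and triggering on a tablet-to-face shift — can be sketched as follows. This is not the authors' code: the AOI coordinates, the dwell threshold, and the function names are illustrative assumptions, and gaze is assumed to already be estimated as a 2D point in the 640×480 image plane.

```python
# Hedged sketch of AOI classification and a dwell-based completion trigger.
# AOI boxes, dwell_frames, and all names are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class AOI:
    name: str
    x0: float; y0: float; x1: float; y1: float  # bounding box in pixels

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

# Illustrative AOI layout for a 640x480 frame (coordinates are made up).
AOIS = [AOI("tablet", 160, 240, 480, 480), AOI("robot_face", 220, 0, 420, 160)]

def classify(x: float, y: float) -> str:
    """Map a gaze point to one of the three AOIs: tablet, robot face, elsewhere."""
    for aoi in AOIS:
        if aoi.contains(x, y):
            return aoi.name
    return "elsewhere"

def detect_completion(gaze_points, dwell_frames=15):
    """Fire once gaze has shifted from the tablet to the robot's face and
    stayed there for `dwell_frames` consecutive frames. The dwell length is
    the kind of parameter the paper's optimization would tune to trade
    detection accuracy against interaction latency."""
    seen_tablet, streak = False, 0
    for x, y in gaze_points:
        aoi = classify(x, y)
        if aoi == "tablet":
            seen_tablet, streak = True, 0
        elif aoi == "robot_face" and seen_tablet:
            streak += 1
            if streak >= dwell_frames:
                return True
        else:
            streak = 0
    return False
```

Requiring a prior fixation on the tablet before counting face frames keeps incidental glances at the robot from firing the trigger, which is one plausible way to keep false positives low at the cost of the slightly higher latency the study reports.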