This paper introduces a dual-mode human-robot interaction (HRI) method leveraging large language models (LLMs) to overcome the limitations of single-mode interaction and predefined content in current guide robot systems. The proposed method integrates a proactive interaction module using real-time sensor data for human-like services and a reactive interaction module employing a query router with retrieval-augmented generation (RAG) for adaptive and efficient responses. Experimental results in guided tour scenarios demonstrate a 92% F1-score, improved response latency, and higher Likert-scale ratings for naturalness, intelligence, dependability, and stimulation compared to baseline methods.
Guide robots get a personality upgrade: a new dual-mode interaction method powered by LLMs raises the F1-score by 8 percentage points over a pure LLM (6 over traditional RAG) and cuts response latency by nearly 50% compared to the fastest static RAG baseline.
Current guide robot systems have two main limitations: (1) they support only a single mode of interaction (proactive or reactive) and lack a mechanism for coordinating the two, and (2) they rely heavily on predefined content, which hinders a natural, flexible, human-like interaction experience. To address these issues, this paper proposes a dual-mode human–robot interaction (HRI) method based on a large language model (LLM). The method comprises two modules. (1) The proactive interaction module uses the robot's own sensors to perceive environmental information in real time, enabling it to provide human-like services such as safety alerts, situational announcements, and personalised recommendations. (2) The reactive interaction module integrates a query router with a retrieval-augmented generation (RAG) method to build an adaptive response mechanism that aims to provide more accurate responses while optimising response efficiency. Validation in guided tour scenarios confirms the efficiency of the proposed method. Results show that it achieves a 92% F1-score (8 percentage points [PPs] higher than a pure LLM and 6 PPs higher than traditional RAG), improves response latency by 48.4% relative to the standard retrieval-cosine method (the fastest baseline among static RAG approaches), and receives higher Likert-scale ratings for naturalness (4.35), intelligence (4.05), dependability (4.48) and stimulation (4.45) than the other evaluated methods. This study proposes a scalable technical pathway for advancing human–robot interaction systems towards more natural and anthropomorphic interaction paradigms.
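The abstract does not describe how the query router works internally, so the following is only a minimal sketch of the general idea: decide per query whether retrieval is worth its cost, or whether the query can go straight to the LLM. Everything here is an assumption made for illustration, including the `route` function, the `KNOWLEDGE_BASE` contents, the stopword list, the bag-of-words cosine score and the 0.25 threshold; none of it is taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical tour knowledge base; the entries are illustrative only.
KNOWLEDGE_BASE = [
    "The bronze vessel in hall two dates from the Shang dynasty.",
    "The museum opens at nine and closes at five.",
    "The gift shop is next to the main entrance.",
]

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "is", "are", "at", "and", "to", "in",
             "from", "of", "do", "does", "when", "what", "where", "how", "you"}


def bag_of_words(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def route(query, threshold=0.25):
    """Return ("retrieve", best_document) when the knowledge base looks
    relevant to the query, otherwise ("direct", None) so the caller can
    skip retrieval and answer with the LLM alone."""
    query_vec = bag_of_words(query)
    best_doc, best_score = None, 0.0
    for doc in KNOWLEDGE_BASE:
        score = cosine(query_vec, bag_of_words(doc))
        if score > best_score:
            best_doc, best_score = doc, score
    if best_score >= threshold:
        return "retrieve", best_doc
    return "direct", None
```

In this sketch, a question such as "Where is the gift shop?" is routed to retrieval and returns the gift shop entry, while small talk such as "Hello, how are you today?" is routed to the direct path. Skipping retrieval for queries that do not need it is one plausible way a router could reduce average latency; a real system would more likely use embeddings or an LLM-based classifier in place of the word-overlap score used here.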