Feb 17, 2026 · arXiv:2602.15707

Proactive Conversational Assistant for a Procedural Manual Task based on Audio and IMU

Rehana Mahfuz, Yinyi Guo, Erik Visser, Phanidhar Chinchili

AI Summary

This paper introduces a real-time conversational assistant for procedural tasks that relies solely on audio and IMU data, addressing the privacy concerns and computational costs associated with video-based systems. The authors construct a dataset of assistant-guided furniture assembly conversations and propose a User Whim Agnostic (UWA) LoRA finetuning method that helps the assistant suppress less informative dialogue while still communicating important instructions. UWA LoRA finetuning improves the F-score by more than 30%, and finetuning also yields a 16x speedup because the prompt no longer needs in-context examples.

Key Contribution

You can now build a real-time, privacy-preserving conversational assistant for procedural tasks using *only* audio and IMU data, thanks to a new finetuning method that suppresses less informative dialogue while keeping the important instructions.

Abstract

Real-time conversational assistants for procedural tasks often depend on video input, which can be computationally expensive and compromise user privacy. For the first time, we propose a real-time conversational assistant that provides comprehensive guidance for a procedural task using only lightweight privacy-preserving modalities such as audio and IMU inputs from a user's wearable device to understand the context. This assistant proactively communicates step-by-step instructions to a user performing a furniture assembly task, and answers user questions. We construct a dataset containing conversations where the assistant guides the user in performing the task. On observing that an off-the-shelf language model is a very talkative assistant, we design a novel User Whim Agnostic (UWA) LoRA finetuning method which improves the model's ability to suppress less informative dialogues, while maintaining its tendency to communicate important instructions. This leads to >30% improvement in the F-score. Finetuning the model also results in a 16x speedup by eliminating the need to provide in-context examples in the prompt. We further describe how such an assistant is implemented on edge devices with no dependence on the cloud.
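The abstract reports an F-score but does not say how it is computed. As a minimal sketch, assume each turn is a binary decision (the assistant either speaks or stays silent) scored against a reference conversation. The function name `f_score` and the toy decision lists below are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the per-turn speak/stay-silent framing is an
# assumption, not the paper's stated evaluation protocol.

def f_score(predicted, reference, beta=1.0):
    """F-beta score over binary decisions (True = speak, False = stay silent)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    # Count agreement and disagreement with the reference decisions.
    true_pos = sum(1 for p, r in zip(predicted, reference) if p and r)
    false_pos = sum(1 for p, r in zip(predicted, reference) if p and not r)
    false_neg = sum(1 for p, r in zip(predicted, reference) if not p and r)

    if true_pos == 0:
        return 0.0  # no correct utterances, so precision and recall are both 0

    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical toy data: turns where the reference assistant speaks.
reference = [True, False, False, True, False, True]
# A talkative assistant speaks at every turn: full recall, low precision.
talkative = [True, True, True, True, True, True]
# A more selective assistant misses one instruction but adds no chatter.
concise = [True, False, False, True, False, False]

talkative_f1 = f_score(talkative, reference)
concise_f1 = f_score(concise, reference)
```

Under this reading, an assistant that speaks at every turn keeps perfect recall but loses precision, which illustrates why suppressing less informative utterances can raise the F-score.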

Robotics & Embodied AI · Speech & Audio · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
