Tsinghua AI, Airbnb, Cambridge, Dartmouth, UCL
Jun 16, 2026
arXiv:2606.18147

WEQA: Wearable hEalth Question Answering with Query-Adaptive Agentic Reasoning

Yuwei Zhang, Tong Xia, Bianca Emmerich, Yu Yvonne Wu, Dimitris Spathis, Xin Liu, Daniel McDuff, Cecilia Mascolo

AI Summary

This paper introduces WEQA, a query-adaptive agent framework that integrates large language model (LLM) reasoning with specialized analytical tools for wearable health data. By employing an LLM controller to dynamically route queries and synthesize execution plans, WEQA effectively addresses the challenges posed by the high-dimensional and diverse nature of wearable sensor data. Experimental results demonstrate a 24% improvement in accuracy over existing LLM and agentic baselines, alongside significant enhancements in perceived usefulness and clinical soundness as validated by medical experts and users.

Key Contribution

WEQA achieves a 24% accuracy boost in wearable health question answering by dynamically adapting to the complexities of sensor data and user queries.

Abstract

Language models are remarkably capable at medical question answering, in some cases surpassing the accuracy of general physicians. However, answering questions about wearable health data remains challenging and understudied, as these ubiquitous sensors produce continuous, high-dimensional, and longitudinal data, which is non-trivial to align with text-centric distributions in LLM pretraining. The diversity of sensor modalities and user intents cannot be effectively handled by a fixed reasoning workflow or a single pretrained foundation model. To address these challenges, we propose WEQA, a query-adaptive agent framework that unifies LLM reasoning with specialized wearable analytical and modeling tools. An LLM controller synthesizes execution plans, dynamically routes each query to the appropriate combination of sensor analysis and pretrained models, and performs grounded response auditing with external knowledge. We also curate a benchmark spanning four open wearable datasets comprising analytic and predictive tasks in three different health domains. Experiments show that our framework is 24% more accurate than LLM and agentic baselines, and a blinded study with 12 medical experts and 8 users shows substantial gains in usefulness and clinical soundness.
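To make the idea of query-adaptive routing concrete, the following is a minimal sketch, not the authors' implementation. It assumes a keyword matcher as a stand-in for the LLM controller and two toy analysis tools; every name in it (`Tool`, `TOOLS`, `plan`, `answer`, the keyword lists) is hypothetical and does not come from the paper. The response-auditing step described in the abstract is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    """A hypothetical analysis tool the controller can route a query to."""
    name: str
    keywords: List[str]                 # crude proxy for what the tool can answer
    run: Callable[[List[float]], str]   # takes a sensor series, returns a finding


def mean_summary(series: List[float]) -> str:
    # Descriptive statistic over the whole series (assumes a non-empty series).
    return f"mean={sum(series) / len(series):.1f}"


def trend_summary(series: List[float]) -> str:
    # Compare the last reading with the first to label the direction.
    delta = series[-1] - series[0]
    direction = "rising" if delta > 0 else "falling" if delta < 0 else "flat"
    return f"trend={direction}"


# Hypothetical tool registry; a real system would hold many more tools.
TOOLS = [
    Tool("stats", ["average", "mean", "typical"], mean_summary),
    Tool("trend", ["trend", "change", "improving"], trend_summary),
]


def plan(query: str) -> List[Tool]:
    """Stand-in for the LLM controller: pick tools whose keywords match the query.

    Falls back to the first tool when nothing matches, so every query
    yields a non-empty plan.
    """
    q = query.lower()
    selected = [t for t in TOOLS if any(k in q for k in t.keywords)]
    return selected or [TOOLS[0]]


def answer(query: str, series: List[float]) -> str:
    """Execute the plan step by step and join the findings into one response."""
    findings = [tool.run(series) for tool in plan(query)]
    return "; ".join(findings)


if __name__ == "__main__":
    heart_rate = [60.0, 62.0, 64.0]
    print(answer("What is my average heart rate and its trend?", heart_rate))
```

The point of the sketch is only that the set of tools invoked depends on the query rather than on a fixed pipeline: a question about both an average and a trend triggers two tools, while an unrelated question falls back to one.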

Multimodal Models, Natural Language Processing, Tool Use & Agents

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
