NJU, Northwestern, PKU | Mar 2, 2026 | arXiv:2603.01661

HeRo: Adaptive Orchestration of Agentic RAG on Heterogeneous Mobile SoC

Maoliang Li, Jiayu Chen, Zihao Zheng, Ziqian Li, Ziqiang Li, Guojie Luo, Chenchen Liu, Chenchen Liu

AI Summary

The paper introduces HeRo, a framework for deploying agentic retrieval-augmented generation (RAG) workflows on heterogeneous mobile System-on-Chips (SoCs). HeRo builds profiling-based performance models that capture latency, workload shape, and contention-induced slowdown for each sub-stage and model-PU configuration. A lightweight online scheduler then uses these models to combine shape-aware sub-stage partitioning, criticality-based accelerator mapping, and bandwidth-aware concurrency control, reducing end-to-end RAG latency by up to 10.94× over existing deployment strategies.

Key Contribution

Reduces end-to-end latency of on-device agentic RAG by up to 10.94× over existing deployment strategies by scheduling sub-stages across the heterogeneous processing units of a mobile SoC.

Abstract

With the increasing computational capability of mobile devices, deploying agentic retrieval-augmented generation (RAG) locally on heterogeneous System-on-Chips (SoCs) has become a promising way to enhance LLM-based applications. However, agentic RAG induces multi-stage workflows with heterogeneous models and dynamic execution flow, while mobile SoCs exhibit strong accelerator affinity, shape sensitivity, and shared-memory bandwidth contention, making naive scheduling ineffective. We present HeRo, a heterogeneous-aware framework for low-latency agentic RAG on mobile SoCs. HeRo builds profiling-based performance models for each sub-stage and model-PU configuration, capturing latency, workload shape, and contention-induced slowdown, and leverages them in a lightweight online scheduler that combines shape-aware sub-stage partitioning, criticality-based accelerator mapping, and bandwidth-aware concurrency control. Experiments on commercial mobile devices show that HeRo reduces end-to-end latency by up to $10.94\times$ over existing deployment strategies, enabling practical on-device agentic RAG.
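The scheduling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `PROFILE` table, its latency numbers, the `SubStage` fields, and the `schedule` function are all hypothetical, and "PU" is assumed to mean processing unit. The sketch covers only two of the three ingredients (criticality-based mapping and bandwidth-aware admission) and omits shape-aware partitioning.

```python
from dataclasses import dataclass

# Hypothetical profiled latencies in ms per (sub-stage, processing unit).
# Illustrative values only; these are not measurements from the paper.
PROFILE = {
    ("embed", "NPU"): 4.0, ("embed", "GPU"): 9.0, ("embed", "CPU"): 30.0,
    ("rerank", "NPU"): 12.0, ("rerank", "GPU"): 10.0, ("rerank", "CPU"): 45.0,
    ("decode", "NPU"): 40.0, ("decode", "GPU"): 25.0, ("decode", "CPU"): 120.0,
}


@dataclass
class SubStage:
    name: str
    criticality: float  # assumed proxy, e.g. remaining critical-path length
    bandwidth: float    # assumed fraction of shared memory bandwidth used


def schedule(ready, free_pus, bw_in_use=0.0, bw_cap=1.0):
    """Greedy mapping: most critical sub-stage first, onto its fastest free PU.

    A sub-stage is deferred if admitting it would push the total bandwidth
    demand past bw_cap (a simple stand-in for contention control).
    """
    plan = {}
    free = set(free_pus)
    for stage in sorted(ready, key=lambda s: s.criticality, reverse=True):
        if not free:
            break
        if bw_in_use + stage.bandwidth > bw_cap:
            continue  # defer: co-running it would oversubscribe bandwidth
        pu = min(free, key=lambda p: PROFILE[(stage.name, p)])
        plan[stage.name] = pu
        free.remove(pu)
        bw_in_use += stage.bandwidth
    return plan


ready = [
    SubStage("embed", criticality=1.0, bandwidth=0.3),
    SubStage("decode", criticality=5.0, bandwidth=0.6),
    SubStage("rerank", criticality=3.0, bandwidth=0.3),
]
plan = schedule(ready, ["NPU", "GPU", "CPU"])
print(plan)
```

In this toy run the most critical sub-stage (`decode`) claims its fastest unit first, `rerank` takes the best remaining one, and `embed` is deferred because admitting it would exceed the bandwidth cap, even though a CPU is still free.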

Distributed Systems & Hardware | Recommendation & Information Retrieval | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 26

Year: 2026

Venue: N/A
