GroupGPT is introduced as a token-efficient and privacy-preserving framework for multi-user chat assistants, addressing the limitations of existing LLM-based systems in complex, evolving group chat contexts. It employs a small-large model collaborative architecture to decouple intervention timing from response generation, enhancing efficiency and accuracy. Evaluated on the new MUIR benchmark, GroupGPT achieves high response quality (4.72/5.0) and reduces token usage by up to 3x while providing privacy sanitization.
Get 3x more bang for your buck in multi-user LLM chat applications with GroupGPT, a framework that slashes token usage while preserving privacy.
Recent advances in large language models (LLMs) have enabled increasingly capable chatbots. However, most existing systems focus on single-user settings and do not generalize well to multi-user group chats, where agents require more proactive and accurate intervention under complex, evolving contexts. Existing approaches typically rely on LLMs for both reasoning and generation, leading to high token consumption, limited scalability, and potential privacy risks. To address these challenges, we propose GroupGPT, a token-efficient and privacy-preserving agentic framework for multi-user chat assistants. GroupGPT adopts a small-large model collaborative architecture to decouple intervention timing from response generation, enabling efficient and accurate decision-making. The framework also supports multimodal inputs, including memes, images, videos, and voice messages. We further introduce MUIR, a benchmark dataset for multi-user chat assistant intervention reasoning. MUIR contains 2,500 annotated group chat segments with intervention labels and rationales, supporting evaluation of timing accuracy and response quality. We evaluate a range of models on MUIR, from large language models to their smaller counterparts. Extensive experiments demonstrate that GroupGPT produces accurate and well-timed responses, achieving an average score of 4.72/5.0 in LLM-based evaluation, and is well received by users across diverse group chat scenarios. Moreover, GroupGPT reduces token usage by up to 3 times compared to baseline methods, while providing privacy sanitization of user messages before cloud transmission. Code is available at: https://github.com/Eliot-Shen/GroupGPT.
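The decoupling described above can be sketched as a minimal pipeline. This is a hypothetical illustration, not the actual GroupGPT API: all function names are invented, the intervention gate is a toy heuristic standing in for the small model, and the sanitizer covers only phone numbers and emails.

```python
# Hypothetical sketch of a small-large collaborative pipeline (illustrative
# names only): a lightweight local check gates intervention timing, and only
# sanitized context is forwarded to a large cloud model for generation.
import re

def small_model_should_intervene(messages):
    """Stand-in for a lightweight local model that decides whether
    the assistant should speak up in the group chat."""
    last = messages[-1]["text"].lower()
    # Toy heuristic in place of the real small-model reasoning.
    return "?" in last or "@assistant" in last

def sanitize(text):
    """Toy privacy sanitization: mask phone numbers and email
    addresses before any cloud transmission."""
    text = re.sub(r"\b\d{10,11}\b", "[PHONE]", text)
    text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[EMAIL]", text)
    return text

def large_model_generate(context):
    """Stand-in for the cloud LLM call; returns a canned reply here."""
    return f"Reply based on {len(context)} sanitized messages."

def handle(messages):
    # Decoupling: the timing decision is made locally, so the expensive
    # large-model call (and its token cost) happens only when needed.
    if not small_model_should_intervene(messages):
        return None  # stay silent; no tokens spent on the large model
    context = [sanitize(m["text"]) for m in messages]
    return large_model_generate(context)

chat = [
    {"user": "alice", "text": "My number is 13812345678"},
    {"user": "bob", "text": "@assistant can you summarize the plan?"},
]
print(handle(chat))
```

The token savings in this design come from the gate: most group-chat turns never reach the large model at all, and those that do carry only sanitized text.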