The paper introduces SpecSteer, a collaborative inference framework that combines on-device personalized context with cloud-scale reasoning to improve personalized generation. SpecSteer uses Bayesian knowledge fusion and speculative decoding to create a Draft-Verify-Recover pipeline, where an on-device model drafts personalized sequences, the cloud validates them, and local intent is injected during recovery if needed. Experiments show that SpecSteer achieves better personalized generation performance and a 2.36x speedup compared to standard baselines.
Achieve personalized generation with cloud-scale reasoning while preserving user privacy, thanks to a novel asymmetric collaboration framework that's also over 2x faster.
Realizing personalized intelligence faces a core dilemma: sending user history to centralized large language models raises privacy concerns, while on-device small language models lack the reasoning capacity required for high-quality generation. Our pilot study shows that purely local enhancements remain insufficient to reliably bridge this gap. We therefore propose SpecSteer, an asymmetric collaborative inference framework that synergizes private on-device context with cloud-scale reasoning. SpecSteer casts collaboration as Bayesian knowledge fusion and repurposes speculative decoding as a distributed alignment protocol, yielding a Draft-Verify-Recover pipeline: the on-device model drafts personalized sequences; the cloud validates them via a ratio-based mechanism that decouples reasoning verification from private context, filtering logical flaws without accessing raw user context; upon rejection, a steering-based recovery step injects local intent during correction. Experiments demonstrate that SpecSteer successfully closes the reasoning gap and achieves superior personalized generation performance, while delivering a 2.36x speedup over standard baselines.
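To make the Draft-Verify step concrete, here is a minimal sketch of ratio-based verification in the style of standard speculative decoding: the cloud accepts each drafted token with probability min(1, p_cloud/p_draft), and the first rejection marks where recovery would take over. The function name, the toy tokens, and the probability values are illustrative assumptions, not taken from the paper.

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

def verify_draft(draft_tokens, p_draft, p_cloud):
    """Ratio-based verification (standard speculative-decoding accept rule):
    accept draft token i with probability min(1, p_cloud[i] / p_draft[i]).
    Returns the accepted prefix and the index of the first rejection
    (None if every token is accepted)."""
    accepted = []
    for i, tok in enumerate(draft_tokens):
        ratio = p_cloud[i] / max(p_draft[i], 1e-9)
        if random.random() < min(1.0, ratio):
            accepted.append(tok)
        else:
            # rejection point: in SpecSteer this is where steering
            # recovery would inject local intent during correction
            return accepted, i
    return accepted, None

# Toy per-token probabilities from the on-device drafter and cloud verifier
draft = ["user", "prefers", "vegan", "recipes"]
p_draft = [0.9, 0.8, 0.7, 0.6]
p_cloud = [0.95, 0.85, 0.05, 0.5]  # the cloud finds token 2 implausible

prefix, reject_at = verify_draft(draft, p_draft, p_cloud)
```

Note that the verifier only needs the drafter's token probabilities, not the private context that produced them, which is what lets the paper's ratio-based mechanism filter logical flaws without seeing raw user data.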