The paper introduces ROSA2, a framework for test-time policy adaptation in LLMs that jointly optimizes prompts (words) and model weights to improve multi-turn interactions. ROSA2 decomposes the error signal into textual gradients for intent rectification and parameter updates for capability enhancement, with a theoretical proof that this co-adaptation reduces the parameter shift needed for convergence. Experiments on the MATH dataset show that ROSA2 outperforms baselines by 30% while reducing interaction turns by 40%, indicating that clarifying context significantly enhances the effectiveness of parameter updates.
LLMs learn faster and perform better when you optimize prompts and weights together, boosting performance by 30% and cutting interaction turns by 40%.
Test-time policy adaptation for multi-turn interactions (T2PAM) is essential for aligning Large Language Models (LLMs) with dynamic user needs at inference time. However, existing paradigms commonly treat test-time adaptation as a single-axis problem, either purely refining instructions (Prompt Engineering) or only adjusting weights (Test-Time Training), ignoring that interaction failures stem from a coupled mix of ambiguity and incapacity. We argue that these two optimization paths are not merely additive but synergistic: semantic clarity acts as a preconditioner for effective parameter updates. To this end, we propose ROSA2, a framework that reformulates interaction as a joint optimization problem over the heterogeneous space of Words and Weights. By mathematically decomposing the error signal, ROSA2 uses textual gradients to rectify intent ambiguity and parameter updates to bridge capability gaps. Theoretically, we prove that this co-adaptation strictly reduces the parameter shift required for convergence. Empirically, ROSA2 outperforms state-of-the-art baselines by 30% on MATH while reducing interaction turns by 40%, demonstrating that refining the context unlocks the true potential of parameter updates.
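The co-adaptation loop the abstract describes can be illustrated with a minimal toy sketch. Everything here is an assumption for illustration only: the scalar "model", the clarity variable standing in for the prompt, the update rules, and all names are hypothetical, not the paper's actual algorithm.

```python
# Toy sketch of joint words-and-weights adaptation (all details hypothetical).
# A scalar weight w stands in for model parameters; a clarity score in [0, 1]
# stands in for the prompt. The error is reduced along both axes each turn:
# a "textual gradient" step raises clarity, and a gradient step adjusts w.

def co_adapt(target=10.0, w=1.0, clarity=0.5, lr=0.1, turns=20, tol=1e-3):
    """Jointly refine prompt clarity and update w until |error| < tol."""
    errors = []
    for _ in range(turns):
        output = w * clarity            # toy model: capability scaled by clarity
        error = target - output
        errors.append(abs(error))
        if abs(error) < tol:
            break
        # "Textual gradient": clarify intent, moving clarity toward 1.0.
        clarity += 0.5 * (1.0 - clarity)
        # Parameter update: gradient step on squared error w.r.t. w.
        w += lr * error * clarity
    return w, clarity, errors

w, clarity, errors = co_adapt()
```

In this toy, raising clarity amplifies the effective gradient on w (the `error * clarity` term), which mirrors the abstract's claim that clearer context reduces the parameter shift needed for convergence.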