This study asks where behaviorally relevant state is represented when prompt-based interventions change model behavior, using controlled routing tasks. On GPT-2 add/sub, zero-retrain compiled transfer at a fixed interface recovers most of donor routing accuracy, whereas trainable prompt slots relearn the same behavior at other positions only after additional support examples and optimization. The findings distinguish fixed-interface reuse from prompt relocation in a setting where the two can be tested directly.
Fixed-interface transfer can recover most of donor routing accuracy without retraining, which is stronger evidence of reuse than trained prompt success alone.
Prompt-based interventions can change model behavior, but trained success alone does not identify where the behaviorally relevant state is represented. We study this question in controlled routing tasks using interfaces chosen on support data, held-out query evaluation, and matched necessity, sufficiency, and wrong-interface controls. On GPT-2 triop, an early interface supports exact transfer under these tests. On GPT-2 add/sub, zero-retrain compiled transfer at the fixed interface recovers most of donor routing accuracy, while trainable prompt slots can relearn the same behavior at several other positions only after additional support examples and optimization. These results distinguish fixed-interface reuse from prompt relocation in a setting where the two can be tested directly. Qwen routing provides a cross-architecture consistency check for the same matched-interface pattern at the operator token, although donor-specific identity on the local V-path remains unresolved. Generation and reasoning branches are used to map scope: they show broader transport or weaker controller identifiability once control depends on longer trajectories or harder selection. In controlled routing, fixed-interface transfer is therefore stronger evidence of reuse than trained prompt success alone.
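To make the three matched controls concrete, here is a minimal toy sketch in Python. It is not the authors' code and does not use GPT-2 or Qwen. The three-position "state", the `readout` function, and the readings of sufficiency (patching the donor's interface state reproduces donor behavior), necessity (ablating the interface state removes the behavior), and wrong-interface (patching at a different position leaves behavior unchanged) are all assumptions made for illustration.

```python
# Toy illustration only: a hand-built "model", not GPT-2 and not the paper's setup.
from typing import Dict, List, Optional

State = List[int]  # one integer of "hidden state" per token position

OP_CODES: Dict[str, int] = {"add": 1, "sub": 2}
OPERATOR_POS = 1  # hypothetical interface: the operator token position


def encode(a: int, op: str, b: int) -> State:
    """Positions: 0 = first operand, 1 = operator, 2 = second operand."""
    return [a, OP_CODES[op], b]


def readout(state: State) -> Optional[int]:
    """Route on the operator position; None means the routing signal is missing."""
    code = state[OPERATOR_POS]
    if code == OP_CODES["add"]:
        return state[0] + state[2]
    if code == OP_CODES["sub"]:
        return state[0] - state[2]
    return None


def patch(recipient: State, donor: State, position: int) -> State:
    """Copy the donor's state at one position into a copy of the recipient."""
    patched = list(recipient)
    patched[position] = donor[position]
    return patched


def ablate(state: State, position: int) -> State:
    """Zero out the state at one position."""
    ablated = list(state)
    ablated[position] = 0
    return ablated


def run_controls() -> Dict[str, bool]:
    donor = encode(7, "sub", 3)      # donor routes to subtraction
    recipient = encode(7, "add", 3)  # recipient routes to addition
    return {
        # Sufficiency: donor state at the interface alone yields donor behavior.
        "sufficiency": readout(patch(recipient, donor, OPERATOR_POS)) == readout(donor),
        # Necessity: removing the interface state destroys the routed behavior.
        "necessity": readout(ablate(donor, OPERATOR_POS)) is None,
        # Wrong interface: patching elsewhere leaves recipient behavior unchanged.
        "wrong_interface": readout(patch(recipient, donor, 0)) == readout(recipient),
    }


if __name__ == "__main__":
    print(run_controls())
```

In this toy, all three checks pass only because the routing signal lives entirely at one position. In a real transformer the patch would be applied to activations at a chosen layer and token, and whether the controls pass is an empirical question.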