CU Boulder | Apr 6, 2026 | arXiv:2604.04855

The Role of Generator Access in Autoregressive Post-Training

AI Summary

This paper investigates how generator access affects autoregressive post-training, focusing on whether the learner can query the next-token rule at previously built prefixes or is confined to fresh root-start rollouts. The authors show that root-start training is fundamentally limited by the on-policy probability of reaching informative prefixes, and that weak prefix control overcomes this limitation. Changing only the generator interface yields an exponential gap in KL-regularized outcome-reward post-training.
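
For orientation, the KL-regularized outcome-reward objective is commonly written in the form below. This is the standard textbook formulation, not necessarily the exact one used in the paper; the symbols $\pi_{\mathrm{ref}}$ (reference generator), $r$ (outcome reward on a complete output $y$), and $\beta > 0$ (regularization strength) are assumptions introduced here for illustration.

$$
\max_{\pi}\; \mathbb{E}_{y \sim \pi}\big[r(y)\big] \;-\; \beta\, \mathrm{KL}\big(\pi \,\|\, \pi_{\mathrm{ref}}\big),
\qquad
\pi^{\star}(y) \;\propto\; \pi_{\mathrm{ref}}(y)\, \exp\!\big(r(y)/\beta\big).
$$

Under this form, the maximizer $\pi^{\star}$ reweights the reference generator by the exponentiated reward, so learning it requires information about $r$ and $\pi_{\mathrm{ref}}$ at the prefixes where they matter.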

Key Contribution

Seemingly minor restrictions on generator access during post-training can create exponential gaps in performance, suggesting that the interface between learner and generator is a critical, often overlooked, factor.

Abstract

We study how generator access constrains autoregressive post-training. The central question is whether the learner is confined to fresh root-start rollouts or can return to previously built prefixes and query the next-token rule there. In the root-start regime, output sampling, generated-token log probabilities, top-$k$ reports, and full next-token distributions along sampled trajectories all reduce to one canonical experiment, limited by the on-policy probability of reaching informative prefixes. Weak prefix control breaks this barrier, and once control is available, richer observations such as conditional sampling or logits can outperform top-$1$ access. Changing only the generator interface creates an exponential gap for KL-regularized outcome-reward post-training.
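
The reach-probability barrier described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's construction: a hypothetical generator `next_token_probs` that is uniform over two tokens, a single "informative" target prefix, and a hypothetical prefix-control interface in which the learner may extend a stored prefix by a token of its choice and query the next-token rule there.

```python
import random

def next_token_probs(prefix):
    """Hypothetical base generator: uniform over tokens {0, 1} at every prefix."""
    return {0: 0.5, 1: 0.5}

def root_start_rollout(depth, rng):
    """Sample one fresh trajectory of the given depth, starting from the root."""
    prefix = ()
    for _ in range(depth):
        probs = next_token_probs(prefix)
        tok = rng.choices(list(probs), weights=list(probs.values()))[0]
        prefix += (tok,)
    return prefix

def reach_probability(target):
    """On-policy probability that a root-start rollout passes through `target`."""
    p = 1.0
    for i, tok in enumerate(target):
        p *= next_token_probs(target[:i])[tok]
    return p

def expected_root_start_rollouts(target):
    """Mean number of fresh rollouts until one first reaches `target` (geometric)."""
    return 1.0 / reach_probability(target)

def prefix_control_queries(target):
    """Queries needed under the assumed prefix-control interface:
    one next-token query per position along the path to `target`."""
    prefix, queries = (), 0
    for tok in target:
        next_token_probs(prefix)  # query the next-token rule at the stored prefix
        queries += 1
        prefix += (tok,)          # extend the stored prefix by a chosen token
    return queries

if __name__ == "__main__":
    target = (1,) * 10  # hypothetical informative prefix of depth 10
    print("reach probability:", reach_probability(target))
    print("expected root-start rollouts:", expected_root_start_rollouts(target))
    print("prefix-control queries:", prefix_control_queries(target))
    print("one sampled rollout:", root_start_rollout(len(target), random.Random(0)))
```

In this toy setting the expected number of root-start rollouts grows as $2^n$ in the depth $n$ of the target prefix, while the assumed prefix-control interface needs only $n$ queries. This mirrors, in a simplified way, the kind of exponential gap the abstract attributes to the generator interface.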

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
