Feb 18, 2026 · arXiv:2602.16704

Reinforced Fast Weights with Next-Sequence Prediction

Hee Seung Hwang, Xindi Wu, Sanghyuk Chun, Olga Russakovsky

AI Summary

The paper addresses a limitation of next-token prediction (NTP) as a training objective for fast weight architectures. NTP optimizes one token at a time and ignores semantic coherence across the tokens that follow a prefix, so fast weight models trained with it struggle to capture long-range dependencies. To overcome this, the authors introduce REFINE, a reinforcement learning framework that trains fast weight models with a next-sequence prediction (NSP) objective. REFINE uses prediction entropy to select informative token positions, generates multi-token rollouts from them, scores the rollouts with self-supervised sequence-level rewards, and optimizes the model with group relative policy optimization (GRPO). On long-context tasks it outperforms NTP-based supervised fine-tuning.
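
The entropy-based position selection can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the summary does not say how many positions REFINE keeps or whether it ranks or thresholds, so the top-k rule and the helper names `token_entropies` and `select_positions` are hypothetical. Each row of `probs` stands in for the model's next-token distribution at one position; high entropy marks positions where the model is uncertain and a multi-token rollout is likely to be informative.

```python
import math
from typing import List, Sequence


def token_entropies(probs: Sequence[Sequence[float]]) -> List[float]:
    """Shannon entropy (nats) of the next-token distribution at each position."""
    return [-sum(p * math.log(p) for p in dist if p > 0.0) for dist in probs]


def select_positions(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Hypothetical rule: keep the k highest-entropy positions, in sequence order.

    REFINE's exact selection criterion is not given in the summary; top-k is an
    assumption made for illustration.
    """
    ent = token_entropies(probs)
    ranked = sorted(range(len(ent)), key=lambda i: ent[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy 4-token vocabulary; each row is one position's predicted distribution.
    probs = [
        [0.97, 0.01, 0.01, 0.01],  # confident prediction, low entropy
        [0.25, 0.25, 0.25, 0.25],  # uniform, maximum entropy
        [0.70, 0.10, 0.10, 0.10],
        [0.40, 0.30, 0.20, 0.10],
    ]
    print(select_positions(probs, k=2))  # -> [1, 3]
```

In a real model the rows would come from a softmax over the full vocabulary; ranking rather than thresholding keeps the number of rollouts per sequence fixed, which is one plausible design choice among several.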

Key Contribution

Training fast weight models with REFINE, an RL framework that optimizes next-sequence rather than next-token prediction, consistently improves their long-context performance over supervised fine-tuning with NTP.

Abstract

Fast weight architectures offer a promising alternative to attention-based transformers for long-context modeling by maintaining constant memory overhead regardless of context length. However, their potential is limited by the next-token prediction (NTP) training paradigm. NTP optimizes single-token predictions and ignores semantic coherence across multiple tokens following a prefix. Consequently, fast weight models, which dynamically update their parameters to store contextual information, learn suboptimal representations that fail to capture long-range dependencies. We introduce REFINE (Reinforced Fast weIghts with Next sEquence prediction), a reinforcement learning framework that trains fast weight models under the next-sequence prediction (NSP) objective. REFINE selects informative token positions based on prediction entropy, generates multi-token rollouts, assigns self-supervised sequence-level rewards, and optimizes the model with group relative policy optimization (GRPO). REFINE is applicable throughout the training lifecycle of pre-trained language models: mid-training, post-training, and test-time training. Our experiments on LaCT-760M and DeltaNet-1.3B demonstrate that REFINE consistently outperforms supervised fine-tuning with NTP across needle-in-a-haystack retrieval, long-context question answering, and diverse tasks in LongBench. REFINE provides an effective and versatile framework for improving long-context modeling in fast weight architectures.
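
One GRPO update on the rollouts sampled from a single selected position can be sketched as follows. GRPO samples a group of rollouts for the same prefix and standardizes each rollout's reward against the group's mean and standard deviation, so no learned value function is needed. The abstract does not define REFINE's self-supervised reward, so `match_reward` (the fraction of rollout tokens that agree with the true continuation in the training text) is a hypothetical stand-in. `clipped_surrogate` is the PPO-style clipped term GRPO builds on, with the KL penalty to a reference model omitted for brevity. All function names are illustrative, not the authors' code.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def match_reward(rollout: Sequence[int], reference: Sequence[int]) -> float:
    """Hypothetical self-supervised reward: share of rollout tokens that equal
    the ground-truth continuation at the same offset."""
    if not reference:
        return 0.0
    hits = sum(1 for a, b in zip(rollout, reference) if a == b)
    return hits / len(reference)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward within its rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantage: float,
    clip: float = 0.2,
) -> float:
    """Clipped objective (to maximize), averaged over one rollout's tokens.

    The KL penalty used in full GRPO is intentionally left out of this sketch.
    """
    terms = []
    for new, old in zip(logp_new, logp_old):
        ratio = math.exp(new - old)  # per-token importance ratio
        clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
        terms.append(min(ratio * advantage, clipped * advantage))
    return sum(terms) / len(terms) if terms else 0.0


if __name__ == "__main__":
    reference = [5, 8, 2, 9]  # true next tokens after the selected position
    rollouts = [[5, 8, 2, 9], [5, 8, 0, 0], [1, 1, 1, 1], [5, 0, 2, 0]]
    rewards = [match_reward(r, reference) for r in rollouts]
    print(rewards)  # -> [1.0, 0.5, 0.0, 0.5]
    print(group_relative_advantages(rewards))
```

Rollouts that match the held-out continuation better than their siblings receive positive advantages and are reinforced. If every rollout in a group earns the same reward, all advantages are zero and the group contributes no gradient.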

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 62

Year: 2026

Venue: N/A
