Mar 30, 2026 · arXiv:2603.28750

Temporal Credit Is Free

AI Summary

The paper demonstrates that recurrent networks can adapt online using only immediate derivatives, without the Jacobian propagation required by real-time recurrent learning (RTRL). The authors argue that the hidden state already carries temporal credit, and that proper gradient scaling via normalization (β2 in RMSprop) is crucial when gradients must pass through a nonlinear state update with no output bypass. Experiments across ten architectures, primate neural data, and streaming benchmarks show that this approach matches or surpasses full RTRL while using roughly 1000x less memory.
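
The summary describes the update only in words. As a minimal sketch of one reading of it, the Python below trains a vanilla tanh RNN online on a streaming sine wave: at each step it takes the derivative of the current loss while treating h_{t-1} as a constant (so no RTRL sensitivity matrix and no eligibility trace is stored), then applies RMSprop with second-moment decay `beta2`. The architecture, the next-sample prediction task, the squared-error loss, the initialization, the learning rate, and the names `forward`, `immediate_grads`, `RMSprop` and `run_online` are hypothetical choices for illustration, not the paper's code.

```python
import math
import random

# Hypothetical sketch: vanilla tanh RNN trained online with immediate (one-step)
# derivatives and RMSprop. Not the paper's implementation.


def forward(params, h_prev, x, n):
    """State update h_t = tanh(W h_{t-1} + U x_t + b) and readout y_t = V . h_t."""
    W, U, b, V = params["W"], params["U"], params["b"], params["V"]
    h = [
        math.tanh(sum(W[i * n + j] * h_prev[j] for j in range(n)) + U[i] * x + b[i])
        for i in range(n)
    ]
    y = sum(V[i] * h[i] for i in range(n))
    return h, y


def immediate_grads(params, h_prev, x, target, n):
    """Gradient of L_t = 0.5 * (y_t - target)^2 with h_prev held constant.

    No dh/dtheta sensitivity matrix (as in RTRL) and no eligibility trace is
    carried between steps; temporal context enters only through h_prev itself.
    """
    h, y = forward(params, h_prev, x, n)
    err = y - target
    # dL/da_i for the pre-activation a_i: readout weight times tanh derivative.
    delta = [err * params["V"][i] * (1.0 - h[i] * h[i]) for i in range(n)]
    grads = {
        "W": [delta[i] * h_prev[j] for i in range(n) for j in range(n)],  # row-major
        "U": [delta[i] * x for i in range(n)],
        "b": list(delta),
        "V": [err * h[i] for i in range(n)],
    }
    return h, 0.5 * err * err, grads


class RMSprop:
    """Per-entry RMSprop: v <- beta2*v + (1-beta2)*g^2; p <- p - lr*g/(sqrt(v)+eps)."""

    def __init__(self, params, lr=1e-3, beta2=0.999, eps=1e-8):
        self.lr, self.beta2, self.eps = lr, beta2, eps
        self.v = {k: [0.0] * len(p) for k, p in params.items()}

    def update(self, params, grads):
        for k, g in grads.items():
            p, v = params[k], self.v[k]
            for i, gi in enumerate(g):
                v[i] = self.beta2 * v[i] + (1.0 - self.beta2) * gi * gi
                p[i] -= self.lr * gi / (math.sqrt(v[i]) + self.eps)


def run_online(steps=2000, n=8, seed=0, lr=1e-3, beta2=0.999):
    """Stream x_t = sin(0.3 t) and learn to predict x_{t+1}, updating after every step."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n)
    params = {
        "W": [rng.gauss(0.0, scale) for _ in range(n * n)],
        "U": [rng.gauss(0.0, 1.0) for _ in range(n)],
        "b": [0.0] * n,
        "V": [rng.gauss(0.0, scale) for _ in range(n)],
    }
    opt = RMSprop(params, lr=lr, beta2=beta2)
    h = [0.0] * n
    losses = []
    for t in range(steps):
        x, target = math.sin(0.3 * t), math.sin(0.3 * (t + 1))
        h, loss, grads = immediate_grads(params, h, x, target, n)
        opt.update(params, grads)  # online: parameters change every step
        losses.append(loss)
    return params, losses


if __name__ == "__main__":
    _, losses = run_online()
    print("mean loss, first 200 steps:", sum(losses[:200]) / 200)
    print("mean loss, last 200 steps:", sum(losses[-200:]) / 200)
```

Because RMSprop divides each gradient entry by a running root-mean-square of that same entry, parameter groups whose raw gradients differ in scale end up taking steps of comparable size; this is the kind of scale normalization the summary refers to. PyTorch's `torch.optim.RMSprop` exposes the same decay under the name `alpha`.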

Key Contribution

Forget backpropagation through time: recurrent networks already have temporal credit baked into their forward pass.

Abstract

Recurrent networks do not need Jacobian propagation to adapt online. The hidden state already carries temporal credit through the forward pass; immediate derivatives suffice if you stop corrupting them with stale trace memory and normalize gradient scales across parameter groups. An architectural rule predicts when normalization is needed: β2 is required when gradients must pass through a nonlinear state update with no output bypass, and unnecessary otherwise. Across ten architectures, real primate neural data, and streaming ML benchmarks, immediate derivatives with RMSprop match or exceed full RTRL, scaling to n = 1024 at 1000x less memory.
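
To see why the memory savings grow with network size, here is a back-of-the-envelope count for a vanilla RNN. This accounting is a hypothetical one, not taken from the paper: RTRL keeps a sensitivity matrix with one row per hidden unit and one column per recurrent parameter (about n³ entries when the parameter count P is about n²), whereas immediate derivatives only need the parameters and h_{t-1} (about n² entries). The readout and optimizer buffers such as RMSprop's second-moment estimate are ignored; including them would change the exact factor. The function names are illustrative.

```python
# Hypothetical memory accounting (floats stored), vanilla RNN with n hidden units.
# Optimizer state and the readout layer are deliberately left out.


def param_count(n: int, n_in: int = 1) -> int:
    """Recurrent weights W (n x n), input weights U (n x n_in), bias b (n)."""
    return n * n + n * n_in + n


def rtrl_floats(n: int, n_in: int = 1) -> int:
    """Parameters plus the RTRL sensitivity matrix dh_t/dtheta (n rows x P columns)."""
    p = param_count(n, n_in)
    return p + n * p


def immediate_floats(n: int, n_in: int = 1) -> int:
    """Parameters plus the previous hidden state h_{t-1}; no sensitivity matrix or trace."""
    return param_count(n, n_in) + n


if __name__ == "__main__":
    n = 1024
    ratio = rtrl_floats(n) / immediate_floats(n)
    print(f"n={n}: RTRL {rtrl_floats(n):,} floats, immediate {immediate_floats(n):,} floats, ratio ~{ratio:.0f}x")
```

Under these assumptions the ratio is roughly n, so at n = 1024 it lands near the 1000x figure quoted in the abstract (about 4.3 GB versus about 4.2 MB in float32); the paper's own memory accounting may differ.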

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
