Search papers, labs, and topics across Lattice.
The paper demonstrates that recurrent networks can adapt online using only immediate derivatives, eliminating the need for Jacobian propagation (RTRL). They show that the hidden state inherently carries temporal credit and that proper gradient scaling via normalization (尾2 in RMSprop) is crucial when gradients pass through nonlinear state updates without output bypasses. Experiments across diverse architectures, primate neural data, and streaming benchmarks reveal that this approach matches or surpasses full RTRL performance with significantly reduced memory requirements (1000x less).
Forget backpropagation through time: recurrent networks already have temporal credit baked into their forward pass.
Recurrent networks do not need Jacobian propagation to adapt online. The hidden state already carries temporal credit through the forward pass; immediate derivatives suffice if you stop corrupting them with stale trace memory and normalize gradient scales across parameter groups. An architectural rule predicts when normalization is needed: \b{eta}2 is required when gradients must pass through a nonlinear state update with no output bypass, and unnecessary otherwise. Across ten architectures, real primate neural data, and streaming ML benchmarks, immediate derivatives with RMSprop match or exceed full RTRL, scaling to n = 1024 at 1000x less memory.