Feb 25, 2026

arXiv:2602.21454

When Learning Hurts: Fixed-Pole RNN for Real-Time Online Training

Alexander Morgan, Ummay Sumaya Khan, Lingjia Liu, Lizhong Zheng

AI Summary

The paper investigates why learning recurrent poles in RNNs via backpropagation is often ineffective in data-constrained, real-time learning scenarios. The authors' analysis shows that learning the recurrent poles makes the weight optimization problem highly non-convex, so gradient-based methods need more data and more iterations to converge. Empirically, fixed-pole RNNs achieve better performance with lower training complexity than RNNs with learnable poles, especially on complex-valued data.
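
The non-convexity argument can be illustrated with a simplified model. The block below is a sketch under assumptions that are not taken from the paper: a single-input linear recurrence with diagonal (decoupled) poles $p_k$, readout weights $w_k$, input $x[t]$, target $d[t]$, and a squared-error loss $L$. All of these symbols are introduced here for illustration only.

```latex
% Assumed toy model: K decoupled first-order IIR sections and a linear readout.
\begin{align}
  s_k[t] &= p_k\, s_k[t-1] + x[t], \qquad s_k[-1] = 0, \\
  y[t]   &= \sum_{k=1}^{K} w_k\, s_k[t]
          = \sum_{k=1}^{K} w_k \sum_{n=0}^{t} p_k^{\,n}\, x[t-n], \\
  L(w, p) &= \sum_{t} \bigl| d[t] - y[t] \bigr|^2 .
\end{align}
% With the poles p_k held fixed, y[t] is linear in w, so L is a convex
% quadratic in w (a least-squares problem).
% With the poles learnable, y[t] contains the products w_k p_k^n, so L is
% a high-degree polynomial in (w, p) and is in general non-convex.
```

In this toy setting, fixing the poles reduces training to linear least squares, while learning them couples each weight to powers of its pole.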

Key Contribution

Learning recurrent dynamics in RNNs can hurt performance in real-time settings: fixing the poles leads to faster convergence and better results when data is limited.

Abstract

Recurrent neural networks (RNNs) can be interpreted as discrete-time state-space models, where the state evolution corresponds to an infinite-impulse-response (IIR) filtering operation governed by both feedforward weights and recurrent poles. While, in principle, all parameters including pole locations can be optimized via backpropagation through time (BPTT), such joint learning incurs substantial computational overhead and is often impractical for applications with limited training data. Echo state networks (ESNs) mitigate this limitation by fixing the recurrent dynamics and training only a linear readout, enabling efficient and stable online adaptation. In this work, we analytically and empirically examine why learning recurrent poles does not provide tangible benefits in data-constrained, real-time learning scenarios. Our analysis shows that pole learning renders the weight optimization problem highly non-convex, requiring significantly more training samples and iterations for gradient-based methods to converge to meaningful solutions. Empirically, we observe that for complex-valued data, gradient descent frequently exhibits prolonged plateaus, and advanced optimizers offer limited improvement. In contrast, fixed-pole architectures induce stable and well-conditioned state representations even with limited training data. Numerical results demonstrate that fixed-pole networks achieve superior performance with lower training complexity, making them more suitable for online real-time tasks.
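
The fixed-pole idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the diagonal recurrence, the random pole placement inside the unit disk, the synthetic target, the ridge term, and the helper names `fixed_pole_states` and `fit_readout` are all assumptions made for this example. It also uses a batch closed-form solve rather than an online update.

```python
import numpy as np

def fixed_pole_states(x, poles):
    """Run s[t] = poles * s[t-1] + x[t], one first-order IIR section per pole."""
    states = np.zeros((len(x), len(poles)), dtype=complex)
    s = np.zeros(len(poles), dtype=complex)
    for t, x_t in enumerate(x):
        s = poles * s + x_t
        states[t] = s
    return states

def fit_readout(states, target, ridge=1e-8):
    """Closed-form (ridge-regularized) least squares for the linear readout."""
    gram = states.conj().T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.conj().T @ target)

rng = np.random.default_rng(0)
T, K = 400, 8  # hypothetical sequence length and number of poles

# Complex-valued input sequence.
x = rng.standard_normal(T) + 1j * rng.standard_normal(T)

# Poles are drawn once inside the unit disk (for stability) and never trained.
radii = rng.uniform(0.2, 0.9, K)
angles = rng.uniform(0.0, 2.0 * np.pi, K)
poles = radii * np.exp(1j * angles)

# Synthetic target (assumption): produced by the same poles with hidden weights,
# so a perfect linear readout exists.
true_w = rng.standard_normal(K) + 1j * rng.standard_normal(K)
states = fixed_pole_states(x, poles)
target = states @ true_w

# Only the readout is learned, and the problem is convex in w.
w = fit_readout(states, target)
mse = np.mean(np.abs(states @ w - target) ** 2)
print(f"readout MSE: {mse:.3e}")
```

Because the poles are fixed, the only trainable quantity is `w`, and fitting it is a linear least-squares problem with a unique minimizer. Making `poles` trainable as well would require backpropagating through the recurrence in `fixed_pole_states`.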

Architecture Design (Transformers, SSMs, MoE)

Natural Language Processing

Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
