Feb 17, 2026 · arXiv:2602.15593

A unified theory of feature learning in RNNs and DNNs

Jan P. Bauer, Kirsten Fischer, Moritz Helias, Agostina Palmigiano

AI Summary

This paper develops a unified mean-field theory for RNNs and DNNs operating in the feature learning ($\mu$P) regime, framing training as Bayesian inference over sequences and patterns. In DNN-typical tasks, the theory shows that RNNs and DNNs behave identically while the learning signal stays below a critical threshold set by the noise from random weights; above that threshold, only RNNs develop correlated representations across timesteps. In sequential tasks, the weight sharing in RNNs additionally induces an inductive bias that aids generalization by interpolating unsupervised time steps.

Key Contribution

In DNN-typical tasks, RNNs and DNNs behave identically up to a phase transition, beyond which only RNNs develop correlated representations across timesteps. This links the architectural difference (weight sharing) to the networks' distinct functional properties.

Abstract

Recurrent and deep neural networks (RNNs/DNNs) are cornerstone architectures in machine learning. Remarkably, RNNs differ from DNNs only by weight sharing, as can be shown through unrolling in time. How does this structural similarity fit with the distinct functional properties these networks exhibit? To address this question, we develop a unified mean-field theory for RNNs and DNNs in terms of representational kernels, describing fully trained networks in the feature learning ($\mu$P) regime. This theory casts training as Bayesian inference over sequences and patterns, directly revealing the functional implications induced by the RNNs' weight sharing. In DNN-typical tasks, we identify a phase transition when the learning signal overcomes the noise due to randomness in the weights: below this threshold, RNNs and DNNs behave identically; above it, only RNNs develop correlated representations across timesteps. For sequential tasks, the RNNs' weight sharing furthermore induces an inductive bias that aids generalization by interpolating unsupervised time steps. Overall, our theory offers a way to connect architectural structure to functional biases.
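The statement that RNNs differ from DNNs only by weight sharing can be illustrated with a small sketch. The toy model below is an assumption made for illustration, not the parameterization used in the paper: each step computes `h_new = tanh(W h + x)`, weights are untrained Gaussians with variance 1/n, and the helper names (`rnn_forward`, `dnn_forward`, `cross_time_kernel`) are hypothetical. The kernel shown is one simple notion of overlap between hidden states at different steps and may differ from the representational kernels defined in the paper.

```python
import math
import random


def matvec(W, h):
    """Multiply an n x n matrix (list of rows) by a length-n vector."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def random_matrix(n, rng):
    """Gaussian weights with variance 1/n (an assumed, common initialization)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)] for _ in range(n)]


def step(W, h, x):
    """One timestep or one layer of the toy model: tanh(W h + x)."""
    return [math.tanh(p + xi) for p, xi in zip(matvec(W, h), x)]


def rnn_forward(W, inputs):
    """Recurrent network: the same matrix W is applied at every timestep."""
    h = [0.0] * len(inputs[0])
    states = []
    for x in inputs:
        h = step(W, h, x)
        states.append(h)
    return states


def dnn_forward(weights, inputs):
    """Deep network: layer t has its own matrix weights[t]."""
    h = [0.0] * len(inputs[0])
    states = []
    for W, x in zip(weights, inputs):
        h = step(W, h, x)
        states.append(h)
    return states


def cross_time_kernel(states):
    """Normalized overlaps K[t][s] = (h_t . h_s) / n between steps t and s."""
    n = len(states[0])
    return [[sum(a * b for a, b in zip(ht, hs)) / n for hs in states] for ht in states]


rng = random.Random(0)
n, T = 8, 4
inputs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(T)]

W_shared = random_matrix(n, rng)
rnn_states = rnn_forward(W_shared, inputs)

# Unrolled view: a DNN whose layers all reuse W_shared reproduces the RNN.
tied_states = dnn_forward([W_shared] * T, inputs)

# Generic DNN: independent matrices after the first layer.
dnn_weights = [W_shared] + [random_matrix(n, rng) for _ in range(T - 1)]
dnn_states = dnn_forward(dnn_weights, inputs)

K_rnn = cross_time_kernel(rnn_states)
K_dnn = cross_time_kernel(dnn_states)
```

Here `tied_states` matches `rnn_states` exactly, while `dnn_states` departs from them from the second step onward. The sketch uses random, untrained weights, so it only demonstrates the unrolling equivalence; it says nothing about the phase transition, which concerns fully trained networks.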

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

