This paper investigates whether hybrid architectures that combine attention-based retrieval with recurrent state updates outperform attention-only models on tasks requiring both recall and state-tracking, two basic reasoning primitives. Using matched Olmo3 transformer and hybrid models in instruction-tuned and reasoning-augmented variants, the authors find that reasoning augmentation provides the largest overall improvement, extending the difficulty range over which models remain effective. On certain tasks, the hybrid reasoning model is also substantially more robust than the transformer reasoning model as sequential dependence increases, suggesting that the benefit of explicit reasoning depends on how well the architecture supports persistent state propagation. The authors present these findings as suggestive rather than conclusive, given the limited set of models and tasks.
On certain tasks, a hybrid reasoning model that combines attention and recurrence remains robust as sequential dependence increases, while the matched transformer reasoning model degrades sharply once task difficulty passes a threshold.
Reasoning in large language models is often treated as a monolithic capability, but its observed gains may arise from more basic operations. We study reasoning through two such primitives, recall and state-tracking, and ask whether hybrid architectures that combine attention-based retrieval with recurrent state updates are better suited than attention-only models for tasks that jointly require both. Using matched Olmo3 transformer and hybrid models in instruction-tuned and reasoning-augmented variants, we evaluate on a set of controlled tasks that mix state-tracking and recall primitives (state-based recall). Across tasks, we find that reasoning augmentation provides the largest overall improvement, substantially extending the range of difficulty over which models remain effective. We also find that, in certain tasks, the hybrid reasoning model remains substantially more robust as sequential dependence increases, whereas the transformer reasoning model degrades sharply once task difficulty passes a given threshold. These results suggest that reasoning tokens and architectural inductive biases contribute at different levels of the computational process: explicit reasoning can expand a model's effective operating range, but its benefit depends on how well the underlying architecture supports persistent state propagation. Because our case study is small, involving a limited set of models and tasks, we present these findings as suggestive rather than conclusive and leave broader validation across model families, scales, and task variations to future work.
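To make the idea of a task that jointly requires recall and state-tracking concrete, the following is a minimal illustrative sketch. It is not the authors' benchmark: the task format, the function names `make_state_recall_example` and `solve`, and the use of the number of updates as a difficulty knob are all assumptions introduced here for illustration. Each prompt defines a few variables, applies a chain of updates that must be tracked in order (state-tracking), and then asks for the current value of one variable (recall).

```python
import random


def make_state_recall_example(num_keys, num_updates, seed=0):
    """Build one hypothetical state-based recall prompt and its answer.

    num_keys must be at least 2. num_updates acts as a rough proxy for
    sequential dependence: more updates means a longer chain to track.
    """
    rng = random.Random(seed)
    keys = [f"k{i}" for i in range(num_keys)]
    state = {k: rng.randint(0, 9) for k in keys}
    lines = [f"{k} = {v}" for k, v in state.items()]
    for _ in range(num_updates):
        dst, src = rng.sample(keys, 2)  # two distinct variables
        state[dst] = (state[dst] + state[src]) % 10
        lines.append(f"{dst} = {dst} + {src} mod 10")
    query = rng.choice(keys)
    prompt = "\n".join(lines) + f"\nWhat is {query}?"
    return prompt, state[query]


def solve(prompt):
    """Reference solver: replay every line in order, then look up the query."""
    *body, question = prompt.split("\n")
    state = {}
    for line in body:
        tokens = line.split()
        if "+" in tokens:
            # Update line: "<dst> = <dst> + <src> mod 10"
            dst, src = tokens[0], tokens[4]
            state[dst] = (state[dst] + state[src]) % 10
        else:
            # Initial assignment line: "<key> = <digit>"
            state[tokens[0]] = int(tokens[2])
    query = question.split()[-1].rstrip("?")
    return state[query]
```

Under these assumptions, sweeping `num_updates` while holding `num_keys` fixed would give a simple difficulty axis along which model accuracy could be compared against the reference solver.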