Feb 19, 2026 · arXiv:2602.17171

In-Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks

Ayush Goel, Arjun Kohli, Sarvagya Somvanshi

AI Summary

This paper empirically compares the in-context learning (ICL) performance of linear and quadratic attention mechanisms on linear regression tasks, focusing on learning quality, convergence, and generalization. The study reveals both similarities between the two mechanisms and limitations of linear attention relative to quadratic attention in ICL. It also analyzes how increasing model depth affects ICL performance in both architectures.

Key Contribution

Linear attention models can mimic quadratic attention in-context learning for simple tasks like linear regression, but with limitations that this paper elucidates.

Abstract

Recent work has demonstrated that transformers and linear attention models can perform in-context learning (ICL) on simple function classes, such as linear regression. In this paper, we empirically study how these two attention mechanisms differ in their ICL behavior on the canonical linear-regression task of Garg et al. We evaluate learning quality (MSE), convergence, and generalization behavior of each architecture. We also analyze how increasing model depth affects ICL performance. Our results illustrate both the similarities and limitations of linear attention relative to quadratic attention in this setting.
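To make the setting in the abstract concrete, the following is a minimal sketch of one in-context regression prompt and of the two attention readouts being compared. It is not the authors' model or training setup: the function names (`make_prompt`, `softmax_attention`, `linear_attention`), the noiseless Gaussian sampling, and the use of a single untrained attention readout with keys set to the inputs and values set to the labels are all assumptions chosen for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_prompt(n_examples, dim, seed=0):
    """Sample one linear-regression prompt: pairs (x_i, w.x_i) plus a held-out query.

    Gaussian weights and inputs with no label noise are assumptions of this sketch.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(dim)]
    xs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_examples)]
    ys = [dot(w, x) for x in xs]
    x_query = [rng.gauss(0, 1) for _ in range(dim)]
    return xs, ys, x_query, dot(w, x_query)


def softmax_attention(queries, keys, values):
    """Quadratic attention: every query scores every key, then normalizes with softmax."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(wt * v[j] for wt, v in zip(weights, values)) / total
            for j in range(len(values[0]))
        ])
    return out


def linear_attention(queries, keys, values):
    """Softmax-free attention: out_i = q_i^T (sum_j k_j v_j^T)."""
    d_k, d_v = len(keys[0]), len(values[0])
    # The d_k x d_v summary is built once, so cost grows linearly with sequence length.
    kv = [[0.0] * d_v for _ in range(d_k)]
    for k, v in zip(keys, values):
        for a in range(d_k):
            for b in range(d_v):
                kv[a][b] += k[a] * v[b]
    return [
        [sum(q[a] * kv[a][b] for a in range(d_k)) for b in range(d_v)]
        for q in queries
    ]


# Hypothetical usage: predict the query label from the in-context examples.
xs, ys, x_query, y_true = make_prompt(n_examples=40, dim=5)
values = [[y] for y in ys]
pred_softmax = softmax_attention([x_query], xs, values)[0][0]
pred_linear = linear_attention([x_query], xs, values)[0][0] / len(xs)
sq_err_softmax = (pred_softmax - y_true) ** 2
sq_err_linear = (pred_linear - y_true) ** 2
```

In this sketch the softmax readout is a weighted average of the context labels, while the linear readout is an unnormalized sum of dot-product similarities times labels, divided here by the context length. Comparing the two squared errors over many sampled prompts is one simple way to probe the kind of MSE gap the paper studies, though the paper's trained multi-layer models will behave differently from these untrained readouts.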

Architecture Design (Transformers, SSMs, MoE) · Scaling Laws & Emergent Abilities · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
