Department of Mathematics | Georgia Tech | Purdue | School of Mathematics
May 6, 2026 | arXiv:2605.05176

Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer

Alexander Hsu, Zhaiming Shen, Wenjing Liao, Rongjie Lai

AI Summary

This paper theoretically analyzes in-context learning (ICL) for nonlinear regression with transformers, focusing on how attention mechanisms can be explicitly constructed to realize nonlinear features such as polynomial or spline bases. The authors establish a framework to analyze end-to-end ICL performance and derive finite-sample generalization error bounds in terms of context length and training set size. Numerical experiments on synthetic regression tasks validate the theoretical findings, providing insights into the capabilities and limitations of transformers in nonlinear ICL.
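The summary names polynomial and spline bases as the nonlinear features that attention is constructed to realize. As a minimal sketch of what such a feature map computes, assuming a scalar input and the standard truncated-power spline basis (which contains the monomials as its global part), the basis can be written as below. The function name `truncated_power_spline_features` and the knot locations are hypothetical illustrative choices, not the paper's construction.

```python
import numpy as np

def truncated_power_spline_features(x, knots, degree=1):
    """Hypothetical helper: basis [1, x, ..., x^k, (x - t_1)_+^k, ..., (x - t_m)_+^k]."""
    x = np.asarray(x, dtype=float).reshape(-1)
    # Global polynomial part: columns x**0, x**1, ..., x**degree.
    poly = np.vander(x, N=degree + 1, increasing=True)
    # One truncated power term per knot; degree=1 gives ReLU hinges max(x - t, 0).
    hinges = np.maximum(x[:, None] - np.asarray(knots, dtype=float)[None, :], 0.0) ** degree
    return np.hstack([poly, hinges])

# Illustrative inputs and knots (assumed values, chosen only for the demo).
xs = np.array([0.0, 0.5, 1.0])
knots = np.array([0.25, 0.75])
print(truncated_power_spline_features(xs, knots, degree=1))
```

With `degree=1`, each printed row is [1, x, (x - 0.25)_+, (x - 0.75)_+]; dropping the knots leaves the plain monomial basis, so one helper covers both feature families mentioned in the summary.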

Key Contribution

Transformers can be explicitly constructed to perform nonlinear regression in context by using attention as a featurizer, which offers a theoretical account of how these models learn nonlinear relationships from prompts.
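One way to see why attention can act as a featurizer: without the softmax, a head multiplies a bilinear score (W_Q h)·(W_K h) by a linear value W_V h, so its output is a product of functions of the input. The toy below is our own simplification, not the paper's construction. It assumes a single token with a hypothetical embedding h = [1, x], drops the softmax, and hand-picks weight matrices so that one head outputs x² and another outputs x³; a linear readout over these heads (plus an assumed skip path carrying 1 and x) then represents any cubic polynomial.

```python
import numpy as np

def linear_attention_head(h, W_Q, W_K, W_V):
    """Single-token attention with the softmax omitted (an assumption): (W_Q h . W_K h) * W_V h."""
    score = (W_Q @ h) @ (W_K @ h)  # bilinear in h, i.e. a quadratic form in the embedding
    return score * (W_V @ h)

def embed(x):
    # Hypothetical token embedding: a constant 1 and the raw input x.
    return np.array([1.0, x])

W_V = np.array([[0.0, 1.0]])  # value picks x

# Head 2: query picks x, key picks the constant 1, so score = x and output = x * x.
W_Q2, W_K2 = np.array([[0.0, 1.0]]), np.array([[1.0, 0.0]])
# Head 3: query and key both pick x, so score = x * x and output = x**3.
W_Q3, W_K3 = np.array([[0.0, 1.0]]), np.array([[0.0, 1.0]])

def square_feature(x):
    return linear_attention_head(embed(x), W_Q2, W_K2, W_V)[0]

def cube_feature(x):
    return linear_attention_head(embed(x), W_Q3, W_K3, W_V)[0]

def cubic_from_heads(x, coeffs):
    """Linear readout over [1, x, head_2(x), head_3(x)]; the 1 and x come from an assumed skip path."""
    features = np.array([1.0, x, square_feature(x), cube_feature(x)])
    return float(np.dot(coeffs, features))

print(square_feature(3.0), cube_feature(2.0), cubic_from_heads(2.0, [1.0, 1.0, 1.0, 1.0]))
```

The design choice here is to make the multiplicative interaction visible: each head's output is a product of two learned projections of the same input, which is what lets attention generate monomials that a purely linear layer cannot.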

Abstract

Pre-trained transformers are able to learn from examples provided as part of the prompt without any weight updates, a remarkable ability known as in-context learning (ICL). Despite its demonstrated efficacy across various domains, the theoretical understanding of ICL is still developing. Whereas most existing theory has focused on linear models, we study ICL in the nonlinear regression setting. Through the interaction mechanism in attention, we explicitly construct transformer networks to realize nonlinear features, such as polynomial or spline bases, which span a wide class of functions. Based on this construction, we establish a framework to analyze end-to-end in-context nonlinear regression with the constructed features. Our theory provides finite-sample generalization error bounds in terms of context length and training set size. We numerically validate the theory on synthetic regression tasks.
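The abstract describes end-to-end in-context nonlinear regression on top of constructed features, with error controlled by context length. A common reference point for what such a model might compute on each prompt, assumed here rather than taken from the paper, is ridge regression on a fixed feature map fitted to the context pairs. The sketch below uses monomial features; the names `icl_ridge_predict` and `sample_prompt`, the ridge parameter `lam`, the noise level, and the task distribution are all hypothetical choices for illustration.

```python
import numpy as np

def poly_features(x, degree):
    # Monomial features [1, x, ..., x^degree]; the basis choice is an assumption.
    return np.vander(np.asarray(x, dtype=float).reshape(-1), N=degree + 1, increasing=True)

def icl_ridge_predict(x_ctx, y_ctx, x_query, degree=3, lam=1e-3):
    """Predict y at x_query from in-context pairs via ridge regression on the features."""
    Phi = poly_features(x_ctx, degree)        # (n, degree + 1) design matrix from the prompt
    n, d = Phi.shape
    A = Phi.T @ Phi / n + lam * np.eye(d)     # regularized empirical Gram matrix
    b = Phi.T @ np.asarray(y_ctx, dtype=float) / n
    w = np.linalg.solve(A, b)                 # coefficients fitted to this prompt only
    return poly_features(x_query, degree) @ w

def sample_prompt(rng, n, degree=3, noise=0.1):
    """Draw one synthetic task: random polynomial coefficients and n noisy context pairs."""
    coeffs = rng.normal(size=degree + 1)
    x = rng.uniform(-1.0, 1.0, size=n)
    y = poly_features(x, degree) @ coeffs + noise * rng.normal(size=n)
    return coeffs, x, y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (10, 40, 160):                   # context lengths to compare
        errs = []
        for _ in range(200):                  # independent prompts per context length
            coeffs, x, y = sample_prompt(rng, n)
            xq = rng.uniform(-1.0, 1.0, size=1)
            pred = icl_ridge_predict(x, y, xq)
            truth = poly_features(xq, 3) @ coeffs
            errs.append(float((pred - truth)[0] ** 2))
        print(n, np.mean(errs))
```

Running the script prints a mean squared prediction error for each context length n. It is only a sanity check of the intuition that longer prompts should help, not a reproduction of the paper's bounds or experiments.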

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing | Scaling Laws & Emergent Abilities

Year: 2026

Venue: N/A
