Feb 26, 2026
arXiv:2602.23197

Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models

Chungpa Lee, Jy-yong Sohn, Kangwook Lee

AI Summary

This paper theoretically analyzes the impact of fine-tuning on the in-context learning abilities of linear attention models. It demonstrates that fine-tuning all attention parameters degrades in-context learning, while fine-tuning only the value matrix preserves it while improving zero-shot performance. The analysis also shows that adding a few-shot loss during fine-tuning enhances in-context learning on the target task but diminishes it on unseen tasks.

Key Contribution

Fine-tuning can degrade a model's in-context learning ability. For linear attention models, this work identifies a simple fix: update only the value matrix.

Abstract

Transformer-based large language models exhibit in-context learning, enabling adaptation to downstream tasks via few-shot prompting with demonstrations. In practice, such models are often fine-tuned to improve zero-shot performance on downstream tasks, allowing them to solve tasks without examples and thereby reducing inference costs. However, fine-tuning can degrade in-context learning, limiting the performance of fine-tuned models on tasks not seen during fine-tuning. Using linear attention models, we provide a theoretical analysis that characterizes how fine-tuning objectives modify attention parameters and identifies conditions under which this leads to degraded few-shot performance. We show that fine-tuning all attention parameters can harm in-context learning, whereas restricting updates to the value matrix improves zero-shot performance while preserving in-context learning. We further show that incorporating an auxiliary few-shot loss enhances in-context learning primarily on the target task, at the expense of degraded in-context learning ability on tasks not seen during fine-tuning. We empirically validate our theoretical results.
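To make the contrast between updating all attention parameters and updating only the value matrix concrete, here is a minimal sketch of one gradient step on a single-layer linear attention model. The parametrization (a merged key-query matrix `W_kq`, a value matrix `W_v`, and prompts whose rows are `[x_i, y_i]`) is an assumption borrowed from common simplifications in the in-context learning literature, not necessarily the one used in the paper. The function names `predict` and `sgd_step` are hypothetical. The sketch shows only the mechanics of restricting updates and does not reproduce the paper's theoretical results.

```python
import numpy as np

def predict(W_kq, W_v, prompt, query):
    """Prediction of a one-layer linear attention model (no softmax).

    prompt: (n, d+1) array; each row is a demonstration [x_i, y_i].
    query:  (d+1,) array [x_query, 0]; the label slot is left at zero.
    This parametrization is an illustrative assumption.
    """
    n = prompt.shape[0]
    scores = prompt @ W_kq @ query      # bilinear attention scores, shape (n,)
    context = prompt.T @ scores / n     # score-weighted average of tokens
    return (W_v @ context)[-1]          # the label slot holds the prediction

def sgd_step(W_kq, W_v, prompt, query, y, lr=0.1, value_only=True):
    """One gradient step on the loss 0.5 * (prediction - y)**2.

    With value_only=True, W_kq is frozen and only W_v is updated.
    """
    n = prompt.shape[0]
    e = np.zeros(W_v.shape[0])
    e[-1] = 1.0                         # selects the label slot
    scores = prompt @ W_kq @ query
    context = prompt.T @ scores / n
    err = (W_v @ context)[-1] - y
    # prediction = e^T W_v context, so its gradient w.r.t. W_v is e context^T
    new_W_v = W_v - lr * err * np.outer(e, context)
    if value_only:
        return W_kq.copy(), new_W_v
    # prediction = a^T W_kq query / n with a = P^T P W_v^T e
    a = prompt.T @ prompt @ W_v.T @ e
    new_W_kq = W_kq - lr * err * np.outer(a, query) / n
    return new_W_kq, new_W_v
```

Calling `sgd_step(..., value_only=True)` returns the key-query matrix unchanged, so any in-context behavior that depends on `W_kq` alone is left untouched by the update. With `value_only=False`, both matrices move.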

Architecture Design (Transformers, SSMs, MoE)
Natural Language Processing
Training Efficiency & Optimization

Citation Metrics

Citations: 0
Influential citations: 0
References: 49
Year: 2026
Venue: N/A
