Anhui Province Key Laboratory of Digital | USTC | Apr 23, 2026 | arXiv:2604.21255

When Agents Look the Same: Quantifying Distillation-Induced Similarity in Tool-Use Behaviors

Chen Yang, Yuning Zhang, Zhoufutu Wen, Tao Gong, Jiaheng Liu, Qizhi Chu, Nenghai Yu

AI Summary

This paper introduces Response Pattern Similarity (RPS) and Action Graph Similarity (AGS) to quantify non-mandatory behavioral similarity in LLM agents. It addresses a limitation of existing metrics, which do not distinguish task-required behaviors from model-specific preferences. Applying these metrics to 18 models from 8 providers on tool-use benchmarks reveals clear within-family behavioral convergence. Some possibly distilled models from other providers even score higher in AGS against the reference teacher than that teacher's own family members do. A controlled distillation experiment confirms that AGS can separate teacher-specific convergence from general performance improvements.
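The within-family versus cross-family comparison reduces to a difference between two means of pairwise similarity scores. The Python sketch assumes that pairwise AGS scores in [0, 1] have already been computed and that each model is labeled with its provider family. The function `family_gap`, the model names, and the scores are hypothetical placeholders, not the paper's code or results. The paper's exact aggregation (for example, per benchmark or per domain) may differ.

```python
from statistics import mean


def family_gap(scores: dict[frozenset, float], family: dict[str, str]) -> float:
    """Hypothetical helper: mean within-family score minus mean cross-family score, in pp.

    scores maps an unordered model pair (a two-element frozenset) to a
    similarity in [0, 1]; family maps each model name to its provider.
    """
    within, cross = [], []
    for pair, score in scores.items():
        a, b = tuple(pair)
        # Route each pair's score by whether both models share a provider.
        (within if family[a] == family[b] else cross).append(score)
    if not within or not cross:
        raise ValueError("need at least one within-family and one cross-family pair")
    return 100 * (mean(within) - mean(cross))


# Hypothetical toy data, not results from the paper.
family = {"A1": "A", "A2": "A", "B1": "B"}
scores = {
    frozenset({"A1", "A2"}): 0.80,  # within-family pair
    frozenset({"A1", "B1"}): 0.70,  # cross-family pair
    frozenset({"A2", "B1"}): 0.74,  # cross-family pair
}
print(f"{family_gap(scores, family):.1f} pp")  # prints 8.0 pp
```

Keying scores by `frozenset` makes each pair unordered, so (A1, B1) and (B1, A1) cannot be counted twice.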

Key Contribution

LLM agent distillation can lead to surprisingly strong behavioral mimicry: some student models exhibit tool-use habits *more* similar to their teacher than the teacher's own family members are.

Abstract

Model distillation is a primary driver behind the rapid progress of LLM agents, yet it often leads to behavioral homogenization. Many emerging agents share nearly identical reasoning steps and failure modes, suggesting they may be distilled echoes of a few dominant teachers. Existing metrics, however, fail to distinguish mandatory behaviors required for task success from non-mandatory patterns that reflect a model's autonomous preferences. We propose two complementary metrics to isolate non-mandatory behavioral patterns: Response Pattern Similarity (RPS) for verbal alignment and Action Graph Similarity (AGS) for tool-use habits modeled as directed graphs. Evaluating 18 models from 8 providers on $\tau$-Bench and $\tau^2$-Bench against Claude Sonnet 4.5 (thinking), we find that within-family model pairs score 5.9 pp higher in AGS than cross-family pairs, and that Kimi-K2 (thinking) reaches 82.6% $S_{\text{node}}$ and 94.7% $S_{\text{dep}}$, exceeding Anthropic's own Opus 4.1. A controlled distillation experiment further confirms that AGS distinguishes teacher-specific convergence from general improvement. RPS and AGS capture distinct behavioral dimensions (Pearson $r = 0.491$), providing complementary diagnostic signals for behavioral convergence in the agent ecosystem. Our code is available at https://github.com/Syuchin/AgentEcho.
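AGS models tool-use habits as directed graphs. The Python sketch is a rough illustration only and rests on several assumptions. Nodes are the distinct tools a model calls, and an edge (a, b) means tool b was called immediately after tool a. $S_{\text{node}}$ and $S_{\text{dep}}$ are computed as Jaccard overlaps of those node and edge sets with a reference trajectory, after task-mandated items are removed. These definitions, the `mandatory_*` parameters, and the tool names are all hypothetical. The paper's actual graph construction and its handling of mandatory behavior may differ; the AgentEcho repository holds the authors' implementation.

```python
from collections.abc import Iterable

# Hypothetical edge type: (earlier_tool, later_tool).
Edge = tuple[str, str]


def action_graph(calls: list[str]) -> tuple[set[str], set[Edge]]:
    """Build a directed action graph from an ordered list of tool-call names.

    Assumption: nodes are distinct tools; an edge (a, b) records that tool b
    was called directly after tool a. The paper's graph may be richer.
    """
    nodes = set(calls)
    edges = set(zip(calls, calls[1:]))
    return nodes, edges


def jaccard(x: set, y: set) -> float:
    """Intersection over union; two empty sets count as identical."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def ags(candidate: list[str], reference: list[str],
        mandatory_nodes: Iterable[str] = (),
        mandatory_edges: Iterable[Edge] = ()) -> tuple[float, float]:
    """Return hypothetical (S_node, S_dep) scores of candidate vs. reference.

    Task-mandated nodes and edges are removed first so that only
    non-mandatory (preference-driven) behavior is compared.
    """
    cand_nodes, cand_edges = action_graph(candidate)
    ref_nodes, ref_edges = action_graph(reference)
    m_nodes, m_edges = set(mandatory_nodes), set(mandatory_edges)
    s_node = jaccard(cand_nodes - m_nodes, ref_nodes - m_nodes)
    s_dep = jaccard(cand_edges - m_edges, ref_edges - m_edges)
    return s_node, s_dep


# Hypothetical trajectories for one task; "cancel_order" is assumed mandatory.
reference = ["get_user", "get_order", "check_policy", "cancel_order"]
candidate = ["get_user", "get_order", "cancel_order"]
s_node, s_dep = ags(candidate, reference, mandatory_nodes={"cancel_order"})
print(round(s_node, 3), round(s_dep, 3))  # prints 0.667 0.25
```

Removing task-mandated items before comparing prevents two agents from looking similar merely because the task forces the same calls. This is the mandatory versus non-mandatory distinction the abstract draws.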

Eval Frameworks & Benchmarks | Inference & Quantization | Tool Use & Agents

Citation Metrics

Citations: 0
Influential citations: 0
References: 74
Year: 2026
Venue: N/A
