Mar 29, 2026 · arXiv:2603.27703

KAT-Coder-V2 Technical Report

Fengxian Li, Fengxiang Li, Han Zhang, Haoyang Huang, Jinghui Wang, Jinhua Hao, Kun Yuan, Mengtong Li, Minglei Zhang, Peng Xu, Pengcheng Xu, Wenhao Zhuang, Yizhen Shao, Zong-Xian Feng, Can Tang, Chao Wang, Cheng Tong, Chengxiao Tong, Fan Yang, Gang Xiong, Haixuan Gao, Han Gao, Hao Wang, Haochen Liu, Hongliang Sun, Jiabao Li, Jing Chang, Jingwen Chang, Jun Du, Junyi Peng, Leizhen Cui, Lei Cui, Meimei Jing, Mei-Ling Jing, Mingqi Wu, Shangpeng Yan, Shaotong Qi, Suzhe Xu, Wenxuan Zhao, Xianda Sun, Xian Sun, Xuan Xie, Yanbo Wang, Yao Xia, Yaohua Xia, Yinghan Cui, Yingpeng Chen, Yong Wang, Yuze Shi, Zhiwei Shen, Ziyu Wang, Mingjie Sun, Ming Sun, L. Ye, Lin Ye, Bin Chen

AI Summary

KAT-Coder-V2 is an agentic coding model trained with a "Specialize-then-Unify" paradigm: five expert domains (SWE, WebCoding, Terminal, WebSearch, and General) are each trained independently with supervised fine-tuning and reinforcement learning, then consolidated into a single model via on-policy distillation. To support this, the authors build KwaiEnv, a modular RL training infrastructure that sustains tens of thousands of concurrent sandbox instances, and introduce MCLA for stabilizing MoE RL training and Tree Training for removing redundant computation over tree-structured trajectories. KAT-Coder-V2 scores 79.6% on SWE-bench Verified (Claude Opus 4.6 scores 80.8%) and 88.7 on PinchBench, while keeping strong generalist scores on Terminal-Bench Hard (46.8) and τ²-Bench (93.9).

Key Contribution

Agentic coding models can achieve near-SOTA performance by specializing in distinct coding domains before unifying them via on-policy distillation.

Abstract

We present KAT-Coder-V2, an agentic coding model developed by the KwaiKAT team at Kuaishou. KAT-Coder-V2 adopts a "Specialize-then-Unify" paradigm that decomposes agentic coding into five expert domains (SWE, WebCoding, Terminal, WebSearch, and General), each undergoing independent supervised fine-tuning and reinforcement learning, before being consolidated into a single model via on-policy distillation. We develop KwaiEnv, a modular infrastructure sustaining tens of thousands of concurrent sandbox instances, and scale RL training along task complexity, intent alignment, and scaffold generalization. We further propose MCLA for stabilizing MoE RL training and Tree Training for eliminating redundant computation over tree-structured trajectories with up to 6.2x speedup. KAT-Coder-V2 achieves 79.6% on SWE-bench Verified (vs. Claude Opus 4.6 at 80.8%), 88.7 on PinchBench (surpassing GLM-5 and MiniMax M2.7), ranks first across all three frontend aesthetics scenarios, and maintains strong generalist scores on Terminal-Bench Hard (46.8) and τ²-Bench (93.9). Our model is publicly available at https://streamlake.com/product/kat-coder.
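The abstract does not describe how Tree Training works internally. As a rough illustration only, the sketch below shows why tree-structured trajectories contain redundant computation: rollouts that branch from a shared prefix repeat the prefix tokens when processed independently, whereas a prefix tree needs each shared position only once. The function names and the toy rollouts are hypothetical, and the token-count ratio is not the same quantity as the wall-clock speedup reported in the abstract.

```python
from typing import Dict, Sequence


def flat_token_count(trajectories: Sequence[Sequence[int]]) -> int:
    """Tokens processed when every trajectory is run independently."""
    return sum(len(t) for t in trajectories)


def tree_token_count(trajectories: Sequence[Sequence[int]]) -> int:
    """Tokens processed when shared prefixes are computed only once.

    Builds a trie over the token sequences; each trie node stands for one
    token position that has to be computed exactly once.
    """
    root: Dict[int, dict] = {}
    nodes = 0
    for traj in trajectories:
        node = root
        for tok in traj:
            if tok not in node:
                node[tok] = {}
                nodes += 1
            node = node[tok]
    return nodes


# Hypothetical rollouts (token ids): three branches sharing a 4-token prefix.
rollouts = [
    [1, 2, 3, 4, 10, 11],
    [1, 2, 3, 4, 20, 21],
    [1, 2, 3, 4, 20, 22],
]

flat = flat_token_count(rollouts)
tree = tree_token_count(rollouts)
print(f"flat={flat} tree={tree} ratio={flat / tree:.2f}")
```

In this toy case the flat count is 18 tokens and the tree count is 9. The more branches share a long prefix, the larger the gap becomes.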

Code Generation & Program Synthesis; RLHF & Preference Learning; Tool Use & Agents; Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
