MetaX, NJU, USTC
Mar 17, 2026
arXiv:2603.16790

InCoder-32B: Code Foundation Model for Industrial Scenarios

Jian Yang, Wei Zhang, Jiajun Wu, Junhang Cheng, Shawn Guo, Haowen Wang, Weicheng Gu, Wei-Quan Gu, Yaxin Du, Joseph Li, Fang-jiang Xu, Fanglin Xu, Yizhi Li, Lin Jing, Yuanbo Wang, Yuhan Gao, Ruihao Gong, Chuan Hao, Ran Tao, Aishan Liu, T. Zheng, Tuney Zheng, Ganqu Cui, Zhoujun Li, Mingjie Tang, Chenghua Lin, Chenghu Lin, Wayne Xin Zhao, Xianglong Liu, Ming Zhou, Mingfa Zhou, Bryan Dai, Weifeng Lv

AI Summary

InCoder-32B is a 32B-parameter code foundation model developed to address the degraded performance of existing code LLMs in industrial scenarios that involve hardware semantics, specialized languages, and resource constraints. The model was trained from scratch in multiple stages: general code pre-training, curated industrial code annealing, mid-training that extends context from 8K to 128K tokens with synthetic industrial reasoning data, and post-training with execution-grounded verification. Evaluation on 14 general and 9 industrial benchmarks shows that InCoder-32B is competitive on general tasks and establishes strong open-source baselines in specialized industrial domains.

Key Contribution

A 32B-parameter code LLM trained for industrial tasks establishes strong open-source baselines in specialized domains such as chip design and GPU kernel optimization, while remaining competitive on general coding benchmarks.

Abstract

Recent code large language models have achieved remarkable progress on general programming tasks. Nevertheless, their performance degrades significantly in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints. To address these challenges, we introduce InCoder-32B (Industrial-Coder-32B), the first 32B-parameter code foundation model unifying code intelligence across chip design, GPU kernel optimization, embedded systems, compiler optimization, and 3D modeling. By adopting an efficient architecture, we train InCoder-32B from scratch with general code pre-training, curated industrial code annealing, mid-training that progressively extends context from 8K to 128K tokens with synthetic industrial reasoning data, and post-training with execution-grounded verification. We conduct extensive evaluation on 14 mainstream general code benchmarks and 9 industrial benchmarks spanning 4 specialized domains. Results show InCoder-32B achieves highly competitive performance on general tasks while establishing strong open-source baselines across industrial domains.
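The staged recipe named in the abstract can be written down as a small data structure. The following is only an illustrative sketch, not the authors' training code: the stage names paraphrase the abstract, `K = 1024` is an assumption about what "8K" and "128K" mean, and the doubling schedule in `doubling_context_schedule` is hypothetical, since the abstract states only the 8K start and the 128K end of the context extension, not the intermediate steps.

```python
from typing import List

K = 1024  # assumption: "8K" and "128K" are multiples of 1024 tokens

# Stage order as listed in the abstract; the wording is a paraphrase.
PIPELINE: List[str] = [
    "general code pre-training",
    "curated industrial code annealing",
    "mid-training: context extension with synthetic industrial reasoning data",
    "post-training with execution-grounded verification",
]


def doubling_context_schedule(start: int = 8 * K, end: int = 128 * K) -> List[int]:
    """Hypothetical schedule: double the context window until `end` is reached.

    Only the endpoints (8K and 128K) come from the abstract; the
    intermediate lengths are an illustrative assumption.
    """
    if start <= 0 or end < start:
        raise ValueError("expected 0 < start <= end")
    lengths = [start]
    while lengths[-1] < end:
        # Clamp the last step so the schedule never overshoots `end`.
        lengths.append(min(lengths[-1] * 2, end))
    return lengths


if __name__ == "__main__":
    for step, stage in enumerate(PIPELINE, start=1):
        print(f"{step}. {stage}")
    print(doubling_context_schedule())
```

Under these assumptions, the default schedule has five steps, from 8,192 to 131,072 tokens.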

Code Generation & Program Synthesis, Distributed Systems & Hardware, Open-Source Models & Weights

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
