Beihang | Mar 15, 2026 | arXiv:2603.14501

CangjieBench: Benchmarking LLMs on a Low-Resource General-Purpose Programming Language

Junhang Cheng, Fang Liu, Jia Li, Chengru Wu, Nanxiang Jiang, Li Zhang

AI Summary

The paper introduces CangjieBench, a new contamination-free benchmark for evaluating LLMs on Cangjie, a low-resource general-purpose programming language. The benchmark consists of 248 manually translated samples from HumanEval and ClassEval, covering both Text-to-Code and Code-to-Code tasks. Experiments across Direct Generation, Syntax-Constrained Generation, RAG, and Agent settings reveal that Syntax-Constrained Generation offers the best trade-off, while Code-to-Code translation surprisingly underperforms Text-to-Code generation, indicating negative transfer.

Key Contribution

LLMs struggle with low-resource general-purpose programming languages, and surprisingly, translating code *to* a low-resource language is harder than generating it from text.

Abstract

Large Language Models excel in high-resource programming languages but struggle with low-resource ones. Existing research related to low-resource programming languages primarily focuses on Domain-Specific Languages (DSLs), leaving general-purpose languages that suffer from data scarcity underexplored. To address this gap, we introduce CangjieBench, a contamination-free benchmark for Cangjie, a representative low-resource general-purpose language. The benchmark comprises 248 high-quality samples manually translated from HumanEval and ClassEval, covering both Text-to-Code and Code-to-Code tasks. We conduct a systematic evaluation of diverse LLMs under four settings: Direct Generation, Syntax-Constrained Generation, Retrieval-Augmented Generation (RAG), and Agent. Experiments reveal that Direct Generation performs poorly, whereas Syntax-Constrained Generation offers the best trade-off between accuracy and computational cost. Agents achieve state-of-the-art accuracy but incur high token consumption. Furthermore, we observe that Code-to-Code translation often underperforms Text-to-Code generation, suggesting a negative transfer phenomenon where models overfit to the source language patterns. We hope that our work will offer valuable insights into LLM generalization to unseen and low-resource programming languages. Our code and data are available at https://github.com/cjhCoder7/CangjieBench.

Code Generation & Program Synthesis | Data Curation & Synthetic Data | Eval Frameworks & Benchmarks

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
