CAS | KCL | PKU | Mar 17, 2026 | arXiv:2603.16479

TRACE: Evaluating Execution Efficiency of LLM-Based Code Translation

Zhihao Gong, Zeyu Sun, Qingyuan Liang, Jie M. Zhang, Dan Hao

AI Summary

The paper introduces TRACE, a benchmark designed to evaluate the execution efficiency of code translated by LLMs across C++, Java, and Python, comprising 1,000 efficiency-critical tasks with stress tests. Evaluating 28 LLMs using TRACE reveals that correctness does not guarantee efficiency, with the top correctness model underperforming smaller models in time efficiency. Analysis of inefficient translations shows that inefficiencies are prevalent and patterned, stemming from algorithmic faults, language construct mismatches, and resource mismanagement.

Key Contribution

Even when LLMs translate code correctly, 23.5% of the correct translations are markedly inefficient, owing to algorithmic faults, language construct mismatches, or resource mismanagement.

Abstract

While Large Language Models (LLMs) have substantially improved the functional correctness of code translation, the critical dimension of execution efficiency remains overlooked. We present TRACE, the first benchmark to explicitly assess efficiency in LLM-translated code. TRACE includes 1,000 efficiency-critical tasks across C++, Java, and Python, each augmented with stress tests that reveal efficiency degradations often overlooked by small-scale tests. Using TRACE, we conduct an extensive evaluation of 28 representative LLMs and highlight several key insights: 1) Correctness is not a reliable proxy for efficiency: the correctness leader Claude-4-think achieves only mid-level time efficiency, outperformed by smaller open-source LLMs such as Qwen2.5-Coder-14B-Instruct. 2) Inefficiency is both prevalent and patterned: 23.5% of correct translations exhibit pronounced inefficiency, distributed across algorithmic faults (11.9%), language construct mismatches (66.4%), and resource mismanagement (21.7%). 3) Inference-time prompt strategies bring only modest improvements, suggesting that current LLMs lack intrinsic efficiency awareness. Together, our results establish efficiency as an essential dimension of code translation and position TRACE as a principled foundation for efficiency-oriented evaluation.
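To make the idea of a "language construct mismatch" concrete, here is a minimal Python sketch. It is not taken from the paper or from the benchmark's harness; the function names, the inputs, and the scenario (a hash-set lookup from a source program rendered as a Python list lookup) are all hypothetical and chosen only for illustration. Both versions return the same answers, so a small functional test cannot tell them apart, while a larger input exposes the difference in running time.

```python
import time


def count_dups_list(values):
    """Hypothetical literal translation: a list stands in for a hash set,
    so every membership check scans the list (O(n) per lookup)."""
    seen = []
    dups = 0
    for v in values:
        if v in seen:
            dups += 1
        else:
            seen.append(v)
    return dups


def count_dups_set(values):
    """Idiomatic version: a set gives average O(1) membership checks."""
    seen = set()
    dups = 0
    for v in values:
        if v in seen:
            dups += 1
        else:
            seen.add(v)
    return dups


def time_call(fn, arg):
    """Return (result, elapsed seconds) for a single call to fn(arg)."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start


# Illustrative inputs (assumed, not from the benchmark).
small = [1, 2, 2, 3, 1]                     # small-scale functional test
stress = [i % 2000 for i in range(4000)]    # larger stress-style input

for name, data in (("small", small), ("stress", stress)):
    res_list, t_list = time_call(count_dups_list, data)
    res_set, t_set = time_call(count_dups_set, data)
    print(f"{name}: same result={res_list == res_set}, "
          f"list={t_list:.6f}s, set={t_set:.6f}s")
```

On the small input both functions finish almost instantly, which is why a correctness-only test would likely pass either one. On the larger input the list-based version does quadratic work, so its measured time should grow much faster than the set-based version's; the exact figures depend on the machine.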

Code Generation & Program Synthesis | Eval Frameworks & Benchmarks

Citation Metrics

Citations: 0

Influential citations: 0

References: 44

Year: 2026

Venue: N/A
