The paper introduces ConSelf, a self-improvement method for code generation in LLMs that requires neither an external teacher model nor a test oracle. ConSelf uses code semantic entropy to build a curriculum of learnable problems, then applies consensus-driven direct preference optimization (Con-DPO), which weights each preference pair by the behavioral consensus among the generated programs. Experiments show that ConSelf significantly outperforms baselines on code generation benchmarks.
LLMs can bootstrap their code generation abilities without external supervision by leveraging semantic entropy to identify learnable tasks and behavioral consensus to filter noisy self-generated training signals.
Improving the code generation capabilities of large language models (LLMs) typically relies on supervised fine-tuning or preference optimization, both of which require costly external resources such as powerful teacher models or reliable unit tests. However, in real-world scenarios, reference solutions and test oracles are much harder to obtain than problem descriptions and test inputs. In this paper, we tackle a challenging yet realistic question: Can a code language model improve itself without access to a superior teacher or a test oracle? To answer this, we propose ConSelf, a self-improving approach built upon two key ideas. First, we introduce code semantic entropy, a novel metric that measures problem-level uncertainty by assessing the functional diversity of program behaviors, enabling the construction of a curriculum from the most learnable problems. Second, we present consensus-driven direct preference optimization (Con-DPO), a preference-based fine-tuning method that weights each preference pair by its behavioral consensus, thereby mitigating the impact of noisy self-generated supervision. Experiments on various benchmarks and backbone LLMs demonstrate that ConSelf significantly outperforms baselines, validating the effectiveness of semantic entropy-based curriculum construction and consensus-driven optimization in improving code generation without external supervision.
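The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and the abstract does not give the exact formulas, so the following are assumptions: candidate programs are grouped by their outputs on shared test inputs, semantic entropy is the Shannon entropy over those behavior groups, the consensus weight of a preference pair is the fraction of samples that behave like the chosen program, and that weight scales the standard DPO loss. All function names here are hypothetical, and candidates are plain Python callables rather than sandboxed generated code.

```python
import math
from collections import Counter


def behavior_signature(program, test_inputs):
    """Run one candidate on every test input; record its output or error type."""
    signature = []
    for x in test_inputs:
        try:
            signature.append(repr(program(x)))
        except Exception as exc:  # a crash counts as a behavior of its own
            signature.append("error:" + type(exc).__name__)
    return tuple(signature)


def semantic_entropy(signatures):
    """Shannon entropy (in nats) over groups of behaviorally identical programs.

    Assumption: programs with equal signatures form one semantic cluster.
    """
    counts = Counter(signatures)
    n = len(signatures)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def consensus_scores(signatures):
    """For each program, the fraction of all samples sharing its behavior."""
    counts = Counter(signatures)
    n = len(signatures)
    return [counts[s] / n for s in signatures]


def weighted_dpo_loss(weight, logp_chosen, logp_rejected,
                      ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss, -log(sigmoid(margin)), scaled by a consensus weight.

    Assumption: the weight multiplies the per-pair loss; the paper's exact
    weighting scheme may differ.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin)).
    softplus = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return weight * softplus


# Toy problem: three samples double the input, one squares it.
candidates = [lambda x: x * 2, lambda x: x + x, lambda x: 2 * x, lambda x: x ** 2]
test_inputs = [1, 2, 3]
signatures = [behavior_signature(p, test_inputs) for p in candidates]

entropy = semantic_entropy(signatures)   # higher means more disagreement
weights = consensus_scores(signatures)   # majority behavior gets weight 0.75
```

Under these assumptions, a problem where every sample behaves identically has zero entropy and one where every sample behaves differently has maximal entropy; a curriculum could then select problems by this score. The pair weight shrinks the gradient contribution of preference pairs whose chosen program is backed by only a small share of the samples.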