May 1 – May 8, 2026

Code Generation & Program Synthesis - Weekly Roundup

69 papers published across 2 labs.

Top Papers

May 5, 2026

Alan L. McCann · 2w ago

Cryptographic Registry Provenance: Structural Defense Against Dependency Confusion in AI Package Ecosystems

Current package managers are surprisingly vulnerable: a single misconfiguration can silently allow attackers to inject malicious dependencies, a problem solved by this paper's cryptographically enforced provenance system.

Alan L. McCann

Code Generation & Program Synthesis Open-Source Models & Weights Red-Teaming & Adversarial Robustness

May 6, 2026

NUS · 2w ago · also SJTU

CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels

LLMs can now generate high-performance CUDA attention kernels that outperform hand-optimized code, thanks to a novel lift-transfer-lower approach that leverages expert knowledge.

Architecture Design (Transformers, SSMs, MoE) Code Generation & Program Synthesis Training Efficiency & Optimization

2w ago

Agentic Vulnerability Reasoning on Windows COM Binaries

An agentic pipeline can autonomously discover and verify real-world privilege escalation vulnerabilities in Windows COM binaries, outperforming both static analysis and existing coding agents.

Hwiwon Lee, Jongseong Kim, Lingming Zhang

Code Generation & Program Synthesis Red-Teaming & Adversarial Robustness Tool Use & Agents

Han Wang +5 · 2w ago · also Tsinghua AI

KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

LLMs can generate GPU kernels, but they're surprisingly bad at it: 72% of fusion tasks fail across all methods, and nearly half of the "correct" kernels are actually slower than PyTorch.

Han Wang, Jintao Zhang, Kai Jiang +3

Code Generation & Program Synthesis Distributed Systems & Hardware Eval Frameworks & Benchmarks

University of Würzburg · 2w ago · also Computer Vision Lab

Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs

LLMs can now generate neural architectures with 75% less code and higher accuracy by learning to write code "diffs" instead of building from scratch.

Santosh Premi Adhikari, Radu Timofte, Dmitry Ignatov

Architecture Design (Transformers, SSMs, MoE) Code Generation & Program Synthesis Training Efficiency & Optimization

All Papers (69)

May 6, 2026

NUS · 2w ago · also SJTU

CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels

LLMs can now generate high-performance CUDA attention kernels that outperform hand-optimized code, thanks to a novel lift-transfer-lower approach that leverages expert knowledge.

Architecture Design (Transformers, SSMs, MoE) Code Generation & Program Synthesis Training Efficiency & Optimization

2w ago

Agentic Vulnerability Reasoning on Windows COM Binaries

An agentic pipeline can autonomously discover and verify real-world privilege escalation vulnerabilities in Windows COM binaries, outperforming both static analysis and existing coding agents.

Hwiwon Lee, Jongseong Kim, Lingming Zhang

Code Generation & Program Synthesis Red-Teaming & Adversarial Robustness Tool Use & Agents

Han Wang +5 · 2w ago · also Tsinghua AI

KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

LLMs can generate GPU kernels, but they're surprisingly bad at it: 72% of fusion tasks fail across all methods, and nearly half of the "correct" kernels are actually slower than PyTorch.

Han Wang, Jintao Zhang, Kai Jiang +3

Code Generation & Program Synthesis Distributed Systems & Hardware Eval Frameworks & Benchmarks

University of Würzburg · 2w ago · also Computer Vision Lab

Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs

LLMs can now generate neural architectures with 75% less code and higher accuracy by learning to write code "diffs" instead of building from scratch.

Santosh Premi Adhikari, Radu Timofte, Dmitry Ignatov

Architecture Design (Transformers, SSMs, MoE) Code Generation & Program Synthesis Training Efficiency & Optimization

Jaewook Kim +1 · 2w ago

Ensuring Reliability in Programming Knowledge Tracing: A Re-evaluation of Attention-augmented Models and Experimental Protocols

Attention-based models for programming knowledge tracing might not be as effective as previously thought; careful experimental design reveals that their gains over simpler models are often overstated.

Jaewook Kim, Hyeoncheol Kim

Code Generation & Program Synthesis Eval Frameworks & Benchmarks

The Verkor Team +3 · 2w ago

Design Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hours

LLM agents can now autonomously design complex hardware like an LLM inference accelerator with hard-wired TurboQuant support in just 80 hours.

The Verkor Team, Ravi Krishna, Suresh Krishna +1

Code Generation & Program Synthesis Inference & Quantization Tool Use & Agents

Sergey Rodionov · 2w ago

Executable World Models for ARC-AGI-3 in the Era of Coding Agents

Verifier-driven executable world models can solve complex reasoning tasks like ARC-AGI-3 without game-specific code, hinting at a path towards more generalizable AI agents.

Sergey Rodionov

Code Generation & Program Synthesis Tool Use & Agents World Models & Planning

2w ago

Architectural Constraints Alignment in AI-assisted, Platform-based Service Development

Stop brittle, undeployable AI-generated code: this retrieval-augmented scaffolding method bakes in architectural constraints from the start.

Julius Irion, Moritz Leugers, Paul Hartwig +5

Architecture Design (Transformers, SSMs, MoE) Code Generation & Program Synthesis Tool Use & Agents

2w ago

CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement

LLM-guided code evolution, when combined with runtime feedback and MCTS, can reliably achieve 15x speedups on real-world Java code, surpassing naive LLM-based optimization.

Ajay Krishna Borra, Wenzhuo Yang, Samarth Arora +9

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Tool Use & Agents

2w ago

AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair

Agent-repair leaderboards are more fragile than we thought: methods that peek at the evaluator's signals to guide internal repair choices can cause drastic reordering when the evaluator changes.

Yuelin Hu, Zhenbo Yu, Zhengxue Cheng +2

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Tool Use & Agents

Zeng Ren +3 · 2w ago

Library learning with e-graphs on jazz harmony

E-graphs can help AI learn the unwritten rules of jazz harmony, mirroring how human musicians internalize complex musical patterns.

Zeng Ren, Maddy Bowers, Xinyi Guan +1

Code Generation & Program Synthesis Speech & Audio

Siqiao Xue +6 · 2w ago

Beyond Retrieval: A Multitask Benchmark and Model for Code Search

Developer-style keyword searches completely nullify the advantage of even the best code embedding models, highlighting a critical gap in current code search techniques.

Siqiao Xue, Zihan Liao, Jin Qin +4

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Recommendation & Information Retrieval

2w ago

Accountable Agents in Software Engineering: An Analysis of Terms of Service and a Research Roadmap

AI coding assistants' Terms of Service overwhelmingly place responsibility for code correctness, safety, and legal compliance on the user, creating a potential accountability gap as these tools become more autonomous.

Christoph Treude

Code Generation & Program Synthesis Constitutional AI & AI Ethics Tool Use & Agents

Yaxun Dai +8 · 2w ago

Every Step Counts: Step-Level Credit Assignment for Tool-Integrated Text-to-SQL

Tool-using SQL agents can learn to be more efficient and accurate by getting feedback on *how* they reason, not just *what* they output.

Yaxun Dai, Baolin Sun, Junying Wang +6

Code Generation & Program Synthesis Reasoning & Chain-of-Thought Tool Use & Agents

Tobias Denzinger +2 · 2w ago

Shedding Light onto Safety Integrity Level and Basic Software Constraints in a Real-World Automotive Application: Case Study with Driverator Framework

Mixing tasks with different safety levels in automotive ECUs can compromise critical functions, highlighting the need for careful task allocation strategies.

Tobias Denzinger, Matthias Becker, Peter Ulbrich

Code Generation & Program Synthesis Robotics & Embodied AI

2w ago · also Queen's

SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs

A clever routing strategy lets a tiny 3B code model outperform a massive 480B model on routine code completion tasks, slashing accelerator usage by 58%.

Kishanthan Thangarajah, Boyuan Chen, Ahmed E. Hassan

Architecture Design (Transformers, SSMs, MoE) Code Generation & Program Synthesis Inference & Quantization

2w ago · also Brandenburg University of Technology

Conflict Essences for Transformation Rules with Nested Application Conditions -- Long Version

Pinpointing minimal "conflict essences" reveals precisely how graph transformation rules interfere, even with complex nested conditions.

Alexander Lauer, Jens Kosiol, Leen Lambers +1

Code Generation & Program Synthesis Natural Language Processing

Johannes Hartel · 2w ago

Agentic Repository Mining: A Multi-Task Evaluation

LLM agents that autonomously explore code repositories can match the classification accuracy of simpler LLMs with hand-crafted context, hinting at a future where agents surpass human-labeled data in complex software understanding tasks.

Johannes Hartel

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Tool Use & Agents

2w ago

Patterns of Developer Adoption of LLM-Generated Code Refactoring Suggestions

Developers overwhelmingly trust and directly apply LLM-generated code refactoring suggestions, but when they don't, the changes are surprisingly drastic and predictable.

David Schon, Faiza Amjad, Tehreem Asif +4

Code Generation & Program Synthesis Eval Frameworks & Benchmarks

The Open University · 2w ago

Toward an Understanding of Developer Behaviour while Using Bug Localization Tools

Bug localization tool adoption hinges on more than just accuracy: developers need tools that mesh with their workflows and leverage contextual information.

Pablo Diaz Pedreira, Tamara Lopez, Michel Wermelinger

Code Generation & Program Synthesis Tool Use & Agents

2w ago · also Munich Center for Machine Learning (MCML)

A meta-analysis of the effect of generative AI on productivity and learning in programming

GenAI coding assistants boost developer productivity, but the gains shrink outside the lab and don't translate to better learning.

Sebastian Maier, Moritz Gunzenhauser, J. Schweisthal +2

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Natural Language Processing

2w ago

How Does Chunking Affect Retrieval-Augmented Code Completion? A Controlled Empirical Study

Turns out, chunking code by function is the *worst* way to do retrieval-augmented code completion.

Xinjian Wu, Jingzhi Gong, Gunel Jahangirova +1

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Recommendation & Information Retrieval

Junhao Ye +9 · 2w ago

UVMarvel: an Automated LLM-aided UVM Machine for Subsystem-level RTL Verification

Automating UVM testbench generation with LLMs slashes verification time from days to hours, achieving near-complete code coverage.

Junhao Ye, Dingrong Pan, Hanyuan Liu +7

Code Generation & Program Synthesis Tool Use & Agents

BaseThesis Labs · 2w ago · also QwikBuild

SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies

"Vibe coding" platforms promise effortless app creation, but SWE-WebDevBench reveals they often deliver visually appealing frontends with broken backends, struggle with security, and require significant human effort to reach production readiness.

Siddhant Saxena, Nilesh Trivedi, V. Jyothi

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Tool Use & Agents

Barkhausen Institut · 2w ago

Interaction Tree Semantics for RISC-V: Bridging Compiler and Hardware Verification

Proving semantic equivalence between LLVM IR and RISC-V code is now possible within a single framework, thanks to a new formal RISC-V semantics built on Interaction Trees.

Shuanglong Kan, Sebastian Ertel

Architecture Design (Transformers, SSMs, MoE) Code Generation & Program Synthesis Distributed Systems & Hardware

Serra Z. Dane +3 · 2w ago · also UMich, ZJU

Towards Formal Verification of Hybrid Synchronous Programs with Refinement Types

Guaranteeing safety in autonomous systems gets a boost: this work enables formal verification of hybrid system code that directly controls physical processes.

Serra Z. Dane, Jiawei Chen, Marc Pouzet +1

Code Generation & Program Synthesis Robotics & Embodied AI

Honghu Pan +4 · 2w ago

Computer-Aided Design Generation by Cascaded Discrete Diffusion Model

Discrete diffusion, with carefully designed transition matrices for commands and parameters, unlocks superior CAD generation compared to continuous diffusion baselines.

Honghu Pan, Xiaoling Luo, Yongyong Chen +2

Architecture Design (Transformers, SSMs, MoE) Code Generation & Program Synthesis Computer Vision

Joshua H. Davis +7 · 2w ago

KEET: Explaining Performance of GPU Kernels Using LLM Agents

Stop squinting at Nsight Compute profiles: KEET uses LLMs to automatically diagnose GPU kernel bottlenecks and suggest optimizations in plain English.

Joshua H. Davis, Klaudiusz Rydzy, S. Ramesh +5

Code Generation & Program Synthesis Interpretability & Mechanistic Interp Tool Use & Agents

May 5, 2026

Elitsa Yotkova +4 · 2w ago

FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals

Forget the heavy transformers: surprisingly effective LLM-generated code detection can be achieved with lightweight stylometric features and decision trees, offering near-instant inference.

Elitsa Yotkova, Violeta Kastreva, D. Dimitrov +2

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Natural Language Processing

Le Zhou +4 · 2w ago

Rose-SQL: Role-State Evolution Guided Structured Reasoning for Multi-Turn Text-to-SQL

Rose-SQL achieves state-of-the-art multi-turn Text-to-SQL performance with small models, outperforming larger fine-tuned models without any training.

Le Zhou, Feng Yao, Fengcai Qiao +2

Code Generation & Program Synthesis Natural Language Processing Reasoning & Chain-of-Thought

Elijah Zolduoarrati +2 · 2w ago

Geographic Variation in Stack Overflow Code Quality: Evidence from a Cross-Regional Study of Coding Practices

Stack Overflow code quality varies significantly across US states, with major tech hubs surprisingly not producing the highest quality code.

Elijah Zolduoarrati, Sherlock A. Licorish, Nigel Stanger

Code Generation & Program Synthesis Data Curation & Synthetic Data

Chun Yin Chiu · 2w ago

Lightweight Vulnerability Detection from Code Metrics and Token Features

Forget the heavyweight deep learning approaches – surprisingly effective vulnerability detection can be achieved with simple TF-IDF token features and basic code metrics, offering a fast and transparent baseline for human triage.

Chun Yin Chiu

Code Generation & Program Synthesis Red-Teaming & Adversarial Robustness

2w ago · also Helmholtz

Root-Cause-Driven Automated Vulnerability Repair

LLM-based vulnerability repair can be significantly improved by focusing on root cause analysis, leading to more robust and less superficial patches than current methods.

Hulin Wang, Zion Leonahenahe Basque, Jie Hu +13

Code Generation & Program Synthesis Reasoning & Chain-of-Thought

Chun Yin Chiu · 2w ago

Towards a Zero-Trust Supply-Chain Assurance Rubric for ORAN RIC Applications

Securely onboarding third-party apps in Open RAN just got easier: a new zero-trust rubric offers explicit Accept/Escalate/Block decisions.

Chun Yin Chiu

Code Generation & Program Synthesis Constitutional AI & AI Ethics Red-Teaming & Adversarial Robustness

2w ago

Generating Proof-of-Vulnerability Tests to Help Enhance the Security of Complex Software

LLMs can now automatically generate effective proof-of-vulnerability tests for complex software, uncovering real-world attack vectors with minimal human intervention.

Shravya Kanchi, Xiaoyan Zang, Ying Zhang +2

Code Generation & Program Synthesis Red-Teaming & Adversarial Robustness

J. Steinberg +1 · 2w ago

MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

Innocuous-looking coding tasks, when chained together, trick even the best coding agents into creating exploitable code with alarming frequency.

J. Steinberg, Oren Gal

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

2w ago · also Independent

KVerus: Scalable and Resilient Formal Verification Proof Generation for Rust Code

LLMs struggle to formally verify real-world code, but KVerus's self-adaptive approach closes the gap, enabling verification of complex, evolving Rust systems with significantly improved success rates.

Yuwei Liu, Xinyi Wan, Yanhao Wang +3

Code Generation & Program Synthesis Reasoning & Chain-of-Thought

Universidad Carlos III de Madrid · 2w ago

The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive Code

LLMs can cheaply generate malware variants that are structurally diverse yet functionally identical, posing a significant challenge to signature-based detection methods.

Gabriel Hortea, Juan Tapiador

Code Generation & Program Synthesis Red-Teaming & Adversarial Robustness

Xing Zhang +3 · 2w ago

Tailored Prompts, Targeted Protection: Vulnerability-Specific LLM Analysis for Smart Contracts

LLMs can achieve surprisingly high precision in smart contract vulnerability detection, but only with vulnerability-specific prompts and AST-based context.

Xing Zhang, Ke Zhang, Taohong Zhu +1

Code Generation & Program Synthesis Data Curation & Synthetic Data Natural Language Processing

Telecom Paris and Ledger Donjon · 2w ago · also University of the Western Cape

From TinyGo to gc Compiler: Extending Zorya's Concolic Framework to Real-World Go Binaries

Zorya can now automatically find previously undetected vulnerabilities in compiled Go binaries, even silent integer overflows that other tools miss.

Karolina Gorna, Nicolas Iooss, Yannick Seurin +2

Code Generation & Program Synthesis Red-Teaming & Adversarial Robustness

Alan L. McCann · 2w ago

Cryptographic Registry Provenance: Structural Defense Against Dependency Confusion in AI Package Ecosystems

Alan L. McCann

Code Generation & Program Synthesis Open-Source Models & Weights Red-Teaming & Adversarial Robustness

C. Soares +5 · 2w ago

AI Advocate: Educational Path to Transform Squads to the Future

Upskilling internal "AI Advocates" can be a surprisingly effective catalyst for driving cultural and technical transformation in software development squads.

C. Soares, G. Moreira, Ana Paula Camargo +3

Code Generation & Program Synthesis Natural Language Processing Tool Use & Agents

2w ago

SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents

LLM agent skills are needlessly brittle and insecure: SkCC compiles them into a portable, hardened format that boosts performance by 50% and proactively blocks attacks.

Yipeng Ouyang, Yingjiao Xiao, Yuhao Gu +1

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Tool Use & Agents

S. Vigraham · 2w ago

When Context Hurts: The Crossover Effect of Knowledge Transfer on Multi-Agent Design Exploration

Sometimes, giving an agent more information actually *hurts* its ability to solve a problem, especially when its default behavior is already pretty good.

S. Vigraham

Code Generation & Program Synthesis Tool Use & Agents

Toufique Ahmed +3 · 2w ago

Reproduction Test Generation for Java SWE Issues

Java developers drowning in unfixed bugs, rejoice: automated reproduction test generation is now a viable option, thanks to a new benchmark and adapted generator.

Toufique Ahmed, Jatin Ganhotra, Avraham Shinnar +1

Code Generation & Program Synthesis Eval Frameworks & Benchmarks

2w ago

Exploring the Output of Software Testing Tools through a Visual Comparative Analysis

Software testing tools share surprisingly consistent visual patterns, offering a blueprint for designing more intuitive and informative testing interfaces.

Brandon Lit, Anthony Maocheia-Ricci, Thomas Driscoll

Code Generation & Program Synthesis Tool Use & Agents

Pranav Srinivasan +2 · 2w ago

täkōFormal: Enabling Robust Software for Programmable Memory Hierarchies (Extended Version)

Formal reasoning about programmable memory hierarchies is now possible, thanks to a new ISA-level memory consistency model that tames the complexity of architectures like täkō.

Pranav Srinivasan, Manos Kapritsos, Yatin A. Manerkar

Architecture Design (Transformers, SSMs, MoE) Code Generation & Program Synthesis Distributed Systems & Hardware

2w ago

Cosmodoit: A Python Package for Adaptive, Efficient Pipelining of Feature Extraction from Performed Music

Stop wrestling with disparate tools and languages for music performance analysis: Cosmodoit offers a unified Python pipeline for efficient, large-scale feature extraction.

C. Guichaoua, D. Bedoya, Elaine Chew

Code Generation & Program Synthesis Speech & Audio

Danny Hoang +7 · 2w ago

Physics-Grounded Multi-Agent Architecture for Traceable, Risk-Aware Human-AI Decision Support in Manufacturing

LLMs can't reliably orchestrate multi-step manufacturing workflows, but this physics-grounded multi-agent system can, boosting tool execution success by 87.5% while ensuring traceable, risk-aware decisions.

Danny Hoang, Ryan Matthiessen, Chris Miller +5

Code Generation & Program Synthesis Reasoning & Chain-of-Thought Tool Use & Agents

Negar Arabzadeh +3 · 2w ago

RAG over Thinking Traces Can Improve Reasoning Tasks

RAG's reputation for being ineffective in reasoning tasks is shattered by showing that retrieving the right data – intermediate "thinking traces" – unlocks substantial performance gains, even for state-of-the-art models.

Negar Arabzadeh, Wenjie Ma, Sewon Min +1

Code Generation & Program Synthesis Reasoning & Chain-of-Thought Recommendation & Information Retrieval

University of São Paulo · 2w ago · also University of Brasília

Operationalizing Software Engineering Theories for Practical Validation

Grounding software engineering theories in empirical evidence just got easier: this paper offers a systematic, replicable procedure for translating abstract concepts into testable hypotheses.

Isaque Alves, Fabio Kon, Jessica Díaz +1

Code Generation & Program Synthesis Tool Use & Agents

University of Tennessee · 2w ago · also ORNL

Exploring Sustainability in Scientific Software through Code Quality & Test Coverage Metrics

Sustainable scientific software isn't just about the code; it's about consistent testing and clear links between code quality and tests, a pattern often missing in unsustainable projects.

Sheikh Md. Mushfiqur Rahman, Gregory R. Watson, Nasir U. Eisty

Code Generation & Program Synthesis Open-Source Models & Weights Scientific Discovery & Drug Design

Zoner Oy · 2w ago · also Helsinki

Multi-Agent Systems for Root Cause Analysis in Microservices

LLMs can now collaboratively pinpoint root causes in microservices using a tree-structured search, but production environments reveal the limitations of this approach when faced with polyglot stacks and inconsistent logging.

Alexander Naakka, Yuqing Wang, M. Mantyla

Code Generation & Program Synthesis Reasoning & Chain-of-Thought Tool Use & Agents

Gehao Zhang +1 · 2w ago

POSTCONDBENCH: Benchmarking Correctness and Completeness in Formal Postcondition Inference

LLMs can generate formally correct postconditions for code, but they often miss crucial details, especially in complex, real-world scenarios.

Gehao Zhang, Juan Zhai

Code Generation & Program Synthesis Eval Frameworks & Benchmarks

2w ago

Deep Graph-Language Fusion for Structure-Aware Code Generation

Injecting graph representations of code directly into LLM internals unlocks a 16% BLEU boost in code generation, suggesting that structural awareness is key to next-gen code models.

Mert Tiftikci, Amir Molzam Sharifloo, Mira Mezini

Architecture Design (Transformers, SSMs, MoE) Code Generation & Program Synthesis Natural Language Processing

Christian Mancas +1 · 2w ago

Semantic Reverse Engineering Legacy Software Applications with ChatGPT, Gemini AI, and Claude AI

LLMs can partially reverse engineer legacy code, but don't expect them to fully understand your spaghetti code just yet.

Christian Mancas, Diana Christina Mancas

Code Generation & Program Synthesis Natural Language Processing

UW · 2w ago

ProgramBench: Can Language Models Rebuild Programs From Scratch?

LLMs can't rebuild software from scratch, even for widely used programs like FFmpeg and SQLite, revealing a critical gap in their ability to make high-level software architecture decisions.

John Yang, K. Lieret, J. Ma +9

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Tool Use & Agents

Goran Mitevski · 2w ago

Long-Range Correlation in Code Commit Dynamics as a Novel Indicator of Software Product Stability: A Detrended Fluctuation Analysis Study

Software stability isn't about how much code you commit, but how far ahead you're thinking: fractal analysis reveals long-range planning in commit patterns predicts stability better than commit volume alone.

Goran Mitevski

Code Generation & Program Synthesis

P. Cassieri +4 · 2w ago

Beyond Rules: LLM-Powered Linting for Quantum Programs

LLMs can catch more bugs in quantum code than traditional rule-based linters, suggesting a new path to more reliable quantum software.

P. Cassieri, Giuseppe Scanniello, Seung Yeob Shin +2

Code Generation & Program Synthesis Natural Language Processing

Maryse Ernzer +3 · 2w ago

Randomized and Diverse Input State Generation for Quantum Program Testing

A new Brick-Circuit generator achieves higher expressibility and entanglement in quantum program testing, outperforming existing methods with shallower circuits.

Maryse Ernzer, Seung Yeob Shin, Fabrizio Pastore +1

Code Generation & Program Synthesis

IIT · 2w ago · also Poly Montreal

Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning

Rust developers can slash the noise in static analysis alerts by over 50% using an RL agent that learns to suppress false positives, outperforming even LLM-based methods.

P. Akilesh, L. D. Silva, F. Khomh +1

Code Generation & Program Synthesis Training Efficiency & Optimization

Ahmed F. Ibrahim · 2w ago

A Multi-Agent Consensus Protocol for Stable Software Remodularization

Guaranteeing software stability during remodularization doesn't require sacrificing performance; a multi-agent consensus protocol can match state-of-the-art optimizers while acting as a "circuit breaker" for strict stability constraints.

Ahmed F. Ibrahim

Code Generation & Program Synthesis Distributed Systems & Hardware Tool Use & Agents

Brian Houck +3 · 2w ago

EngThrive: Make It Fast and Easy to Do Great Work

Microsoft's EngThrive framework reveals how aligning developer productivity metrics with genuine improvement can drive sustained, system-level gains in Speed, Ease, and Quality.

Brian Houck, Tim Bozarth, David Liu +1

Code Generation & Program Synthesis

May 4, 2026

2w ago

AcademiClaw: When Students Set Challenges for AI Agents

Today's best AI agents can only solve 55% of real-world academic tasks that university students find challenging, revealing a significant gap between current AI capabilities and the demands of academic workflows.

Junjie Yu, Pengrui Lu, Weiye Si +75

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Tool Use & Agents

S. Catalán +2 · 2w ago

Leveraging Teaching on Demand: Approaching HPC to Undergrads

Hands-on experience with Raspberry Pi clusters and student-driven learning can effectively bridge the HPC skills gap in undergraduate engineering education.

S. Catalán, R. Carratalá-Sáez, S. Iserte

Code Generation & Program Synthesis Distributed Systems & Hardware

May 2, 2026

Siqi Zhu · 2w ago

Agentic AI Systems Should Be Designed as Marginal Token Allocators

Treating agentic AI systems as token economies reveals that current designs, which optimize token usage locally, lead to predictable global misallocations and inefficiencies.

Siqi Zhu

Code Generation & Program Synthesis Tool Use & Agents

May 1, 2026

Daniel Song +23 · 3w ago

Code World Model Preparedness Report

Meta's risk assessment of its Code World Model (CWM) gives it a clean bill of health, concluding it poses no *new* catastrophic risks beyond those already present in the AI landscape.

Daniel Song, Peter Ney, Cristina Menghini +21

Code Generation & Program Synthesis Open-Source Models & Weights Red-Teaming & Adversarial Robustness

Indraneil Paul +3 · 3w ago

Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

Current code reward models are myopic, mostly rewarding functional correctness, but Themis-RM learns to score code across multiple criteria and languages, opening the door to more nuanced and useful code generation.

Indraneil Paul, Glavaš Glavas, Glavavs Glavas +1

Code Generation & Program Synthesis Eval Frameworks & Benchmarks RLHF & Preference Learning

Massimo Rondelli +2 · 3w ago

BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis

LLMs can now generate 3D objects from text that are syntactically correct and geometrically consistent 70% of the time, thanks to retrieval-augmented code synthesis.

Massimo Rondelli, Francesco Pivi, Maurizio Gabbrielli

Code Generation & Program Synthesis Multimodal Models Recommendation & Information Retrieval