45 papers published across 5 labs.
Requirements volatility doesn't just delay projects; it directly undermines software architecture, leading to technical debt and scheduling nightmares.
Unlock geometric algebra's performance potential in neural networks and spatial computing by compiling directly from multi-way relationships, eliminating manual specialization and ensuring geometric correctness.
Training domain-specific coding LLMs with realistic environments and large-scale RL can yield substantial gains in practical software engineering tasks.
Autonomous coding agents can now outperform expert-engineered attention kernels on NVIDIA's latest Blackwell GPUs, discovering optimizations that eluded human experts.
LLM-powered security tools are surprisingly susceptible to confirmation bias, overlooking reintroduced vulnerabilities when pull requests are framed as security improvements.
Most sparse tensor compilers are riddled with bugs, silently miscompiling code or crashing on valid inputs, a problem exposed by a new fuzzer that guarantees valid tensor contractions.
Despite advances in LLMs, human-AI collaboration still significantly outperforms AI-only agents in domain-specific data science tasks, proving that human expertise remains crucial.
LLMs analyzing binaries aren't just spitting out tokens: they exhibit surprisingly structured reasoning patterns like "early pruning" and "targeted backtracking" that could revolutionize how we understand and control these systems.
Ditch the syntax-only grind: a multi-modal assessment strategy proves that introductory programming courses can boost both coding skills and crucial soft skills like communication and critical thinking.
Current Python vulnerability scanners miss millions of vulnerable downloads by failing to account for vendored dependencies and OS-level security patches.
LLMs can automate and significantly improve the generalization of compiler peephole optimizations, outperforming specialized program synthesis techniques.
Forget months of manual coding: AutORAN lets you build and deploy O-RAN xApps from natural language in minutes.
Imagine a debugger that not only shows you the past, but also lets you explore alternative code paths and their execution, all in real-time.
The complex JS-Wasm boundary is fertile ground for new vulnerabilities, and Weaver is the first fuzzer to effectively till it.
Forget struggling with cryptic SQL: a new LLM fine-tuned with human preferences generates comments so good they beat Qwen3-14B by up to 13% on standard metrics.
Julia can now hang with the big dogs: KernelForge.jl proves that portable, JIT-compiled GPU primitives can achieve vendor-level performance (matching or exceeding CUB and cuBLAS) without sacrificing generality.
Agentic AI systems are still far from maximizing hardware potential: SOL-ExecBench reveals a significant gap between current GPU kernel performance and analytically derived Speed-of-Light bounds across a wide range of AI models.
A 30B MoE model can now achieve Gold Medal-level performance at the IMO, IOI, and ICPC, rivaling frontier models with 20x more parameters.
LLMs can orchestrate complex wireless communication optimization tasks by translating natural language intent into actionable spatial constraints, enabling gradient-based solvers to outperform traditional methods without requiring domain-specific fine-tuning.
ChatGPT-4o-mini can spot design discussions in code repositories better than other models, offering a new path to automatically surfacing valuable context for software engineers.
LLM-based semantic sorting can be twice as fast with no loss in accuracy by strategically combining listwise ranking algorithms.
Current LMMs can't reliably turn complex images into code, failing to preserve structural integrity even in relatively simple scenarios, as shown by the new Omni-I2C benchmark.
Software architecture, a critical but underspecified domain, finally gets a unified benchmarking platform with ArchBench, enabling standardized evaluation of LLMs on complex system design tasks.
LLMs can now automatically generate bug-detection patterns for scientific code, offering a scalable solution to the growing problem of methodology errors in AI-driven research.
Achieve up to 2.4x speedup over OpenBLAS on RISC-V by using MLIR and xDSL to generate optimized RVV code, finally unlocking the potential of RISC-V vector extensions.
Simply prompting for test-driven development can *increase* regressions in AI coding agents; instead, focus on surfacing contextual information about which tests are most relevant.
LLMs can read datasheets, but still can't design circuits, failing at basic physical intuition despite showing promise in documentation understanding.
Random walks and equitable partitions offer a fresh perspective on bounding the smoothing parameter in code-based cryptography, potentially surpassing Fourier transform-based methods.
Automated injection of realistic vulnerabilities and synthesis of PoV exploits finally makes scalable, precisely labeled, repository-level vulnerability datasets a reality.
Forget complex multi-agent systems: Skele-Code's no-code interface slashes token costs by shifting agent involvement to code generation only, enabling subject matter experts to build agentic workflows directly.
Despite the ease of integrating ML cloud services, developers are widely misusing them, leading to quality and maintainability issues that MLmisFinder can now automatically detect with high accuracy.
Forget about chasing the perfect model architecture: this work suggests the real key to better AI agents lies in crafting more precise and complete specifications, since the implementation can always be re-generated.
LLMs can't reason their way through Rust verification, struggling to complete proofs even with substantial hints, revealing a critical gap in their ability to handle the rigorous demands of secure software development.
A 4B parameter model can nearly match the privilege escalation performance of a state-of-the-art closed LLM like Claude Opus, while being fully local and 100x cheaper to run.
Standardized, modular GenAI teaching units in GUIDE offer a practical path to integrating cutting-edge AI tools into digital design education.
Secure enclave updates and migrations, previously missing from RISC-V TEEs, are now practical thanks to a novel toolkit that adds minimal overhead.
LLMs struggle with code comprehension, but a simple RNN pass over their embeddings can boost accuracy by over 5%.
Finally, a software energy profiler achieves both high accuracy and cross-platform portability, enabling practical algorithmic energy optimization across diverse languages and hardware.
Forget prompt engineering: AgentFactory lets LLM agents self-evolve by accumulating and refining executable Python subagents, making task re-execution more reliable and efficient.
Turning past programming failures into reusable knowledge boosts automated repair performance by 3.7% on a multimodal benchmark.
Security patch detectors trained on standard vulnerability databases are practically useless in the real world, losing up to 90% F1-score when deployed on in-the-wild data.
Genetic programming can discover unconventional multigrid cycles that outperform hand-tuned methods, suggesting automated algorithm design can unlock untapped performance in classical numerical solvers.
Federated Computing as Code lets you enforce data sovereignty in federated systems with cryptographic guarantees, moving beyond runtime policies and trust assumptions.
Forget specialized tools: a standard Unix terminal and clever RL are all you need to beat much larger LLMs at code search.
LLMs can now generate Verilog code that's not just correct, but also optimized for real-world hardware constraints like power, performance, and area, thanks to a novel multi-agent system with evolving memory.