100 papers published across 4 labs.
Requirements volatility doesn't just delay projects; it directly undermines software architecture, leading to technical debt and scheduling nightmares.
Unlock geometric algebra's performance potential in neural networks and spatial computing by compiling directly from multi-way relationships, eliminating manual specialization and ensuring geometric correctness.
Semantic sorting in LLMs can be twice as fast with no loss in accuracy by strategically combining listwise ranking algorithms.
Current LMMs can't reliably turn complex images into code, failing to preserve structural integrity even in relatively simple scenarios, as shown by the new Omni-I2C benchmark.
Software architecture, a critical but underspecified domain, finally gets a unified benchmarking platform with ArchBench, enabling standardized evaluation of LLMs on complex system design tasks.
LLMs can now automatically generate bug-detection patterns for scientific code, offering a scalable solution to the growing problem of methodology errors in AI-driven research.
Achieve up to 2.4x speedup over OpenBLAS on RISC-V by using MLIR and xDSL to generate optimized RVV code, finally unlocking the potential of RISC-V vector extensions.
Simply prompting for test-driven development can *increase* regressions in AI coding agents; instead, focus on surfacing contextual information about which tests are most relevant.
LLMs can read datasheets, but still can't design circuits, failing at basic physical intuition despite showing promise in documentation understanding.
Random walks and equitable partitions offer a fresh perspective on bounding the smoothing parameter in code-based cryptography, potentially surpassing Fourier transform-based methods.
Automated injection of realistic vulnerabilities and synthesis of PoV exploits finally makes scalable, precisely labeled, repository-level vulnerability datasets a reality.
Forget complex multi-agent systems: Skele-Code's no-code interface slashes token costs by shifting agent involvement to code generation only, enabling subject matter experts to build agentic workflows directly.
Despite the ease of integrating ML cloud services, developers are widely misusing them, leading to quality and maintainability issues that MLmisFinder can now automatically detect with high accuracy.
Forget about chasing the perfect model architecture – this work suggests the real key to better AI agents lies in crafting more precise and complete specifications, since the implementation can always be re-generated.
LLMs can't reason their way through Rust verification, struggling to complete proofs even with substantial hints, revealing a critical gap in their ability to handle the rigorous demands of secure software development.
A 4B parameter model can nearly match the privilege escalation performance of a state-of-the-art closed LLM like Claude Opus, while being fully local and 100x cheaper to run.
Standardized, modular GenAI teaching units in GUIDE offer a practical path to integrating cutting-edge AI tools into digital design education.
Secure enclave updates and migrations, previously missing from RISC-V TEEs, are now practical thanks to a novel toolkit that adds minimal overhead.
LLMs struggle with code comprehension, but a simple RNN pass over their embeddings can boost accuracy by over 5%.
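The idea above — running a lightweight recurrent pass over frozen LLM embeddings instead of using them directly — can be sketched as follows. This is an illustrative toy, not the paper's architecture: the vanilla-RNN update, the hidden size, and the random stand-in embeddings are all assumptions for demonstration.

```python
import numpy as np

def rnn_pass(embeddings, W_h, W_x, w_out, b):
    """One vanilla-RNN pass over a sequence of frozen token embeddings.

    embeddings: (seq_len, d) array of per-token LLM embeddings.
    Returns a single scalar score for a downstream comprehension label.
    """
    h = np.zeros(W_h.shape[0])
    for x in embeddings:                    # recur along the token axis
        h = np.tanh(W_h @ h + W_x @ x + b)  # hidden-state update
    return float(w_out @ h)                 # read out one logit

rng = np.random.default_rng(0)
d, hidden = 16, 8
emb = rng.normal(size=(5, d))               # stand-in for real LLM embeddings
score = rnn_pass(emb,
                 rng.normal(size=(hidden, hidden)) * 0.1,
                 rng.normal(size=(hidden, d)) * 0.1,
                 rng.normal(size=hidden),
                 np.zeros(hidden))
print(score)
```

In practice the recurrent weights would be trained on labeled comprehension examples while the LLM stays frozen; the appeal is that the extra pass is tiny compared to the base model.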
Finally, a software energy profiler achieves both high accuracy and cross-platform portability, enabling practical algorithmic energy optimization across diverse languages and hardware.
Forget prompt engineering: AgentFactory lets LLM agents self-evolve by accumulating and refining executable Python subagents, making task re-execution more reliable and efficient.
Turning past programming failures into reusable knowledge boosts automated repair performance by 3.7% on a multimodal benchmark.
Security patch detectors trained on standard vulnerability databases are practically useless in the real world, losing up to 90% F1-score when deployed on in-the-wild data.
Genetic programming can discover unconventional multigrid cycles that outperform hand-tuned methods, suggesting automated algorithm design can unlock untapped performance in classical numerical solvers.
Federated Computing as Code lets you enforce data sovereignty in federated systems with cryptographic guarantees, moving beyond runtime policies and trust assumptions.
Forget specialized tools: a standard Unix terminal and clever RL are all you need to beat much larger LLMs at code search.
LLMs can now generate Verilog code that's not just correct, but also optimized for real-world hardware constraints like power, performance, and area, thanks to a novel multi-agent system with evolving memory.
Even when given identical data and research questions, autonomous AI coding agents exhibit surprisingly high variability in their empirical findings, raising concerns about the reliability of AI-driven research.
SseRex finds hundreds of potential bugs in Solana smart contracts that existing tools miss, revealing that subtle, easily overlooked issues are often the root cause of severe exploits.
Even without pre-loaded database schemas, a new RL agent matches or beats state-of-the-art text-to-SQL models that have full schema access.
Feature models, often treated as static configuration spaces, reveal hidden structural patterns and domain-specific deviations when viewed through the lens of network analysis.
LLMs can now reliably translate natural language into executable option trading strategies, thanks to a new domain-specific language that constrains their output to verifiable semantic parses.
Finally, a unified software framework promises to tame the wild west of quantum dot device tuning, enabling researchers to share and adapt characterization routines across labs.
Software traceability research is severely imbalanced, with code-related links dominating and 95% of tools stuck in academia.
RepoReviewer tackles the complexity of repository-level code review with a multi-agent architecture, breaking down the monolithic process into manageable stages for more relevant and efficient feedback.
Software energy consumption isn't just an aggregate number – it's a path-dependent journey, and this new model reveals hidden optimization opportunities that can slash energy use by up to 705x.
Early-career researchers in experimental physics report significant gaps in training for software and machine learning tools crucial to their work, highlighting a critical need for improved educational resources.
Symbolic planning unlocks significant gains in RTL synthesis and summarization, boosting LLM performance by 20% without fine-tuning.
Forget generic code generation – this work shows that structure-aware retrieval of domain-specific examples slashes the debugging needed to get LLMs to produce working scientific visualization pipelines.
A multi-agent system that mimics rubber-duck debugging slashes critical path delay by 25% and power consumption by 22% in RTL code, outperforming LLM-based baselines.
Prompts aren't just instructions; they're evolving requirement artifacts that blend intent and implementation, demanding a new approach to software engineering.
Coding agents struggle to maintain faithfulness to specifications that emerge gradually over long interactions, losing significant implementation fidelity compared to single-shot specifications.
GitOps can transform CTF management, enabling automated deployments, enhanced collaboration, and cost-effective scaling.
CodeScan achieves 97%+ accuracy in detecting data poisoning attacks in code generation LLMs by identifying structural similarities across generations, even when semantics are expressed in diverse syntactic forms.
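Detecting shared structure beneath diverse syntax, as described above, can be approximated by normalizing identifiers out of a parse tree and comparing what remains. This is a minimal sketch of the general idea using Python's `ast` module, not CodeScan's actual detector; the renaming scheme is an assumption.

```python
import ast

class Normalize(ast.NodeTransformer):
    """Rename all identifiers so only program *structure* remains."""
    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_v", ctx=node.ctx), node)
    def visit_arg(self, node):
        node.arg = "_v"
        return node
    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.name = "_f"
        return node

def skeleton(src):
    """A structural fingerprint: identical for renamings of the same code."""
    return ast.dump(Normalize().visit(ast.parse(src)))

a = "def add(x, y):\n    return x + y"
b = "def plus(left, right):\n    return left + right"  # new names, same shape
c = "def mul(x, y):\n    return x * y"                  # different operator

print(skeleton(a) == skeleton(b))  # True: structurally identical
print(skeleton(a) == skeleton(c))  # False
```

A poisoning detector could then flag clusters of generations whose fingerprints collide far more often than clean models would produce.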
AI-generated code's fluency masks a critical flaw: it often fails to deliver what users actually intend, highlighting the urgent need for "intent formalization" to bridge the gap between informal requirements and precise program behavior.
Software engineering students are most likely to misuse LLMs on programming assignments and documentation, especially when they feel squeezed for time or lack clear guidance.
Novice programmers can boost code comprehension by 31% thanks to automated refactoring that minimizes cognitive load.
By explicitly exposing the model's reasoning process during SVG generation, CTRL-S achieves higher task success rates, superior SVG code quality, and exceptional visual fidelity compared to existing methods.
Code LLMs can achieve SOTA performance in agentic tasks by explicitly modeling the dynamic evolution of software logic across different training stages.
GenAI is already halving the time developers spend on boilerplate and documentation, but its real potential lies in shifting focus from routine coding to higher-level tasks like specification quality and architectural reasoning.
Open-source LLMs can grade UML diagrams with near-human accuracy on individual criteria, paving the way for AI-assisted teaching without relying on proprietary models.
A new rank-based phenotypic characterization scheme slashes the computational cost of genetic programming for dynamic project scheduling, enabling faster discovery of high-quality heuristic rules.
LLMs can now write better quantitative trading algorithms than humans, thanks to a new framework that turns unstructured financial reports into executable code.
Even when LLMs translate code correctly, over 20% of the time it's surprisingly inefficient due to algorithmic flaws, poor language choices, or resource mismanagement.
LLMs can parse almost any SQL dialect by segmenting queries into clauses and expressions, then validating with grammar-based methods.
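The clause-segmentation step described above can be sketched with a simple keyword split. This toy cut ignores nesting, quoting, and dialect-specific clauses, all of which a real segmenter must handle before validating each piece against a grammar; the clause list here is an assumption.

```python
import re

CLAUSES = ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT"]

def segment(sql):
    """Split a query into (clause, body) pairs at top-level keywords."""
    pattern = r"\b(" + "|".join(k.replace(" ", r"\s+") for k in CLAUSES) + r")\b"
    parts = re.split(pattern, sql, flags=re.IGNORECASE)
    pairs, i = [], 1                       # parts[0] is any leading text
    while i < len(parts):
        kw = re.sub(r"\s+", " ", parts[i]).upper()
        pairs.append((kw, parts[i + 1].strip()))
        i += 2
    return pairs

q = "SELECT name, age FROM users WHERE age > 30 ORDER BY age LIMIT 10"
for clause, body in segment(q):
    print(clause, "->", body)
```

Once segmented, each fragment is small enough to validate independently, which is what makes grammar-based checking tractable across dialects.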
A new 32B code LLM trained specifically for industrial tasks crushes existing models on specialized domains like chip design and GPU kernel optimization, while remaining competitive on general coding benchmarks.
Current telemetry falls woefully short in detecting advanced software supply chain attacks, with even the best single source capturing less than 40% of the attack chain, underscoring the critical need for multi-source data fusion.
LLMs struggle to formalize program post-conditions from natural language, with even the best models failing to correctly formalize all tasks, highlighting a critical gap in their ability to bridge natural language understanding and formal verification.
Identity-based software signing may reduce key management burdens, but it relocates complexity to verification, configuration, and deployment, creating new usability challenges.
Imagine a compiler that understands the size and lifetime of your data so well it can automatically optimize memory allocation and representation, giving you design-time insights into performance.
Security scanners flag nearly half of AI agent skills as malicious, but adding GitHub repository context reveals that the true number is closer to 0.5%.
A Qwen3-8B model, trained with a new SFT+RLAIF recipe on a challenging new benchmark, SWE-QA-Pro, beats GPT-4o in repository-level code understanding.
Vectorizing Verilog designs slashes memory consumption by over 50% in formal verification, even without changing the underlying hardware.
Pinpointing the exact line of code causing a test failure boosts code generation performance by 3%, without needing a critic or extra training.
LoRA fine-tuning beats prompting and RAG for adapting smaller language models to domain-specific code generation tasks, offering a path to higher accuracy and domain alignment.
LLM-generated code, while fast, is often subtly wrong, and VibeContract offers a way to make "vibe coding" more predictable and trustworthy by adding explicit, verifiable contracts.
Scaling LLM-based multi-agent systems doesn't just need better prompts or models, but a whole new software engineering approach focused on managing runtime entropy.
Algorithms used in product line tools are only half as fast as the best alternative for finding backbones in configurable software systems, but picking the right algorithm requires predicting the optimal chunk size, which remains an open problem.
Hybrid ClojureScript lets you visually code geometric ideas directly within your text, opening up new possibilities for domain-specific language design.
Say goodbye to ad-hoc scripts: this automated workflow slashes manual intervention in NEB calculations, ensuring reproducible reaction path optimization across platforms.
AI can now semi-autonomously formalize complex mathematical theorems like the Vlasov-Maxwell-Landau equilibrium, even outpacing traditional mathematical research.
LLMs can now assist in the challenging task of writing temporal logic properties for model-based development, potentially streamlining the formal specification process.
LLMs struggle to effectively use private library APIs even when provided with the correct documentation, but PriCoder can boost their performance by over 20% through targeted training data synthesis.
LLMs can boost code clone detection accuracy by selectively arbitrating only 0.2% of uncertain cases flagged by a multimodal fusion model, achieving a 0.3% absolute Macro-F1 gain.
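The selective-arbitration pattern above — a cheap model decides everything except a thin uncertainty band, which goes to the LLM — can be sketched as a threshold router. The band width and the toy probabilities are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

def route(fusion_probs, band=0.02):
    """Send only near-threshold cases to the LLM arbiter.

    fusion_probs: clone probabilities from a cheap fusion model.
    band: half-width of the uncertainty band around the 0.5 decision line.
    Returns (cheap_decisions, indices_to_escalate).
    """
    probs = np.asarray(fusion_probs)
    uncertain = np.flatnonzero(np.abs(probs - 0.5) < band)
    decisions = probs >= 0.5          # cheap verdict for everything else
    return decisions, uncertain

probs = np.array([0.97, 0.02, 0.51, 0.49, 0.88])
decisions, to_llm = route(probs)
print(to_llm)    # only the borderline pairs get escalated
```

With a well-calibrated fusion model, the band can be tuned so only a fraction of a percent of pairs ever incur LLM cost.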
LLM coding agents still fall short when optimizing real-world codebases, especially when balancing multiple objectives like performance and correctness, as revealed by the new FormulaCode benchmark.
AI code review agents may scale defect screening, but their suggestions are adopted less often and, when adopted, can actually *worsen* code quality, underscoring the critical need for human oversight.
Unlock agentic software development by transforming institutional knowledge into actionable, AI-consumable Atomic Knowledge Units, enabling agents to perform tasks correctly without needing to reconstruct organizational context.
By strategically guiding self-play with challenging real-world examples, GASP unlocks a 2.5% performance boost in coding LLMs and conquers previously unsolvable problems.
LLMs can generate syntactically correct tests, but their ability to *reason* about code faults is surprisingly poor, hindering autonomous debugging.
Stop throwing away valuable context: Lore lets you turn ordinary Git commits into structured knowledge troves for AI coding agents, capturing the "Decision Shadow" behind every code change.
Forget massive models: fine-tuning a small 4B parameter model on carefully curated data can match or even approach the performance of 100B+ parameter LLMs in program verification tasks.
Forget tedious exam creation: ChatGPT, guided by prompt engineering, can generate programming questions as good as (or better than) human-crafted ones.
Most "agent skills" hyped for boosting LLMs in software engineering provide almost no benefit in real-world tasks, with 80% yielding zero pass-rate improvement.
Pythonistas can now easily formalize legal reasoning thanks to PYTHEN, a new framework that brings the power of PROLEG-style defeasible logic to the Python ecosystem.
Achieve near-perfect decompilation-to-compilation by hot-swapping LLM-repaired functions into original binaries and using runtime feedback to eliminate semantic hallucinations.
Despite the promise of AI-powered tools, developer experience still trumps AI assistance when it comes to writing secure code.
LLMs can now orchestrate system-wide optimizations across microservices, boosting throughput by 36% and slashing response times by 27%.
End-to-end MLLMs struggle with visual reasoning, but a program synthesis approach that explicitly represents compositional logic dramatically improves accuracy and transparency.
By adversarially co-evolving code and test LLMs, Code-A1 achieves code generation performance on par with human-annotated training, while simultaneously boosting the LLM's ability to find bugs.
Forget fixed workflows: SEMAG's self-evolving agents dynamically adapt their coding process and even upgrade their backbone LLM, leading to state-of-the-art code generation performance.
Scientific software projects struggle to resolve high-priority technical debt that propagates across multiple code artifacts, suggesting a need for better tooling and monitoring.
The rise of GitHub Actions is inadvertently marginalizing test code review, with PRs involving tests often receiving no reviews or comments post-GHA adoption.
Software metrics are failing engineers because they're built on shaky measurement science, not real-world decisions.
LLMs can be guided to discover better solutions in open-ended scientific tasks by identifying and reasoning about causal factors that influence the evolutionary process.
Despite high static quality scores, YARA rules in the wild suffer from significant noise, low recall, and a bias towards legacy threats, exposing a "double penalty" for defenders.
LLMs can now be tested on their ability to formally verify real-world cryptographic assembly code, not just competition math problems.
Hands-on robotics education gets a boost from a project-based learning framework that teaches students ROS through the automation of building-block disassembly.

Quantum programs can be optimized by automatically uncomputing intermediate values based on their semantic lifetime, leading to reduced circuit depth and qubit usage.
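Lifetime-driven uncomputation boils down to a liveness analysis: free each intermediate as soon as its last use has passed. A classical sketch of that scheduling step, assuming intermediates are named `t*` and operations are (target, sources) tuples (the quantum details of how uncomputation is realized are out of scope here):

```python
def schedule_uncompute(ops):
    """Insert 'uncompute v' right after the last use of each intermediate.

    ops: list of (target, sources) tuples in program order, e.g.
    ('t1', ['a', 'b']) means t1 is computed from a and b.
    """
    last_use = {}
    for i, (_, srcs) in enumerate(ops):
        for s in srcs:
            if s.startswith("t"):         # track only intermediates
                last_use[s] = i
    out = []
    for i, op in enumerate(ops):
        out.append(("compute",) + op)
        for v, last in last_use.items():  # lifetime ended here: reclaim
            if last == i:
                out.append(("uncompute", v))
    return out

prog = [("t1", ["a", "b"]), ("t2", ["t1", "c"]), ("r", ["t2"])]
for step in schedule_uncompute(prog):
    print(step)
```

Reclaiming `t1` immediately after `t2` no longer needs it is what shortens circuit depth and lets qubits be reused.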
Turns out, AI agents debug better when you ask nicely: trust-based prompts boost debugging depth by 59% compared to fear-based approaches, which show no improvement over baseline.
Despite being the most widely recognized testing qualifications, ISTQB certifications are under fire for being too theoretical and not keeping pace with agile and automation, raising questions about their real-world value.