BUET · NTU Taiwan · UIUC · Feb 18, 2026 · arXiv:2602.16671

SPARC: Scenario Planning and Reasoning for Automated C Unit Test Generation

Jaid Monwar Chowdhury, Chi-An Fu, Reyhaneh Jabbarvand

AI Summary

The paper introduces SPARC, a neuro-symbolic framework for automated C unit test generation that addresses the limitations of direct intent-to-code synthesis by LLMs. SPARC works in four stages: Control Flow Graph (CFG) analysis, an Operation Map that grounds LLM reasoning in validated utility helpers, path-targeted test synthesis, and iterative self-correction driven by compiler and runtime feedback. On 59 real-world and algorithmic subjects, SPARC outperforms vanilla LLM prompting by 31.36% in line coverage, 26.01% in branch coverage, and 20.78% in mutation score, and matches or exceeds the symbolic execution tool KLEE on complex subjects. Developers also rate its tests as more readable and maintainable.
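The self-correction stage can be pictured as a bounded generate, validate, retry loop. The following is a minimal sketch, not SPARC's implementation: `repair_loop`, `generate_fn`, `validate_fn`, the attempt budget, and both stubs are hypothetical names introduced here for illustration. In a real system the generator would be an LLM call and the validator would compile and run the candidate test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FEEDBACK_MAX 256

/* Hypothetical hooks (assumptions, not SPARC's API):
 * generate_fn returns candidate test source given the last feedback;
 * validate_fn returns 1 on success, else 0 and fills in feedback. */
typedef const char *(*generate_fn)(const char *feedback);
typedef int (*validate_fn)(const char *test_src, char *feedback, size_t cap);

/* Returns the 1-based attempt on which a candidate passed validation,
 * or 0 if the budget ran out and the test should be discarded. */
int repair_loop(generate_fn generate, validate_fn validate,
                int max_attempts, const char **accepted)
{
    char feedback[FEEDBACK_MAX] = "";

    *accepted = NULL;
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        const char *candidate = generate(feedback);
        if (validate(candidate, feedback, sizeof feedback)) {
            *accepted = candidate;
            return attempt;
        }
        /* feedback now holds the diagnostic for the next attempt */
    }
    return 0;
}

/* Stub generator: the first draft calls a misspelled function; once the
 * feedback mentions an undeclared identifier, it emits a corrected draft. */
static const char *stub_generate(const char *feedback)
{
    if (strstr(feedback, "undeclared") != NULL) {
        return "assert(add(2, 2) == 4);";
    }
    return "assert(ad(2, 2) == 4);";
}

/* Stub validator standing in for a compiler: rejects calls to `ad`. */
static int stub_validate(const char *test_src, char *feedback, size_t cap)
{
    if (strstr(test_src, "ad(") != NULL) {
        strncpy(feedback, "error: 'ad' undeclared", cap - 1);
        feedback[cap - 1] = '\0';
        return 0;
    }
    feedback[0] = '\0';
    return 1;
}

/* Runs the loop with the stubs and returns the attempt count. */
int demo_repair(int max_attempts)
{
    const char *accepted = NULL;
    int attempts = repair_loop(stub_generate, stub_validate,
                               max_attempts, &accepted);
    assert((attempts > 0) == (accepted != NULL));
    return attempts;
}
```

With these stubs, a budget of three attempts accepts the second draft, while a budget of one discards the test. Bounding the number of attempts is one plausible way to decide which tests are kept and which are dropped.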

Key Contribution

LLMs can generate surprisingly effective C unit tests when guided by program structure and constraints, achieving coverage comparable to symbolic execution while producing more readable code.

Abstract

Automated unit test generation for C remains a formidable challenge due to the semantic gap between high-level program intent and the rigid syntactic constraints of pointer arithmetic and manual memory management. While Large Language Models (LLMs) exhibit strong generative capabilities, direct intent-to-code synthesis frequently suffers from the leap-to-code failure mode, where models prematurely emit code without grounding in program structure, constraints, and semantics. This results in non-compilable tests, hallucinated function signatures, low branch coverage, and semantically irrelevant assertions that fail to expose bugs. We introduce SPARC, a neuro-symbolic, scenario-based framework that bridges this gap through four stages: (1) Control Flow Graph (CFG) analysis, (2) an Operation Map that grounds LLM reasoning in validated utility helpers, (3) path-targeted test synthesis, and (4) an iterative self-correcting validation loop using compiler and runtime feedback. We evaluate SPARC on 59 real-world and algorithmic subjects, where it outperforms the vanilla prompt generation baseline by 31.36% in line coverage, 26.01% in branch coverage, and 20.78% in mutation score, matching or exceeding the symbolic execution tool KLEE on complex subjects. SPARC retains 94.3% of tests through iterative repair and produces code with significantly higher developer-rated readability and maintainability. By aligning LLM reasoning with program structure, SPARC provides a scalable path for industrial-grade testing of legacy C codebases.
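To make "path-targeted test synthesis" concrete, here is a small hand-written sketch of what one test per control-flow path might look like. It is not output from SPARC: the function under test `find_first`, the test names, and the runner `run_path_tests` are all hypothetical and chosen only to illustrate the idea of naming each test after the path scenario it exercises.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function under test: index of the first occurrence of
 * `key` in `arr`, or -1 if the key is absent or the input is empty. */
int find_first(const int *arr, size_t len, int key)
{
    if (arr == NULL || len == 0) {      /* path A: invalid or empty input */
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        if (arr[i] == key) {            /* path B: key found */
            return (int)i;
        }
    }
    return -1;                          /* path C: loop exhausted */
}

/* Path A: the guard returns before the loop is entered. */
static void test_empty_input_returns_minus_one(void)
{
    const int data[] = {4, 7, 7};
    assert(find_first(NULL, 3, 7) == -1);
    assert(find_first(data, 0, 7) == -1);
}

/* Path B: the loop exits early at the first match, not a later one. */
static void test_key_present_returns_first_index(void)
{
    const int data[] = {4, 7, 7};
    assert(find_first(data, 3, 7) == 1);
}

/* Path C: the loop runs to completion without a match. */
static void test_key_absent_returns_minus_one(void)
{
    const int data[] = {4, 7, 7};
    assert(find_first(data, 3, 9) == -1);
}

/* Runs every scenario and returns how many were executed. */
int run_path_tests(void)
{
    test_empty_input_returns_minus_one();
    test_key_present_returns_first_index();
    test_key_absent_returns_minus_one();
    return 3;
}
```

Calling `run_path_tests()` from a `main` function exercises all three paths. Each test fixes the inputs needed to reach one specific path and asserts on the observable return value, which is the kind of structure-aware scenario that a prompt without CFG information may miss.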

Code Generation & Program Synthesis · Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0
Influential citations: 0
References: 35
Year: 2026
Venue: N/A
