The paper introduces a "3+1" heterogeneous multi-agent architecture for code vulnerability detection: three cloud-based LLM experts (DeepSeek-V3) analyze code from different perspectives in parallel, and a local lightweight verifier (Qwen3-8B) performs adversarial validation of their findings. The architecture is formalized as a two-layer game framework that captures cooperation among the experts and the incentives of adversarial verification. Experiments on the NIST Juliet Test Suite show the approach achieves a 77.2% F1 score at $0.002 per sample, outperforming a single-expert LLM baseline and Cppcheck static analysis. The verifier improves precision by filtering false positives, and parallel execution yields a 3.0x speedup.
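The control flow described above (parallel experts, then a verifier that can veto a positive verdict) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `detect`, `keyword_expert`, and `toy_verifier` are hypothetical, the keyword stubs stand in for real LLM calls, and the "flag if any expert flags" aggregation rule is an assumption the summary does not specify.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# Hypothetical type aliases: an expert returns True when it flags the code,
# and the verifier returns True when it confirms the experts' flag.
Expert = Callable[[str], Awaitable[bool]]
Verifier = Callable[[str, Sequence[bool]], bool]


async def detect(code: str, experts: Sequence[Expert], verifier: Verifier) -> bool:
    """Run all experts concurrently, then let the verifier confirm or veto."""
    votes = await asyncio.gather(*(expert(code) for expert in experts))
    if not any(votes):  # aggregation rule is an assumption, not from the paper
        return False
    return verifier(code, votes)


def keyword_expert(keyword: str) -> Expert:
    """Toy stand-in for a cloud LLM expert: flags code containing a keyword."""
    async def run(code: str) -> bool:
        await asyncio.sleep(0)  # placeholder for a remote model call
        return keyword in code
    return run


def toy_verifier(code: str, votes: Sequence[bool]) -> bool:
    """Toy stand-in for the local verifier: vetoes flags on bounded copies."""
    return "sizeof" not in code


# Three toy experts, loosely mirroring three analysis perspectives.
EXPERTS = [keyword_expert("strcpy"), keyword_expert("char buf["), keyword_expert("gets(")]
```

With these stubs, `asyncio.run(detect("char buf[8]; strcpy(buf, input);", EXPERTS, toy_verifier))` flags the snippet, while a bounded `strncpy` variant that uses `sizeof` is flagged by one expert and then vetoed by the verifier. Passing the experts and verifier in as callables keeps the orchestration independent of any particular model API.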
A game-theory-inspired ensemble of three LLM experts and a lightweight local verifier detects code vulnerabilities at $0.002 per sample while improving F1 over a single-expert baseline, suggesting that strategic agent design can beat brute-force scaling.
Automated code vulnerability detection is critical for software security, yet existing approaches face a fundamental trade-off between detection accuracy and computational cost. We propose a heterogeneous multi-agent architecture inspired by game-theoretic principles, combining cloud-based LLM experts with a local lightweight verifier. Our "3+1" architecture deploys three cloud-based expert agents (DeepSeek-V3) that analyze code from complementary perspectives (code structure, security patterns, and debugging logic) in parallel, while a local verifier (Qwen3-8B) performs adversarial validation at zero marginal cost. We formalize this design through a two-layer game framework: (1) a cooperative game among experts capturing super-additive value from diverse perspectives, and (2) an adversarial verification game modeling quality assurance incentives. Experiments on 262 real samples from the NIST Juliet Test Suite across 14 CWE types, with balanced vulnerable and benign classes, demonstrate that our approach achieves a 77.2% F1 score with 62.9% precision and 100% recall at $0.002 per sample, outperforming both a single-expert LLM baseline (F1 71.4%) and Cppcheck static analysis (MCC 0). The adversarial verifier significantly improves precision (+10.3 percentage points, p < 1e-6, McNemar's test) by filtering false positives, while parallel execution achieves a 3.0x speedup. Our work demonstrates that game-theoretic design principles can guide effective heterogeneous multi-agent architectures for cost-sensitive software engineering tasks.
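As a quick consistency check on the reported metrics, F1 is the harmonic mean of precision and recall, so the stated 62.9% precision and 100% recall should reproduce the stated F1. The short sketch below uses only the figures quoted in the abstract; the helper name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract: 62.9% precision, 100% recall.
reported_f1 = f1_score(0.629, 1.0)
print(f"{reported_f1:.1%}")  # prints 77.2%
```

The result matches the reported 77.2% F1, so the three headline numbers are mutually consistent.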