May 28, 2026arXiv:2605.29910

Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents

Xiang Liu, Saozhong Song, Zhaowei Zhang, Huiying Lan, Jason Zeng, Ming Wu, Michael Heinrich, Yong Sun, Ceyao Zhang

AI Summary

Agora, a domain-aware multi-agent framework, was developed to detect protocol-level logic bugs in consensus implementations by integrating hypothesis-driven testing with LLMs. Specialized agents collaboratively explore protocol state spaces, synthesize attack scenarios using domain-specific constraints, and iteratively refine findings. Evaluated on Raft, EPaxos, HotStuff, and BullShark, Agora discovered 15 previously unknown safety violations, outperforming existing LLM-based agents that failed to detect any.

Key Contribution

LLMs can find deep bugs in distributed consensus protocols, but only if you give them a team of specialized, collaborating agents that understand the domain.

Abstract

Consensus protocols form the backbone of distributed systems and blockchains, where implementation bugs can cause data corruption and financial losses. While LLM-based approaches show promise in code analysis, they struggle with deep protocol-level logic bugs involving complex state-dependent behaviors across multiple execution stages. We present Agora, a domain-aware multi-agent framework that integrates hypothesis-driven testing with LLM capabilities for systematic protocol verification. Agora employs specialized agents that collaboratively explore protocol state spaces, synthesize attack scenarios using domain-specific constraints, and validate findings through iterative refinement. This explicit role separation enables reasoning about global protocol invariants beyond single-function code analysis. We evaluate Agora on four consensus implementations (Raft, EPaxos, HotStuff, BullShark) using four state-of-the-art LLMs. Agora discovers 15 previously unknown protocol-level logic bugs that violate safety properties, while existing LLM-based agents fail to detect any such protocol-level logic bugs. Our results demonstrate that domain-aware multi-agent collaboration is essential for detecting deep logic bugs in complex protocols.

Code Generation & Program Synthesis Distributed Systems & Hardware Tool Use & Agents

Citation Metrics

Citations0

Influential citations0

References64

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents

Related Papers