HKUST, Tencent AI, USTC | Apr 20, 2026 | arXiv:2604.17883

Scaling Human-AI Coding Collaboration Requires a Governable Consensus Layer

Tianfu Wang, Zhezheng Hao, Yin Wu, Wei Wu, Qiang Lin, Hande Dong, Nicholas Jing Yuan, Hui Xiong

AI Summary

The paper introduces "Agentic Consensus," a new paradigm for human-AI coding collaboration where a typed property graph (the consensus layer) replaces code as the primary artifact. This approach aims to address the control failure in current AI-assisted development, where complex system topology is flattened into low-dimensional text, making systems opaque and fragile. By linking evidence directly to structural claims in the consensus layer, the authors enable auditable commitments and explicit under-specification, evaluated through metrics like alignment fidelity and consensus entropy.

Key Contribution

The paper argues that "vibe coding," the prevailing style of AI-assisted development, is fast but leaves codebases hard to maintain, because it collapses complex system topology into code and chat logs that cannot be audited.

Abstract

Vibe coding produces correct, executable code at speed, but leaves no record of the structural commitments, dependencies, or evidence behind it. Reviewers cannot determine what invariants were assumed, what changed, or why a regression occurred. This is not a generation failure but a control failure: the dominant artifact of AI-assisted development (code plus chat history) performs dimension collapse, flattening complex system topology into low-dimensional text and making systems opaque and fragile under change. We propose Agentic Consensus: a paradigm in which the consensus layer C, an operable world model represented as a typed property graph, replaces code as the primary artifact of engineering. Executable artifacts are derived from C and kept in correspondence via synchronization operators Phi (realize) and Psi (rehydrate). Evidence links directly to structural claims in C, making every commitment auditable and under-specification explicit as measurable consensus entropy rather than a silent guess. Evaluation must move beyond code correctness toward alignment fidelity, consensus entropy, and intervention distance. We propose benchmark task families designed to measure whether consensus-based workflows reduce human intervention compared to chat-driven baselines.
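To make the idea of a consensus layer more concrete, the following is a minimal sketch of a typed property graph in which each structural claim carries links to evidence. It is not the authors' implementation. The class names, node and edge types, and the `toy_entropy` function are assumptions made for illustration, and the entropy shown is plain Shannon entropy over candidate interpretations of one open decision, which may differ from the paper's definition of consensus entropy. The synchronization operators Phi and Psi are not modeled here.

```python
from dataclasses import dataclass, field
from math import log2


@dataclass
class Node:
    """A typed node in the graph (hypothetical schema, not from the paper)."""
    node_id: str
    node_type: str                                 # e.g. "Service", "Invariant"
    props: dict = field(default_factory=dict)
    evidence: list = field(default_factory=list)   # ids of tests, reviews, traces


@dataclass
class Edge:
    """A typed, directed edge between two existing nodes."""
    src: str
    dst: str
    edge_type: str                                 # e.g. "depends_on", "constrains"


@dataclass
class ConsensusGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, src, dst, edge_type):
        # Reject edges whose endpoints have not been declared.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist in the graph")
        self.edges.append(Edge(src, dst, edge_type))

    def unbacked(self):
        """Return ids of nodes that have no evidence attached, sorted."""
        return sorted(n.node_id for n in self.nodes.values() if not n.evidence)


def toy_entropy(candidates):
    """Shannon entropy in bits over weighted candidate interpretations.

    `candidates` maps an interpretation name to a non-negative weight.
    This is an illustrative stand-in, not the paper's metric.
    """
    total = sum(candidates.values())
    return -sum((w / total) * log2(w / total)
                for w in candidates.values() if w > 0)


# Usage: one claim backed by a test, one claim with no evidence yet.
graph = ConsensusGraph()
graph.add_node(Node("svc.payments", "Service", evidence=["test:pay_smoke"]))
graph.add_node(Node("inv.idempotent", "Invariant"))
graph.add_edge("inv.idempotent", "svc.payments", "constrains")

print(graph.unbacked())                               # ['inv.idempotent']
print(toy_entropy({"retry": 1, "fail_fast": 1}))      # 1.0
```

In this sketch, an unresolved design choice with two equally weighted readings scores one bit, and a choice with a single reading scores zero, which is one simple way to surface under-specification as a number instead of a silent guess.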

Topics: Code Generation & Program Synthesis; Constitutional AI & AI Ethics; Scalable Oversight & Alignment Theory; Tool Use & Agents

Citation Metrics

Citations: 0
Influential citations: 0
References: 28
Year: 2026
Venue: N/A
