This paper introduces CHIMERA, a protocol-aware recovery framework that improves the safety and availability of confidential Byzantine Fault-Tolerant (BFT) consensus systems built on Trusted Execution Environments (TEEs). After categorizing existing rollback-resilient recovery methods and identifying their limitations, the authors propose a differentiated approach that tailors the recovery mechanism to the characteristics of each type of persistent state. In experiments in both LAN and WAN environments, CHIMERA achieves higher throughput, lower recovery latency, and better availability than state-of-the-art rollback-resilient baselines.
CHIMERA tailors rollback protection to the characteristics of each type of persistent state, improving throughput and availability over existing rollback-resilient recovery methods for confidential BFT consensus.
Trusted Execution Environments (TEEs) have enabled confidential Byzantine Fault-Tolerant (BFT) consensus systems with improved scalability. However, TEEs do not provide state continuity: during recovery, a compromised host can roll back a crashed enclave to a stale persistent state, threatening both safety and availability. Existing defenses face a fundamental tradeoff: they either impose substantial overhead on critical consensus paths, reducing throughput and increasing latency, or incur prolonged recovery delays, hurting availability. We present the first systematic taxonomy of rollback-resilient recovery for confidential BFT consensus, distilling prior approaches into four categories and exposing their inherent limitations. Guided by this analysis, we design CHIMERA, a protocol-aware recovery framework that breaks this tradeoff. Our key insight is that rollback protection in consensus systems should not be uniform: different types of persistent state differ fundamentally in their state distribution, update behavior, and representation form. CHIMERA separates persistent state into metadata and logs according to these protocol-level properties and applies a distinct recovery mechanism to each type. We formally model CHIMERA in Maude and verify its safety and liveness properties. We implement it on Braft and ZooKeeper using Intel TDX, and evaluate it in both LAN and WAN settings. Results show that CHIMERA achieves higher throughput, lower recovery latency, and better availability than state-of-the-art rollback-resilient baselines.
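To make the rollback threat concrete, the following is a minimal Python sketch of why integrity protection alone does not give state continuity. It is not CHIMERA's mechanism, which the abstract does not detail. The names `seal`, `unseal`, and `trusted_counter` are hypothetical, the HMAC stands in for enclave sealing, and the sketch assumes the counter value comes from a source the host cannot rewind.

```python
import hashlib
import hmac
import json


def seal(key: bytes, state: dict, version: int) -> bytes:
    """Serialize state with a version number and prepend an HMAC tag.

    Hypothetical stand-in for enclave sealing: the host stores the blob
    but cannot forge or modify it without the key.
    """
    payload = json.dumps({"version": version, "state": state}, sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def unseal(key: bytes, blob: bytes, trusted_counter: int) -> dict:
    """Verify integrity, then reject any snapshot that is not the latest.

    Assumption: trusted_counter is obtained from outside the host's control.
    """
    tag, payload = blob[:32], blob[32:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("tampered snapshot")
    record = json.loads(payload)
    # An old snapshot carries a valid tag, so only the version check catches it.
    if record["version"] != trusted_counter:
        raise ValueError("stale snapshot: rollback detected")
    return record["state"]
```

In this sketch an old blob still passes the HMAC check, because it was genuinely produced by the enclave; only the comparison against the trusted counter reveals that the host replayed it. Keeping such a counter up to date on every state update is one source of the overhead on critical consensus paths that the abstract describes.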