Search papers, labs, and topics across Lattice.
The paper introduces Crab, a novel host-side runtime system that optimizes checkpoint and restore (C/R) operations for autonomous agents in sandboxed environments. Crab bridges the semantic gap between agent frameworks and the OS by using eBPF to classify the OS-visible effects of each agent turn, enabling intelligent decisions about checkpoint granularity. Experiments on shell-intensive and code-repair workloads demonstrate that Crab achieves 100% recovery correctness while reducing checkpoint traffic by up to 87%, incurring minimal overhead compared to fault-free execution.
Achieve 100% agent recovery correctness with near-zero overhead by intelligently checkpointing only the OS state that actually matters.
Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restore (C/R) of this state is needed for fault tolerance, spot execution, RL rollout branching, and safe rollback-yet existing approaches fall into two extremes: application-level recovery preserves chat history but misses OS-side effects, while full per-turn checkpointing is correct but too expensive under dense co-location. The root cause is an agent-OS semantic gap: agent frameworks see tool calls but not their OS effects; the OS sees state changes but lacks turn-level context to judge recovery relevance. This gap hides massive sparsity: over 75% of agent turns produce no recovery-relevant state, so most checkpoints are unnecessary. Crab (Checkpoint-and-Restore for Agent SandBoxes) is a transparent host-side runtime that bridges this gap without modifying agents or C/R backends. An eBPF-based inspector classifies each turn's OS-visible effects to decide checkpoint granularity; a coordinator aligns checkpoints with turn boundaries and overlaps C/R with LLM wait time; and a host-scoped engine schedules checkpoint traffic across co-located sandboxes. On shell-intensive and code-repair workloads, Crab raises recovery correctness from 8% (chat-only) to 100%, cuts checkpoint traffic by up to 87%, and stays within 1.9% of fault-free execution time.