This paper introduces the Back to the Familiar Future (B2FF) framework, which improves how vision-language-action (VLA) policies recover from deviations during manipulation tasks. Before execution, B2FF generates a bank of pre-imagined familiar future states conditioned on the clean initial observation. At recovery time, it selects one of these states as a recovery milestone and holds it as a fixed visual goal while the policy is off-trajectory. On failure-injected LIBERO, with recovery timing aligned to the injected failure, the average success rate of a baseline VLA rises from 56.3% to 74.0%, indicating that milestone-based recovery can work without fine-tuning the low-level action generator.
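Pre-imagined milestones raise the average success rate of a baseline VLA on failure-injected LIBERO from 56.3% to 74.0% (a gain of 17.7 percentage points, about 31% relative) without fine-tuning the low-level action generator.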
Vision-language-action (VLA) policies can deviate from nominal trajectories during manipulation, even when tasks remain physically feasible. Recovering from these deviations is challenging, as they push the policy into unfamiliar regions of the state space where direct re-planning frequently destabilizes action sequences. We propose Back to the Familiar Future (B2FF), a recovery framework for foresight-driven VLAs that leverages future visual conditioning as a recovery interface. Before execution, the VLA generates a milestone bank of familiar future states conditioned on the clean initial observation. At recovery time, a recoverability-aware selector chooses a recovery milestone from this bank and enforces it as a fixed visual goal. This enables the VLA to robustly map off-trajectory observations back to a familiar future. On failure-injected LIBERO, under controlled recovery timing aligned with the injected failure, B2FF increases the average success rate of a baseline VLA from 56.3% to 74.0%, demonstrating that pre-imagined milestones can guide recovery without fine-tuning the low-level action generator.
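The two-phase flow described above (build a milestone bank before execution, then pick one milestone at recovery time and hold it as a fixed goal) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the `Milestone` type, the `imagine` callback, the `min_step` cutoff, and the use of cosine similarity as a stand-in for the recoverability-aware score are all assumptions, and imagined future frames are represented here as plain feature vectors.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]


@dataclass(frozen=True)
class Milestone:
    """Hypothetical container for one pre-imagined future state."""
    step: int          # position along the imagined nominal trajectory
    embedding: Vector  # stand-in for an imagined future frame


def build_milestone_bank(
    imagine: Callable[[Vector, int], Vector],
    initial_obs: Vector,
    horizon: int,
) -> List[Milestone]:
    """Before execution: imagine future states from the clean initial observation.

    `imagine` is a placeholder for the VLA's foresight module (an assumption).
    """
    return [
        Milestone(step=t, embedding=tuple(imagine(initial_obs, t)))
        for t in range(1, horizon + 1)
    ]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def select_recovery_milestone(
    bank: List[Milestone],
    current_obs: Vector,
    min_step: int = 1,
) -> Milestone:
    """At recovery time: pick one milestone to hold as the fixed visual goal.

    The score is an assumed placeholder for a recoverability-aware selector:
    among milestones at or beyond `min_step`, take the one most similar to
    the current off-trajectory observation.
    """
    candidates = [m for m in bank if m.step >= min_step]
    if not candidates:
        raise ValueError("no milestone at or beyond min_step")
    return max(candidates, key=lambda m: cosine_similarity(m.embedding, current_obs))
```

In this sketch the selected milestone would then be passed to the policy as its visual goal until recovery completes. The bank is built once from the clean initial observation, so nothing is re-imagined from the off-trajectory state.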