Feb 18, 2026 · arXiv:2602.16511

VIGOR: Visual Goal-In-Context Inference for Unified Humanoid Fall Safety

Osher Azulay, Zhengjie Xu, Andrew Scheffer, Stella X. Yu

AI Summary

The paper introduces VIGOR, a unified approach to humanoid fall safety that spans fall avoidance, impact mitigation, and stand-up recovery. VIGOR trains a privileged teacher from sparse human demonstrations on flat terrain together with simulated complex terrains, then distills it into a student that relies only on egocentric depth and proprioception. The student learns to react by matching the teacher's goal-in-context latent representation, which combines the next target pose with the local terrain. In simulation and on a real Unitree G1 humanoid, this yields robust, zero-shot fall safety across diverse non-flat environments.

Key Contribution

A distilled goal-in-context policy that reasons jointly about pose and terrain lets a humanoid handle falls on complex terrain zero-shot, without real-world fine-tuning.

Abstract

Reliable fall recovery is critical for humanoids operating in cluttered environments. Unlike quadrupeds or wheeled robots, humanoids experience high-energy impacts, complex whole-body contact, and large viewpoint changes during a fall, making recovery essential for continued operation. Existing methods fragment fall safety into separate problems such as fall avoidance, impact mitigation, and stand-up recovery, or rely on end-to-end policies trained without vision through reinforcement learning or imitation learning, often on flat terrain. At a deeper level, fall safety is treated as monolithic data complexity, coupling pose, dynamics, and terrain and requiring exhaustive coverage, limiting scalability and generalization. We present a unified fall safety approach that spans all phases of fall recovery. It builds on two insights: 1) Natural human fall and recovery poses are highly constrained and transferable from flat to complex terrain through alignment, and 2) Fast whole-body reactions require integrated perceptual-motor representations. We train a privileged teacher using sparse human demonstrations on flat terrain and simulated complex terrains, and distill it into a deployable student that relies only on egocentric depth and proprioception. The student learns how to react by matching the teacher's goal-in-context latent representation, which combines the next target pose with the local terrain, rather than separately encoding what it must perceive and how it must act. Results in simulation and on a real Unitree G1 humanoid demonstrate robust, zero-shot fall safety across diverse non-flat environments without real-world fine-tuning. The project page is available at https://vigor2026.github.io/
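The latent-matching idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the network architecture, the loss, or any dimensions, so the single-layer tanh encoders, the mean-squared-error loss, the vector sizes, and every function name below are assumptions made purely for illustration. The sketch shows a privileged teacher that encodes the next target pose together with local terrain, and a student that sees only depth features and proprioception and is trained to reproduce the teacher's latent.

```python
import numpy as np

def teacher_latent(target_pose, terrain, W_teacher):
    """Privileged teacher (hypothetical): encode next target pose with local terrain."""
    context = np.concatenate([target_pose, terrain])
    return np.tanh(W_teacher @ context)

def student_latent(depth_feat, proprio, W_student):
    """Deployable student (hypothetical): encode depth features and proprioception only."""
    obs = np.concatenate([depth_feat, proprio])
    return np.tanh(W_student @ obs)

def latent_matching_loss(z_student, z_teacher):
    """Mean squared error between the two latents (an assumed choice of loss)."""
    return float(np.mean((z_student - z_teacher) ** 2))

def distill(depth_feat, proprio, z_teacher, W_student, lr=0.01, steps=50):
    """Plain gradient descent on the student weights; the teacher latent is a fixed target."""
    obs = np.concatenate([depth_feat, proprio])
    W = W_student.copy()
    losses = []
    for _ in range(steps):
        z = np.tanh(W @ obs)
        losses.append(latent_matching_loss(z, z_teacher))
        # Gradient of mean((tanh(W @ obs) - z_teacher)^2) w.r.t. the pre-activation
        grad_pre = 2.0 * (z - z_teacher) * (1.0 - z ** 2) / z.size
        W -= lr * np.outer(grad_pre, obs)
    return W, losses

rng = np.random.default_rng(0)
# All sizes below are arbitrary placeholders, not values from the paper.
target_pose, terrain = rng.normal(size=6), rng.normal(size=5)
depth_feat, proprio = rng.normal(size=8), rng.normal(size=4)
W_teacher = rng.normal(scale=0.3, size=(4, 11))
W_student = rng.normal(scale=0.3, size=(4, 12))

z_t = teacher_latent(target_pose, terrain, W_teacher)
W_student, losses = distill(depth_feat, proprio, z_t, W_student)
print(f"latent loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The point of the sketch is the training signal: the student is never asked to reconstruct terrain or pose separately, only to land on the same combined latent the teacher produces from privileged information.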

Computer Vision · Multimodal Models · Robotics & Embodied AI · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 45

Year: 2026

Venue: N/A
