This paper introduces AnchorWorld, a novel framework for egocentric world simulation that enhances interaction integrity and allows for flexible world customization. By leveraging 3D human motion and incorporating auxiliary training supervision with exogenous viewpoints, the model achieves robust spatial grounding of human-world interactions. Experimental results demonstrate that AnchorWorld outperforms existing state-of-the-art methods while maintaining spatio-temporal geometric consistency in its customizable environments.
AnchorWorld combines 3D human motion with auxiliary supervision from exogenous viewpoints to improve interaction fidelity in egocentric simulation, and it reports gains over state-of-the-art baselines.
Despite being a pivotal frontier, interactive world modeling remains underexplored in terms of the versatile controllability required by practical scenarios. To bridge this gap, we present AnchorWorld, a framework that advances egocentric simulation through enhanced interaction integrity and a flexible mechanism for world customization. First, we use 3D human motion as the primary interaction modality. To compensate for body parts that are out of view or truncated in egocentric views, we introduce an auxiliary training supervision that incorporates exogenous viewpoints decoupled from the agent's first-person sensorium. This supervision lets the model observe the agent's full-body positioning relative to the environment, facilitating a more robust spatial grounding of human-world interactions. Furthermore, we propose a simple yet effective mechanism for customizing self-evolving worlds. This is achieved by defining anchor views within a unified world coordinate system, coupled with textual descriptions that dictate the dynamic evolution of local scenes. Experimental results show that AnchorWorld significantly outperforms state-of-the-art baselines, while ablation studies validate the effectiveness of our key designs. Notably, our customization scheme exhibits promising spatio-temporal geometric consistency and adheres strictly to the prescribed evolutionary dynamics.
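As a rough illustration of what "anchor views within a unified world coordinate system, coupled with textual descriptions" could look like as data, the sketch below pairs a camera pose in a shared world frame with a text prompt. It is not the paper's implementation: the `AnchorView` class, its field names, the camera-to-world rotation convention, and the example prompts are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AnchorView:
    """Hypothetical record: a camera pose in the shared world frame plus a prompt."""
    position: Vec3                # camera center, in world coordinates
    rotation: List[List[float]]   # 3x3 camera-to-world rotation, stored as rows
    description: str              # text describing how the local scene should evolve

    def world_to_camera(self, point: Vec3) -> Vec3:
        # Assumed convention: p_world = R @ p_cam + c, hence p_cam = R^T @ (p_world - c).
        d = [point[i] - self.position[i] for i in range(3)]
        return tuple(
            sum(self.rotation[r][c] * d[r] for r in range(3)) for c in range(3)
        )


# Two illustrative anchors that share one world frame (values are made up).
anchor_a = AnchorView(
    position=(0.0, 0.0, 0.0),
    rotation=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    description="a kettle on the stove slowly comes to a boil",
)
anchor_b = AnchorView(
    position=(2.0, 0.0, 0.0),
    rotation=[[0, -1, 0], [1, 0, 0], [0, 0, 1]],  # 90 degrees about the z axis
    description="rain starts to fall outside the window",
)

# The same world point expressed in each anchor's camera frame.
world_point = (2.0, 1.0, 0.0)
in_a = anchor_a.world_to_camera(world_point)
in_b = anchor_b.world_to_camera(world_point)
```

Because both anchors are expressed in the same world frame, a single world point maps to a well-defined location in each view, which is the kind of shared reference that geometric consistency across views would rely on.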