The paper introduces Causal-JEPA (C-JEPA), an object-centric world model that extends masked joint embedding prediction from image patches to object representations, so that the model must reason about interactions between objects. Object-level masking induces latent interventions with counterfactual-like effects, which prevents shortcut solutions and promotes robust relational understanding. Empirically, C-JEPA improves visual question answering, with an absolute gain of about 20% in counterfactual reasoning over the same architecture without object-level masking, and it plans more efficiently on agent control tasks, using only 1% of the latent input features required by patch-based world models at comparable performance.
Object-level masking in world models yields an absolute gain of about 20% in counterfactual reasoning and cuts planning input to 1% of the latent features used by patch-based world models, hinting at a path toward more efficient and robust AI agents.
World models require robust relational understanding to support prediction, reasoning, and control. While object-centric representations provide a useful abstraction, they are not sufficient to capture interaction-dependent dynamics. We therefore propose C-JEPA, a simple and flexible object-centric world model that extends masked joint embedding prediction from image patches to object-centric representations. By applying object-level masking that requires an object's state to be inferred from other objects, C-JEPA induces latent interventions with counterfactual-like effects and prevents shortcut solutions, making interaction reasoning essential. Empirically, C-JEPA leads to consistent gains in visual question answering, with an absolute improvement of about 20% in counterfactual reasoning compared to the same architecture without object-level masking. On agent control tasks, C-JEPA enables substantially more efficient planning by using only 1% of the total latent input features required by patch-based world models, while achieving comparable performance. Finally, we provide a formal analysis demonstrating that object-level masking induces a causal inductive bias via latent interventions. Our code is available at https://github.com/galilai-group/cjepa.
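To make the idea of object-level masking concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `mask_objects` and `masked_prediction_loss`, the zero mask token, the array shapes, and the mean-of-visible-slots stand-in predictor are all hypothetical and do not come from the C-JEPA code. The sketch only shows the general pattern in which whole object embeddings are hidden and the loss is computed on the hidden objects, so each one has to be inferred from the others.

```python
import numpy as np


def mask_objects(slots, mask_token, num_masked, rng):
    """Replace `num_masked` randomly chosen object slots with `mask_token`.

    slots: (num_objects, dim) array of object embeddings (hypothetical layout).
    Returns the masked copy and a boolean array marking the masked objects.
    """
    num_objects = slots.shape[0]
    masked_idx = rng.choice(num_objects, size=num_masked, replace=False)
    is_masked = np.zeros(num_objects, dtype=bool)
    is_masked[masked_idx] = True
    # Broadcast the (num_objects, 1) flag over the embedding dimension.
    masked_slots = np.where(is_masked[:, None], mask_token, slots)
    return masked_slots, is_masked


def masked_prediction_loss(predicted, target, is_masked):
    """Mean squared error in embedding space, over masked objects only."""
    diff = predicted[is_masked] - target[is_masked]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
slots = rng.normal(size=(5, 8))   # 5 objects, 8-dim embeddings (assumed sizes)
mask_token = np.zeros(8)          # assumed placeholder for a learned mask token

masked_slots, is_masked = mask_objects(slots, mask_token, num_masked=2, rng=rng)

# Stand-in predictor (assumption): guess each masked object from the mean of
# the visible ones. A real model would use a learned predictor network here.
visible_mean = masked_slots[~is_masked].mean(axis=0)
predicted = np.where(is_masked[:, None], visible_mean, masked_slots)

loss = masked_prediction_loss(predicted, slots, is_masked)
print(f"masked objects: {np.flatnonzero(is_masked)}, loss: {loss:.4f}")
```

Because the loss ignores visible objects, a predictor in this setup cannot lower it by copying its input; it has to use the remaining objects to reconstruct the hidden ones, which is the shortcut-prevention property the abstract describes.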