This paper introduces Loop-OWM, an object-centric world-modeling architecture that learns the rules of the Abstraction and Reasoning Corpus (ARC) as composable transitions over structured states. ARC tests in-context rule induction from a few visual-symbolic input-output demonstrations. By combining color-prototype slots, demonstration-conditioned task summaries, and a looped transition model, Loop-OWM outperforms both non-looped and looped baselines on ARC-1 and ARC-2 with a comparable or smaller parameter count. The results suggest that rules can be learned as transitions over visual-symbolic states, not only as language descriptions or symbolic programs.
Learning rules as visual-symbolic transitions, rather than only as language descriptions, offers a different route to in-context reasoning in AI.
ARC tests in-context rule induction: given a few input-output demonstrations, a model must infer the hidden rule and apply it to a new query. While many approaches express ARC rules through language, code, or symbolic programs, ARC itself is visual-symbolic: rules appear as grid transitions over objects, colors, shapes, and spatial relations. We introduce Loop-OWM, an object-centric world-modeling architecture that learns these rules as composable transitions over structured states. It combines color-prototype slots, demonstration-conditioned task summaries, and a looped transition model with dense propagation and slot-conditioned correction. On both ARC-1 and ARC-2, Loop-OWM outperforms non-looped and looped baselines with comparable or fewer parameters. These results suggest that ARC rules can be learned not only as language descriptions or searched programs, but also as transitions over visual-symbolic world states.
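To make the three components more concrete, here is a toy, non-learned sketch in Python. It is not the authors' implementation: the function names (`color_prototype_slots`, `task_summary`, `step`, `correct`, `looped_transition`), the choice of color 0 as background, the color-map "task summary", and the neighbor-fill propagation rule are all illustrative assumptions. It only shows the structural idea: group cells into one slot per color, summarize the demonstrations into a conditioning signal, apply one shared transition repeatedly, then correct at the slot level.

```python
from collections import defaultdict

Grid = list[list[int]]
BACKGROUND = 0  # assumption: color 0 is treated as background


def color_prototype_slots(grid: Grid) -> dict[int, list[tuple[int, int]]]:
    """Hypothetical stand-in for color-prototype slots: one slot per non-background color."""
    slots: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for r, row in enumerate(grid):
        for c, color in enumerate(row):
            if color != BACKGROUND:
                slots[color].append((r, c))
    return dict(slots)


def task_summary(demos: list[tuple[Grid, Grid]]) -> dict[int, int]:
    """Hypothetical task summary: map each input color to the output color found at its first cell."""
    summary: dict[int, int] = {}
    for inp, out in demos:
        for color, cells in color_prototype_slots(inp).items():
            r, c = cells[0]
            summary.setdefault(color, out[r][c])
    return summary


def step(grid: Grid) -> Grid:
    """One shared transition (toy dense propagation): each background cell copies
    the color of its first non-background orthogonal neighbor."""
    h, w = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(h):
        for c in range(w):
            if grid[r][c] != BACKGROUND:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and grid[rr][cc] != BACKGROUND:
                    new[r][c] = grid[rr][cc]
                    break
    return new


def correct(grid: Grid, summary: dict[int, int]) -> Grid:
    """Toy slot-conditioned correction: recolor every cell of each color slot via the summary."""
    new = [row[:] for row in grid]
    for color, cells in color_prototype_slots(grid).items():
        for r, c in cells:
            new[r][c] = summary.get(color, color)
    return new


def looped_transition(grid: Grid, summary: dict[int, int], loops: int) -> Grid:
    """Apply the same (weight-tied) step `loops` times, then correct once at the slot level."""
    state = [row[:] for row in grid]
    for _ in range(loops):
        state = step(state)
    return correct(state, summary)


if __name__ == "__main__":
    demo_in = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    demo_out = [[0, 2, 0], [2, 2, 2], [0, 2, 0]]
    summary = task_summary([(demo_in, demo_out)])
    print(summary)
    print(looped_transition(demo_in, summary, loops=1))
```

In this toy, raising `loops` lets propagation reach farther while reusing the same `step`, which loosely mirrors why a looped model can cover longer-range transitions without adding parameters. The paper's components are learned neural modules, so this sketch illustrates only the control flow, not their internals.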