Learn2Fold is a neuro-symbolic framework that generates origami folding sequences from text by combining a large language model (LLM) with a graph-structured world model. The system decouples semantic proposal (by the LLM) from physical verification (by the world model) to overcome the limitations of prior optimization-based and generative approaches. Results show that Learn2Fold can generate physically valid folding sequences for complex and out-of-distribution patterns by using the world model as a differentiable surrogate simulator within a lookahead planning loop.
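The propose-then-verify split can be illustrated with a minimal sketch of a depth-limited lookahead planner. It assumes a hypothetical `proposer` callable standing in for the LLM and a hypothetical `world_model` callable standing in for the learned graph-structured surrogate; `Prediction`, `threshold`, and the toy fold names are also illustrative and do not reflect Learn2Fold's actual interfaces. The surrogate is only queried as a black-box feasibility scorer here, so the differentiability of the real world model is not exercised, and scoring a path by the product of predicted feasibilities is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[str, ...]  # hypothetical state: the fold actions applied so far
Action = str


@dataclass
class Prediction:
    feasible_prob: float  # surrogate's estimate that the action is physically valid
    next_state: State     # surrogate's predicted state after the action


Proposer = Callable[[str, State], List[Action]]     # stand-in for the LLM; [] means "done"
WorldModel = Callable[[State, Action], Prediction]  # stand-in for the learned world model


def lookahead_value(prompt: str, state: State, proposer: Proposer,
                    world_model: WorldModel, depth: int, threshold: float) -> float:
    """Best product of predicted feasibilities over up to `depth` more steps (0.0 = dead end)."""
    if depth == 0:
        return 1.0
    candidates = proposer(prompt, state)
    if not candidates:
        return 1.0  # program complete: no further risk
    best = 0.0
    for action in candidates:
        pred = world_model(state, action)
        if pred.feasible_prob < threshold:
            continue  # rejected by the surrogate before execution
        future = lookahead_value(prompt, pred.next_state, proposer, world_model,
                                 depth - 1, threshold)
        best = max(best, pred.feasible_prob * future)
    return best


def plan(prompt: str, proposer: Proposer, world_model: WorldModel,
         max_steps: int = 10, depth: int = 2, threshold: float = 0.5) -> List[Action]:
    """Commit one fold per step: the proposer suggests, the world model verifies and looks ahead."""
    state: State = ()
    program: List[Action] = []
    for _ in range(max_steps):
        candidates = proposer(prompt, state)
        if not candidates:
            break  # proposer signals the program is complete
        scored = []
        for action in candidates:
            pred = world_model(state, action)
            if pred.feasible_prob < threshold:
                continue
            future = lookahead_value(prompt, pred.next_state, proposer, world_model,
                                     depth - 1, threshold)
            scored.append((pred.feasible_prob * future, action, pred.next_state))
        if not scored:
            break  # every proposal is predicted to fail
        _, action, state = max(scored, key=lambda t: t[0])
        program.append(action)
    return program


def toy_proposer(prompt: str, state: State) -> List[Action]:
    # Hypothetical: ignores the prompt and proposes three primitives until 3 folds exist.
    return [] if len(state) >= 3 else ["squash", "valley", "mountain"]


def toy_world_model(state: State, action: Action) -> Prediction:
    # Hypothetical rule: once a squash fold is made, every later fold is predicted to fail.
    if "squash" in state:
        prob = 0.2
    elif action == "squash":
        prob = 0.95
    elif action == "valley":
        prob = 0.8
    else:
        prob = 0.6
    return Prediction(prob, state + (action,))


print(plan("a paper crane", toy_proposer, toy_world_model, depth=1))  # ['squash']
print(plan("a paper crane", toy_proposer, toy_world_model, depth=2))  # ['valley', 'valley', 'squash']
```

With `depth=1` the planner is greedy: it commits to the high-scoring `squash` fold and then stalls, because the toy world model rejects every fold after it. With `depth=2` it sees that dead end one step ahead and defers `squash` to the final fold. A fuller system might re-prompt the LLM with the predicted failure mode instead of simply stopping.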
Origami, the "Hello, World!" of physical intelligence, is now tractable: Learn2Fold uses LLMs and graph-structured world models to generate valid folding sequences from text.
The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints: a single invalid crease or collision can invalidate the entire folding sequence. Origami therefore demands long-horizon constructive reasoning that jointly satisfies precise physical laws and high-level semantic intent. Existing approaches fall into two disjoint paradigms. Optimization-based methods enforce physical validity but require dense, precisely specified inputs, which makes them unsuitable for sparse natural-language descriptions. Generative foundation models excel at semantic and perceptual synthesis but fail to produce long-horizon, physics-consistent folding processes. Consequently, generating valid origami folding sequences directly from text remains an open challenge. To address this gap, we introduce Learn2Fold, a neuro-symbolic framework that formulates origami folding as conditional program induction over a crease-pattern graph. Our key insight is to decouple semantic proposal from physical verification. A large language model generates candidate folding programs from abstract text prompts, while a learned graph-structured world model serves as a differentiable surrogate simulator that predicts physical feasibility and failure modes before execution. Integrated within a lookahead planning loop, Learn2Fold enables robust generation of physically valid folding sequences for complex and out-of-distribution patterns, demonstrating that effective spatial intelligence arises from the synergy between symbolic reasoning and grounded physical simulation.
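To make "a folding program over a crease-pattern graph" and "a single invalid crease" concrete, the sketch below models a hypothetical crease pattern as vertices plus crease edges, treats a program as an ordered list of (crease id, mountain/valley) steps, and checks the result against two classical necessary conditions for local flat-foldability at each interior vertex: Maekawa's theorem (|M - V| = 2) and Kawasaki's theorem (alternating sector angles sum to pi). `CreasePattern`, `FoldStep`, `execute`, and `locally_flat_foldable` are illustrative names, not Learn2Fold's API, and this hand-coded check is only a stand-in for the kind of hard constraint the learned world model must respect; it ignores fold order, layer ordering, and collisions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
FoldStep = Tuple[int, str]  # hypothetical program op: (crease id, "M" mountain or "V" valley)


@dataclass
class CreasePattern:
    """Hypothetical crease-pattern graph on a flat sheet."""
    vertices: Dict[int, Point]           # vertex id -> 2D position on the sheet
    creases: Dict[int, Tuple[int, int]]  # crease id -> (vertex id, vertex id)
    boundary: Set[int]                   # vertex ids lying on the paper's edge


def execute(cp: CreasePattern, program: List[FoldStep]) -> Dict[int, str]:
    """Apply fold steps in order; a later step on the same crease overwrites an earlier one."""
    assignment: Dict[int, str] = {}
    for crease_id, kind in program:
        if crease_id not in cp.creases:
            raise ValueError(f"unknown crease {crease_id}")
        if kind not in ("M", "V"):
            raise ValueError(f"fold type must be 'M' or 'V', got {kind!r}")
        assignment[crease_id] = kind
    return assignment


def locally_flat_foldable(cp: CreasePattern, assignment: Dict[int, str],
                          tol: float = 1e-9) -> bool:
    """Check Maekawa and Kawasaki at interior vertices; unassigned creases count as unfolded."""
    for v, (x0, y0) in cp.vertices.items():
        if v in cp.boundary:
            continue
        folded = [c for c, (a, b) in cp.creases.items() if v in (a, b) and c in assignment]
        if not folded:
            continue  # no folds meet here, so the paper stays flat around v
        mountains = sum(assignment[c] == "M" for c in folded)
        valleys = len(folded) - mountains
        if abs(mountains - valleys) != 2:  # Maekawa's theorem
            return False
        # Kawasaki's theorem: sort crease directions around v; alternating sectors sum to pi.
        angles = []
        for c in folded:
            a, b = cp.creases[c]
            x, y = cp.vertices[b if a == v else a]
            angles.append(math.atan2(y - y0, x - x0))
        angles.sort()
        n = len(angles)
        sectors = [(angles[(i + 1) % n] - angles[i]) % (2 * math.pi) for i in range(n)]
        if abs(sum(sectors[0::2]) - math.pi) > tol:
            return False
    return True


# A square sheet with one interior vertex and four creases to the edge midpoints.
cp = CreasePattern(
    vertices={0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0), 3: (-1.0, 0.0), 4: (0.0, -1.0)},
    creases={0: (0, 1), 1: (0, 2), 2: (0, 3), 3: (0, 4)},
    boundary={1, 2, 3, 4},
)
folded_twice = [(0, "V"), (2, "V"), (1, "V"), (3, "M")]  # fold in half, then in half again
one_bad_crease = [(0, "V"), (2, "V"), (1, "V"), (3, "V")]
print(locally_flat_foldable(cp, execute(cp, folded_twice)))    # True
print(locally_flat_foldable(cp, execute(cp, one_bad_crease)))  # False
```

Flipping a single crease from mountain to valley violates Maekawa's condition, which mirrors the point that one invalid crease can invalidate the whole sequence. Because the check only inspects the final assignment, it cannot catch ordering or collision failures; those are the kinds of failure modes a learned surrogate would be asked to predict.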