The paper introduces RynnBrain, a family of open-source spatiotemporal foundation models (2B, 8B, 30B-A3B MoE) designed to unify perception, reasoning, and planning for embodied intelligence. RynnBrain enhances egocentric understanding, spatiotemporal localization, physically grounded reasoning, and physics-aware planning within a single framework. Evaluations across 20 embodied benchmarks and 8 general vision benchmarks demonstrate that RynnBrain significantly outperforms existing embodied foundation models, particularly in physically grounded reasoning and adaptation to diverse embodied tasks.
RynnBrain is a unified, open-source spatiotemporal foundation model that outperforms existing embodied foundation models, with its largest gains in physically grounded reasoning and planning.
Despite rapid progress in multimodal foundation models, the embodied intelligence community still lacks a unified, physically grounded foundation model that integrates perception, reasoning, and planning within real-world spatiotemporal dynamics. We introduce RynnBrain, an open-source spatiotemporal foundation model for embodied intelligence. RynnBrain strengthens four core capabilities in a unified framework: comprehensive egocentric understanding, diverse spatiotemporal localization, physically grounded reasoning, and physics-aware planning. The RynnBrain family comprises three foundation model scales (2B, 8B, and 30B-A3B MoE) and four post-trained variants: three tailored for downstream embodied tasks (RynnBrain-Nav, RynnBrain-Plan, and RynnBrain-VLA) and one for complex spatial reasoning tasks (RynnBrain-CoP). In extensive evaluations on 20 embodied benchmarks and 8 general vision understanding benchmarks, our RynnBrain foundation models outperform existing embodied foundation models by a significant margin. The post-trained model suite further demonstrates two key strengths of the RynnBrain foundation model: (i) enabling physically grounded reasoning and planning, and (ii) serving as a strong pretrained backbone that can be efficiently adapted to diverse embodied tasks.