Internal world models for prediction, model-based planning, simulation, and environment modeling in AI systems.
Train drone operators in realistic battlefield environments without ever leaving the simulator, thanks to Unreal Engine's built-in AI.
Forget training data: Extend3D generates impressive town-scale 3D scenes from a single image by cleverly extending and patching the latent space of an object-centric 3D generative model.
LLM agents can be made more efficient and effective by mathematically grounding their reasoning in physics, leading to better performance in time-sensitive and resource-constrained environments.
Autonomous vehicles can drive more safely and reliably by grounding LLM reasoning in a "Commonsense World" that quantifies and leverages the trustworthiness of LLM outputs.
Achieve superhuman robot dexterity with 10x fewer demonstrations by decoupling intent and action through latent world modeling.
Safely study LLM-driven social behavior at scale, without the ethical minefield of deploying agents on live social networks.
End-to-end retrosynthetic planning, previously reliant on fragmented prediction-search hybrids, now achieves state-of-the-art performance thanks to a unified, reasoning-driven generative framework.
Quantum-inspired architectures can significantly improve 3D cloud forecasting by better capturing nonlocal dependencies, outperforming classical methods like ConvLSTM and Transformers.
Emulating human movement with 700 muscles reveals that many different control strategies can produce the same observed motion, challenging the assumption that kinematics uniquely define muscle activation.
Robots can now learn to reproduce oil paintings with impressive accuracy through self-play and learned dynamics, even without human demonstrations or high-fidelity simulators.
LLMs can boost their task-solving accuracy by nearly 50% simply by remembering and re-using past procedural plans, even across tasks with no lexical overlap.
Achieve fine-grained, six-degrees-of-freedom camera control in dynamic scenes with a generalizable model that outperforms scene-specific and diffusion-based approaches.
Quantifying and integrating map uncertainty—both positional and semantic—into trajectory prediction pipelines significantly boosts forecast accuracy, even when using existing baseline models.
Reconstructing dynamic 3D scenes from video just got a whole lot better: MotionScale achieves state-of-the-art fidelity and temporal stability by scaling Gaussian splatting to long, complex sequences.
Offline RL can now tackle complex, unseen temporal logic tasks without retraining, by stitching together learned short-horizon behaviors into long-horizon plans.
UUVs can navigate communication blackouts with 91% higher accuracy by distilling patterns from their own past trajectories.
Optimizing PID gains with MPPI matches the performance of conventional MPPI while using significantly fewer samples, yielding a more sample-efficient approach to learning-based control.
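The item above describes running MPPI-style sampling over PID gains rather than raw control sequences. A minimal sketch of that idea on a hypothetical 1-D damped plant (the plant model, cost, and all parameter values here are illustrative assumptions, not the paper's setup): sample Gaussian perturbations of the gains, roll out each candidate controller, and take an exponentiated cost-weighted average.

```python
import numpy as np

def simulate(gains, setpoint=1.0, dt=0.02, steps=200):
    """Roll out a PID controller on a toy damped 1-D plant; return tracking cost."""
    kp, ki, kd = gains
    x, v, integ, prev_err = 0.0, 0.0, 0.0, setpoint
    cost = 0.0
    for _ in range(steps):
        err = setpoint - x
        integ += err * dt
        u = kp * err + ki * integ + kd * (err - prev_err) / dt
        prev_err = err
        v += (u - 0.5 * v) * dt          # damped double-integrator dynamics
        x += v * dt
        cost += err ** 2 * dt            # accumulated squared tracking error
    return cost

def mppi_tune_pid(n_samples=64, n_iters=20, sigma=0.5, lam=0.1, seed=0):
    """MPPI-style update applied to the three PID gains instead of controls."""
    rng = np.random.default_rng(seed)
    gains = np.array([1.0, 0.1, 0.05])
    for _ in range(n_iters):
        noise = rng.normal(0.0, sigma, size=(n_samples, 3))
        samples = np.clip(gains + noise, 0.0, None)   # keep gains non-negative
        costs = np.array([simulate(g) for g in samples])
        w = np.exp(-(costs - costs.min()) / lam)      # MPPI softmin weights
        w /= w.sum()
        gains = w @ samples                           # cost-weighted average
    return gains

best = mppi_tune_pid()
```

Because only three parameters are sampled, far fewer rollouts are needed per update than when sampling a full horizon of controls, which is the sample-efficiency argument in the summary.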
Achieve state-of-the-art robotic manipulation with a model orders of magnitude smaller than VLAs by explicitly aligning kinematic and semantic transitions.
VLN agents can now "dream ahead" by learning action-conditioned visual dynamics in a latent space, leading to SOTA results and improved real-world navigation.
World models can achieve state-of-the-art video prediction and emergent object decomposition by combining object-centric slots, hierarchical temporal dynamics, and learned causal interaction graphs.
Datacenter simulations can now combine multiple independent models to better predict performance and climate impact, addressing limitations of single-model approaches.
Finally, a video generation model lets you roam through a scene with long-term spatial and temporal consistency, opening up new possibilities for virtual exploration.
Achieve targeted motion adaptation in physics-based characters by learning a mask-invariant prior, enabling robust control even with missing observations or text-driven partial goals.
Video diffusion models lock in their high-level plan almost immediately, suggesting a new path to scaling their reasoning abilities by focusing compute on promising early trajectories.
The shift from traditional simulation to deep learning for network performance modeling brings new opportunities, but also requires careful consideration of evaluation methodologies to ensure fair comparison.
RL agents can learn to control complex fluid dynamics 40% faster by pretraining on Koopman-based surrogate models and iteratively refining them with policy-aware data.
Generating realistic, safety-critical maritime scenarios at scale is now possible by combining generative trajectory modeling with automated encounter pairing, moving beyond limited historical data or handcrafted templates.
Agentic RL agents can learn faster and perform better by dynamically maintaining a skill bank that combines high-level task guidance with low-level step-by-step decision support.
Latent planning for reasoning can actually *hurt* performance due to decoder distribution shift, highlighting a critical challenge in bridging neural and symbolic reasoning.
Unlock richer, more realistic agent simulations by moving beyond individual personas to unified group representations that capture collective behavior.
Current robot manipulation benchmarks fail to capture the messy reality of real-world deployment, so this work introduces a new benchmark, ManipArena, to close the sim2real gap.
Real-world 3D scene completion is now possible without synthetic training data, thanks to visibility-guided flow matching that handles incomplete scans.
Finally, you can precisely control specific objects in long, consistent driving videos, even those pesky long-tail objects.
Modeling raw gaze trajectories with a novel graph-based transformer beats saliency map and scanpath baselines at predicting human attention in driving, suggesting we've been throwing away valuable temporal information.
End-to-end autonomous driving gets a boost with a new framework that links perception, prediction, and planning in a unified chain of thought, outperforming fragmented approaches.
Forget painstakingly tuning MPC controllers by hand: this method learns optimal humanoid locomotion policies by aligning MPC cost functions with high-fidelity RL data.
Updating motion planning roadmaps in dynamic environments just got an order of magnitude faster with a GPU-accelerated edge validation scheme.
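Batch edge validation is exactly the kind of operation that maps onto a GPU: check many roadmap edges against many obstacles at once. A vectorized NumPy stand-in (circular obstacles, sample counts, and the function name are illustrative assumptions, not the paper's scheme) shows the shape of the computation.

```python
import numpy as np

def validate_edges(starts, ends, obstacles, radius, n_checks=16):
    """Batch-validate roadmap edges against circular obstacles by sampling
    points along every edge simultaneously (vectorized GPU-kernel stand-in)."""
    t = np.linspace(0.0, 1.0, n_checks)[None, :, None]          # (1, C, 1)
    pts = starts[:, None, :] * (1 - t) + ends[:, None, :] * t   # (E, C, 2)
    # Distance from every sampled point to every obstacle center
    d = np.linalg.norm(pts[:, :, None, :] - obstacles[None, None, :, :], axis=-1)
    return (d > radius).all(axis=(1, 2))                        # (E,) free mask

starts = np.array([[0.0, 0.0], [0.0, 1.0]])
ends = np.array([[1.0, 0.0], [1.0, 1.0]])
obstacles = np.array([[0.5, 0.0]])       # one disc sitting on the first edge
free = validate_edges(starts, ends, obstacles, radius=0.2)
```

All edges are checked in one tensor operation rather than a per-edge loop, which is what makes the GPU version an order of magnitude faster when the environment changes and the whole roadmap must be revalidated.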
Bicycle robots can now do front-flips, thanks to a reinforcement learning method that bootstraps from dynamically infeasible reference motions.
Robots can now "see" hidden objects and understand articulation by learning from human egocentric video, even if they can't physically explore those areas themselves.
Finally, a single, open-source platform lets you train and test coordinated air and ground robots in photorealistic urban environments with synchronized physics and sensors.
Efficiency is the key bottleneck preventing video generation models from becoming general-purpose world simulators, and this paper provides a taxonomy of techniques to overcome it.
LLMs can boost the depth and structure of student reflection by explicitly scaffolding the planning and translation stages of writing, but the effect fades over time.
Freeing robots from pre-assigned tasks slashes completion times in multi-agent settings, with a new algorithm improving performance on almost 90% of tested scenarios.
Unlock real-time control of off-road vehicles on challenging terrain by representing complex terramechanics with linear Koopman operators learned from simulation data.
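The item above relies on the standard Koopman/EDMD recipe: lift the state with a dictionary of observables, then fit a linear operator by least squares on simulation data. A toy sketch on a hypothetical quadratic system standing in for terramechanics (the dynamics, dictionary, and data sizes are all assumptions for illustration):

```python
import numpy as np

def step(s):
    """Hypothetical nonlinear dynamics standing in for terrain interaction."""
    x, y = s
    return np.array([0.9 * x, 0.8 * y + 0.1 * x * x])

def lift(s):
    """Dictionary of observables: the state plus the quadratic term x^2."""
    x, y = s
    return np.array([x, y, x * x])

# Collect lifted state pairs from short simulated trajectories
rng = np.random.default_rng(0)
X, Y = [], []
for _ in range(50):
    s = rng.uniform(-1, 1, size=2)
    for _ in range(20):
        s_next = step(s)
        X.append(lift(s))
        Y.append(lift(s_next))
        s = s_next
X, Y = np.array(X), np.array(Y)

# EDMD: least-squares fit of a linear Koopman operator K with Y ≈ X K
K, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Multi-step prediction is now purely linear in the lifted space
s0 = np.array([0.5, -0.3])
z = lift(s0)
for _ in range(10):
    z = z @ K
s_true = s0
for _ in range(10):
    s_true = step(s_true)
```

Once the operator is linear, real-time control becomes tractable (e.g. linear MPC in the lifted coordinates), which is the point of the approach summarized above.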
Unlabeled LiDAR data can now drive state-of-the-art traffic simulation, unlocking scalable realism without costly annotations.
Ditch the HD maps: OccSim generates multi-kilometer driving simulations from a single frame, unlocking 80x longer, more diverse training data.
Stop wandering aimlessly: DRIVE-Nav's directional reasoning and inspection slashes path lengths in open-vocabulary navigation, achieving a 5.6% SPL boost on HM3D-OVON.
Forget hand-tuning: PPO can learn to dynamically adjust Pure Pursuit's lookahead distance for autonomous racing, improving lap times and generalization to unseen tracks.
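For context on what PPO is adjusting here: Pure Pursuit steers toward a path point one lookahead distance ahead, and that lookahead is the single knob being learned. A minimal sketch of the classic geometric controller (function name, wheelbase, and sampling of the target point are illustrative, not from the paper):

```python
import math

def pure_pursuit_steer(pose, path, lookahead, wheelbase=2.5):
    """Classic Pure Pursuit: steer toward the first path point at least
    `lookahead` away. pose = (x, y, heading); path = [(x, y), ...]."""
    x, y, yaw = pose
    target = path[-1]
    for px, py in path:
        if math.hypot(px - x, py - y) >= lookahead:
            target = (px, py)
            break
    # Angle to the target relative to the vehicle heading
    alpha = math.atan2(target[1] - y, target[0] - x) - yaw
    # Pure Pursuit steering law: delta = atan(2 L sin(alpha) / lookahead)
    return math.atan2(2.0 * wheelbase * math.sin(alpha), lookahead)
```

A short lookahead tracks tight corners aggressively while a long one smooths high-speed straights; the summary's claim is that an RL policy choosing this value per state beats any fixed, hand-tuned setting.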
Real-time autonomous driving with language models is now possible, achieving 3x speedup and state-of-the-art performance by combining learned and rule-based planning.
Zero-shot visuotactile policies trained in a fast, parallelized simulator can directly control real robots in contact-rich tasks.