OccSim is a 3D simulator that uses an occupancy world model to generate long-horizon driving scenarios without relying on pre-recorded logs or HD maps. It combines a W-DiT-based static occupancy world model with a Layout Generator that populates the scene with dynamic agents, enabling stable generation of over 3,000 frames and construction of 3D occupancy maps spanning over 4 kilometers. Pre-training on data generated by OccSim improves the zero-shot performance of 4D semantic occupancy forecasting models, outperforming the previous asset-based simulator by 11%, and by 22.1% when the OccSim dataset is scaled to 5x the size.
Ditch the HD maps: OccSim generates multi-kilometer driving simulations from a single frame and a sequence of ego-actions, with over 80x longer stable generation for more diverse training data.
Data-driven autonomous driving simulation has long been constrained by its heavy reliance on pre-recorded driving logs or spatial priors, such as HD maps. This fundamental dependency severely limits scalability, restricting open-ended generation capabilities to the finite scale of existing collected datasets. To break this bottleneck, we present OccSim, the first occupancy world model-driven 3D simulator. OccSim obviates the requirement for continuous logs or HD maps; conditioned only on a single initial frame and a sequence of future ego-actions, it can stably generate over 3,000 continuous frames, enabling the continuous construction of large-scale 3D occupancy maps spanning over 4 kilometers for simulation. This represents a more than 80x improvement in stable generation length over previous state-of-the-art occupancy world models. OccSim is powered by two modules: a W-DiT-based static occupancy world model and a Layout Generator. W-DiT handles the ultra-long-horizon generation of static environments by explicitly introducing known rigid transformations into its architecture design, while the Layout Generator populates the dynamic foreground with reactive agents based on the synthesized road topology. With these designs, OccSim can synthesize massive, diverse simulation streams. Extensive experiments demonstrate its downstream utility: data collected directly from OccSim can pre-train 4D semantic occupancy forecasting models to achieve up to 67% zero-shot performance on unseen data, outperforming the previous asset-based simulator by 11%. When the OccSim dataset is scaled to 5x the size, the zero-shot performance increases to about 74%, while the improvement over asset-based simulators expands to 22.1%.
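The abstract does not spell out how the known rigid transformations enter W-DiT, so the following is only a minimal sketch of the general idea they rely on, not OccSim's implementation: because the ego-actions are given, each frame's pose in a shared map frame can be computed exactly by chaining rigid transforms, and locally generated occupancy can then be placed into one growing map. The planar SE(2) simplification, the helper names (`se2`, `accumulate_poses`, `to_global`, `path_length`), and the 1.4 m per-frame step are all assumptions made for illustration.

```python
import numpy as np

def se2(dx, dy, dyaw):
    """3x3 homogeneous matrix for a planar rigid transform (hypothetical helper)."""
    c, s = np.cos(dyaw), np.sin(dyaw)
    return np.array([[c, -s, dx],
                     [s,  c, dy],
                     [0.0, 0.0, 1.0]])

def accumulate_poses(actions):
    """Chain per-frame ego-motions (dx, dy, dyaw), each expressed in the
    previous ego frame, into global poses. Pose 0 is the identity."""
    poses = [np.eye(3)]
    for dx, dy, dyaw in actions:
        poses.append(poses[-1] @ se2(dx, dy, dyaw))
    return np.stack(poses)

def to_global(points_local, pose):
    """Map an (N, 2) array of ego-frame points into the global map frame."""
    homo = np.hstack([points_local, np.ones((len(points_local), 1))])
    return (pose @ homo.T).T[:, :2]

def path_length(poses):
    """Total distance travelled along the sequence of ego positions."""
    xy = poses[:, :2, 2]
    return float(np.linalg.norm(np.diff(xy, axis=0), axis=1).sum())

# Assumed example: 3,000 frames of straight driving at 1.4 m per frame.
straight = accumulate_poses([(1.4, 0.0, 0.0)] * 3000)
total_m = path_length(straight)  # roughly 4,200 m under this assumed step size
```

Under the assumed step size, 3,000 frames cover a little over 4 kilometers, which is the same order of magnitude as the map extent reported in the abstract; the actual per-frame displacement in OccSim is not stated there.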