This paper introduces a precision-adaptive optimization framework that enables Variational Bayesian Gaussian Splatting (VBGS) training on resource-constrained edge devices. The framework profiles VBGS to identify bottlenecks, fuses memory-dominant kernels, and uses a mixed-precision search to assign operation-level precisions automatically. Results show substantial reductions in memory usage and training time on both an A5000 GPU and a Jetson Orin Nano, while maintaining or improving reconstruction quality.
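The summary does not show the fused kernels themselves. As a rough illustration of why fusing memory-dominant kernels lowers peak memory, the NumPy sketch below assumes a VBGS-style step that scores N points against K Gaussian components with diagonal precisions, normalizes the scores into responsibilities, and reduces them to per-component sufficient statistics. The function names, the diagonal parameterization, and the chunk size are hypothetical. The naive version materializes an (N, K, D) difference tensor; the streaming version expands the squared distance algebraically and processes points in chunks, so only (chunk, K) tiles ever exist. NumPy cannot launch a true fused kernel, so chunking stands in here for what a single GPU kernel would keep in registers or shared memory.

```python
import numpy as np

def stats_unfused(x, means, prec, log_w):
    """Naive version (hypothetical): materializes an (N, K, D) difference tensor."""
    diff = x[:, None, :] - means[None, :, :]                 # (N, K, D)
    logits = (-0.5 * np.sum(diff**2 * prec[None], axis=-1)   # (N, K)
              + 0.5 * np.sum(np.log(prec), axis=-1) + log_w)
    logits -= logits.max(axis=1, keepdims=True)              # shift for a stable softmax
    r = np.exp(logits)
    r /= r.sum(axis=1, keepdims=True)                        # responsibilities, rows sum to 1
    return r.sum(axis=0), r.T @ x, r.T @ x**2                # N_k, S_x, S_xx

def stats_fused(x, means, prec, log_w, chunk=1024):
    """Streaming version (hypothetical): holds at most (chunk, K) tiles at a time."""
    k, d = means.shape
    n_k, s_x, s_xx = np.zeros(k), np.zeros((k, d)), np.zeros((k, d))
    bias = 0.5 * np.sum(np.log(prec), axis=-1) + log_w      # (K,)
    pm = prec * means                                        # (K, D)
    c = np.sum(pm * means, axis=1)                           # (K,)
    for start in range(0, x.shape[0], chunk):
        xc = x[start:start + chunk]
        # sum_d prec*(x - mu)^2 = x^2 . prec - 2 x . (prec*mu) + sum_d prec*mu^2
        quad = (xc**2) @ prec.T - 2.0 * (xc @ pm.T) + c      # (chunk, K)
        logits = -0.5 * quad + bias
        logits -= logits.max(axis=1, keepdims=True)
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)
        # Accumulate sufficient statistics instead of storing r for all points.
        n_k += r.sum(axis=0)
        s_x += r.T @ xc
        s_xx += r.T @ xc**2
    return n_k, s_x, s_xx
```

Both functions return the same statistics up to floating-point rounding. For hypothetical sizes of N = 10^6 points, K = 10^4 components and D = 3 in float64, the naive difference tensor alone would need about 240 GB, whereas each (1024, K) tile needs about 82 MB.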
You can now train Variational Bayesian Gaussian Splatting models on an edge device, thanks to an optimization that cuts peak memory by about 8.5x and training time by about 3.8x (measured on an A5000 GPU), all without sacrificing reconstruction quality.
Novel view synthesis (NVS) is increasingly relevant for edge robotics, where compact and incrementally updatable 3D scene models are needed for SLAM, navigation, and inspection under tight memory and latency budgets. Variational Bayesian Gaussian Splatting (VBGS) enables replay-free continual updates for the 3D Gaussian Splatting (3DGS) algorithm by maintaining a probabilistic scene model, but its high-precision computations and large intermediate tensors make on-device training impractical. We present a precision-adaptive optimization framework that enables VBGS training on resource-constrained hardware without altering its variational formulation. We (i) profile VBGS to identify memory and latency hotspots, (ii) fuse memory-dominant kernels to reduce the number of materialized intermediate tensors, and (iii) automatically assign operation-level precisions via a mixed-precision search with bounded relative error. Across the Blender, Habitat, and Replica datasets, our optimized pipeline reduces peak memory from 9.44 GB to 1.11 GB and training time from ~234 min to ~61 min on an A5000 GPU, while preserving (and in some cases improving) the reconstruction quality of the state-of-the-art VBGS baseline. We also enable, for the first time, NVS training on a commercial embedded platform, the Jetson Orin Nano, reducing per-frame latency by 19x compared to 3DGS.
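The abstract does not specify how the mixed-precision search works. The sketch below shows one hypothetical way to assign per-operation precisions under a relative-error bound: a greedy pass that, op by op, keeps the cheapest dtype whose end-to-end output stays within `tol` of an all-float64 reference on a calibration input. The toy three-op pipeline, the candidate dtypes, the cost ordering, and the greedy order are all assumptions for illustration, not the paper's method.

```python
import numpy as np

# Candidate precisions, cheapest first (assumption: cost falls with bit width).
CANDIDATES = [np.float16, np.float32, np.float64]

def run(pipeline, x, dtypes):
    """Run each op with its input cast to the assigned dtype; return float64."""
    h = x
    with np.errstate(all="ignore"):  # float16 overflow yields inf/nan, handled below
        for (_, op), dt in zip(pipeline, dtypes):
            h = op(h.astype(dt))
        return h.astype(np.float64)

def rel_err(y, ref):
    """Norm-wise relative error of y against the float64 reference."""
    return np.linalg.norm(y - ref) / max(np.linalg.norm(ref), 1e-300)

def greedy_precision_search(pipeline, x, tol):
    """Hypothetical greedy search: lower each op's precision while the
    end-to-end relative error on x stays within tol."""
    n = len(pipeline)
    ref = run(pipeline, x, [np.float64] * n)
    dtypes = [np.float64] * n
    for i in range(n):
        for dt in CANDIDATES:
            trial = dtypes.copy()
            trial[i] = dt
            # nan/inf errors compare False, so overflowing choices are rejected.
            if rel_err(run(pipeline, x, trial), ref) <= tol:
                dtypes = trial  # cheapest dtype that keeps the bound
                break
    return dtypes, rel_err(run(pipeline, x, dtypes), ref)

# Toy pipeline (hypothetical): affine map, exponential, normalization.
PIPELINE = [
    ("affine", lambda h: 0.6 * h + 1.0),
    ("exp", np.exp),  # values up to e^13 overflow float16 (max ~65504)
    ("normalize", lambda h: h / h.sum()),
]

if __name__ == "__main__":
    x = np.linspace(0.0, 20.0, 1000)
    dtypes, err = greedy_precision_search(PIPELINE, x, tol=1e-2)
    for (name, _), dt in zip(PIPELINE, dtypes):
        print(f"{name:>9}: {np.dtype(dt).name}")
    print(f"relative error: {err:.2e}")
```

Because float64 is always a candidate and the current assignment already meets the bound, each step can only keep or lower an op's precision while preserving `rel_err <= tol`. In this toy, the exponential overflows float16, so the search must keep that op at float32 or wider. The bound holds only for the calibration input, and a practical search would likely also weight candidates by measured memory and latency rather than by bit width alone.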