The paper introduces ZipSplat, a token-based feed-forward model for 3D Gaussian Splatting that decouples Gaussian placement from the pixel grid, so the number of Gaussians tracks scene complexity rather than camera resolution. A multi-view backbone extracts dense visual tokens, k-means clustering compresses them into a compact set of scene tokens, and each token is decoded into a group of Gaussians. ZipSplat uses approximately six times fewer Gaussians than pixel-aligned methods while setting new state-of-the-art results on DL3DV and RealEstate10K. It also generalizes zero-shot to Mip-NeRF360 and ScanNet++, where it outperforms all comparable baselines.
ZipSplat sets a new state of the art on DL3DV and RealEstate10K while using roughly six times fewer Gaussians than pixel-aligned methods.
Feed-forward 3D Gaussian Splatting methods reconstruct a scene from posed or pose-free images in a single forward pass, yet current approaches predict one Gaussian per input pixel, tying the representation budget to camera resolution rather than scene complexity. A flat wall and a richly textured object thus produce equally many Gaussians despite very different geometric needs. We propose ZipSplat, a token-based feed-forward model that decouples Gaussian placement from the pixel grid. A multi-view backbone extracts dense visual tokens, and k-means clustering compresses them into a compact set of scene tokens. Cross- and self-attention refine these tokens, and a lightweight MLP decodes each into a group of Gaussians with unconstrained 3D positions. Because clustering is applied at inference, a single trained model spans the quality-efficiency curve without retraining. ZipSplat operates without ground-truth poses or intrinsics, yet sets a new state of the art on DL3DV and RealEstate10K with ~6× fewer Gaussians than pixel-aligned methods, surpassing the best pose-free baseline by 2.1 dB and 1.2 dB PSNR, respectively. It further generalizes zero-shot to Mip-NeRF360 and ScanNet++, outperforming all comparable baselines. Our project page is at https://veichta.com/zipsplat.
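The token-compression step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract states only that k-means is applied to dense visual tokens, so the function name `kmeans_compress`, the random initialisation, the squared-Euclidean distance, the fixed iteration count, and the token dimensions below are all assumptions chosen for illustration. The backbone, the attention refinement, and the MLP decoder are omitted.

```python
import numpy as np


def _sq_dists(a, b):
    """Pairwise squared Euclidean distances between rows of a (N, D) and b (K, D)."""
    return (a ** 2).sum(1)[:, None] - 2.0 * a @ b.T + (b ** 2).sum(1)[None, :]


def kmeans_compress(tokens, k, iters=10, seed=0):
    """Compress (N, D) dense tokens into (k, D) centroids with plain Lloyd's k-means.

    Hypothetical helper for illustration; requires k <= N.
    Returns the centroids and the cluster index assigned to each input token.
    """
    rng = np.random.default_rng(seed)
    # Assumption: initialise centroids from k distinct input tokens.
    centroids = tokens[rng.choice(len(tokens), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = _sq_dists(tokens, centroids).argmin(axis=1)
        for j in range(k):
            members = tokens[assign == j]
            if len(members) > 0:  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Recompute the assignment against the final centroids.
    assign = _sq_dists(tokens, centroids).argmin(axis=1)
    return centroids, assign


# Stand-in for backbone output: 1024 dense tokens of dimension 16 (made-up sizes).
dense_tokens = np.random.default_rng(42).standard_normal((1024, 16))

# The number of clusters is picked at call time, so the same inputs can be
# compressed to different budgets without changing anything upstream.
for k in (64, 256):
    scene_tokens, assign = kmeans_compress(dense_tokens, k)
    print(k, scene_tokens.shape, assign.shape)
```

Because `k` is an argument rather than a learned quantity, the sketch reflects the point made in the abstract that the compression level can be chosen at inference: a smaller `k` yields fewer scene tokens and therefore fewer decoded Gaussians, while a larger `k` spends more of the budget on the scene.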