CodecSplat builds learned compression directly into the feed-forward 3D Gaussian Splatting pipeline by encoding the intermediate 2D Gaussian-generation feature into an entropy-coded bitstream. The codec can therefore exploit the structured intermediate feature representation instead of compressing irregular 3D Gaussian primitives, which is less efficient. Experiments on the DL3DV and RealEstate10K datasets show that CodecSplat achieves comparable PSNR with scene representations roughly one order of magnitude smaller than those obtained by compressing feed-forward generated Gaussian primitives.
Compressing feed-forward 3D Gaussian splats at the intermediate feature level, rather than at the level of the final primitives, cuts storage by roughly an order of magnitude at comparable rendering quality.
While feed-forward 3D Gaussian splatting reconstructs renderable Gaussian primitives from sparse context views without per-scene optimization, existing pipelines do not provide a compact scene representation for storage or transmission. A natural solution is to apply existing 3DGS compression methods to the generated Gaussian primitives. However, this approach operates on the final irregular 3D representation and is decoupled from the internal feature-to-Gaussian generation process, which limits compression efficiency. To address this, we introduce CodecSplat, an ultra-compact latent coding framework for feed-forward 3D Gaussian splatting. CodecSplat first encodes an intermediate 2D Gaussian-generation feature into an entropy-coded scene bitstream. At the decoder, the latent feature is reconstructed and used to predict depth and Gaussian parameters, which are then mapped to 3D Gaussian primitives. By integrating compression into the feed-forward Gaussian generation pipeline, CodecSplat avoids inefficient compression of irregular 3D Gaussian primitives and allows the codec to exploit the structured intermediate feature representation. We instantiate CodecSplat on a feed-forward Gaussian splatting backbone with depth-guided multi-view feature refinement and a hierarchical learned feature codec. On DL3DV, CodecSplat achieves 23.56-26.36 dB PSNR at only 20.00-107.77 KiB per scene; on RealEstate10K, it achieves 24.76-27.05 dB PSNR at only 3.37-12.51 KiB per scene. This is roughly one order of magnitude smaller than compressing feed-forward generated Gaussian primitives, while preserving controllable rate-distortion behavior.
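The two ends of the pipeline described above (turning a 2D feature into a bit budget, and lifting decoded per-pixel depth to 3D) can be illustrated with a minimal sketch. It is not the paper's implementation. All function names are hypothetical. Uniform scalar quantization with an empirical-entropy estimate stands in for the hierarchical learned feature codec. The pinhole unprojection with intrinsics `fx, fy, cx, cy` is an assumed, standard way to map a pixel and its depth to a 3D center, which the abstract does not spell out.

```python
import math
from collections import Counter


def quantize(feature, step=1.0):
    """Uniform scalar quantization (round half up) to integer symbols.

    Assumption: a stand-in for the learned feature codec, which the
    abstract does not describe in detail.
    """
    return [math.floor(v / step + 0.5) for v in feature]


def estimated_bits(symbols):
    """Ideal code length in bits under the empirical symbol distribution.

    An entropy coder approaches this length; a learned entropy model
    would replace the empirical counts used here.
    """
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def bits_to_kib(bits):
    """Convert a bit count to KiB (1 KiB = 1024 bytes)."""
    return bits / 8 / 1024


def unproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with predicted depth to a camera-space 3D point.

    Assumption: a pinhole camera model; the result would serve as the
    center of the Gaussian predicted for that pixel.
    """
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


if __name__ == "__main__":
    feature = [0.2, 0.9, 1.1, -0.4]          # toy flattened 2D feature
    symbols = quantize(feature, step=1.0)
    print(symbols, estimated_bits(symbols))
    print(unproject(2, 3, 2.0, fx=1.0, fy=1.0, cx=1.0, cy=1.0))
```

A larger `step` yields fewer distinct symbols and hence fewer bits at the cost of a coarser reconstructed feature, which is one simple way to picture the controllable rate-distortion behavior mentioned in the abstract.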