HUJI, Apr 16, 2026, arXiv:2604.15284

GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens

Roni Itkin, Noam Issachar, Yehonatan Keypur, Anpei Chen, Sagie Benaim

AI Summary

GlobalSplat introduces a feed-forward 3D Gaussian Splatting method that learns a compact, global latent scene representation to resolve cross-view correspondences before decoding explicit 3D geometry. By aligning first and decoding later, it avoids the redundancy inherent in pixel-aligned or voxel-aligned methods and yields compact, globally consistent reconstructions. Experiments on RealEstate10K and ACID show competitive novel-view synthesis performance with significantly fewer Gaussians (as few as 16K) and faster inference (under 78 ms) than dense pipelines.
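To make the redundancy argument concrete, here is a minimal sketch of how primitive counts scale with the number of input views. It assumes a pixel-aligned method emits one Gaussian per input pixel and uses a hypothetical 256x256 input resolution. Neither assumption comes from the paper, and the function names are illustrative only.

```python
# Illustrative only: resolution, per-pixel count, and function names are assumptions.
GLOBAL_BUDGET = 16_000  # "16K Gaussians" from the summary; the exact count is assumed


def pixel_aligned_count(num_views, height=256, width=256, per_pixel=1):
    """Gaussians produced when every input pixel is unprojected to a primitive."""
    return num_views * height * width * per_pixel


def global_count(num_views, budget=GLOBAL_BUDGET):
    """Gaussians decoded from a fixed-size global representation.

    The count does not depend on num_views; the argument is kept only so
    both functions share the same call shape.
    """
    return budget


for views in (2, 4, 8, 16):
    dense = pixel_aligned_count(views)
    compact = global_count(views)
    print(f"{views:>2} views: pixel-aligned={dense:>9,}  "
          f"global={compact:,}  ratio={dense / compact:.1f}x")
```

Under these assumptions the pixel-aligned count grows linearly with the number of views, while the global budget stays fixed.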

Key Contribution

Achieves competitive novel-view synthesis with 90% fewer Gaussians and 10x faster inference by learning a compact global scene representation before decoding 3D geometry.

Abstract

The efficient spatial allocation of primitives serves as the foundation of 3D Gaussian Splatting, as it directly dictates the synergy between representation compactness, reconstruction speed, and rendering fidelity. Previous solutions, whether based on iterative optimization or feed-forward inference, suffer from significant trade-offs between these goals, mainly due to the reliance on local, heuristic-driven allocation strategies that lack global scene awareness. Specifically, current feed-forward methods are largely pixel-aligned or voxel-aligned. By unprojecting pixels into dense, view-aligned primitives, they bake redundancy into the 3D asset. As more input views are added, the representation size increases and global consistency becomes fragile. To this end, we introduce GlobalSplat, a framework built on the principle of align first, decode later. Our approach learns a compact, global, latent scene representation that encodes multi-view input and resolves cross-view correspondences before decoding any explicit 3D geometry. Crucially, this formulation enables compact, globally consistent reconstructions without relying on pretrained pixel-prediction backbones or reusing latent features from dense baselines. Utilizing a coarse-to-fine training curriculum that gradually increases decoded capacity, GlobalSplat natively prevents representation bloat. On RealEstate10K and ACID, our model achieves competitive novel-view synthesis performance while utilizing as few as 16K Gaussians, significantly fewer than required by dense pipelines, obtaining a light 4MB footprint. Further, GlobalSplat enables significantly faster inference than the baselines, operating under 78 milliseconds in a single forward pass. The project page is available at https://r-itk.github.io/globalsplat/
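As a rough sanity check on the reported 16K Gaussians and 4MB footprint, the sketch below estimates storage under an assumed per-Gaussian layout. The layout is the common 3D Gaussian Splatting parameterization (position, scale, rotation quaternion, opacity, degree-3 spherical-harmonic color) stored as float32. The abstract does not state GlobalSplat's actual parameterization or storage format, so treat this as an assumption rather than the authors' accounting.

```python
FLOAT32_BYTES = 4

# Assumed per-Gaussian layout (common 3DGS convention, not confirmed for GlobalSplat).
PARAMS_PER_GAUSSIAN = {
    "position": 3,
    "scale": 3,
    "rotation_quaternion": 4,
    "opacity": 1,
    "sh_color": 3 * 16,  # degree-3 spherical harmonics: 16 coefficients per RGB channel
}


def footprint_bytes(num_gaussians, layout=PARAMS_PER_GAUSSIAN,
                    bytes_per_value=FLOAT32_BYTES):
    """Uncompressed size of num_gaussians primitives under the given layout."""
    values_per_gaussian = sum(layout.values())
    return num_gaussians * values_per_gaussian * bytes_per_value


size = footprint_bytes(16_000)  # "16K" taken as 16,000; the exact count is assumed
print(f"{sum(PARAMS_PER_GAUSSIAN.values())} values per Gaussian, "
      f"{size / 1e6:.2f} MB total")
```

With these assumptions each Gaussian holds 59 float32 values (236 bytes), so 16,000 Gaussians come to about 3.78 MB, which is in the same range as the reported 4MB footprint.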

Architecture Design (Transformers, SSMs, MoE), Computer Vision, Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 44

Year: 2026

Venue: N/A
