SparseGen is an image-to-3D generation framework that represents a scene with a sparse set of learned 3D anchor queries, each expanded into a small local set of 3D Gaussian primitives, and is trained with a rectified-flow reconstruction objective without 3D supervision. Because the model allocates representation capacity where geometry and appearance matter, it needs less memory and inference time than approaches built on dense volumetric grids, triplanes, or pixel-aligned primitives. It achieves multi-view fidelity comparable to those dense approaches with significantly reduced input-view bias, as measured by newly introduced quantitative metrics.
Sparse queries offer a surprisingly effective and efficient alternative to dense representations for image-to-3D generation, achieving comparable fidelity with less input-view bias.
We present SparseGen, a novel framework for efficient image-to-3D generation that exhibits low input-view bias while being significantly faster than dense-representation approaches. Unlike traditional approaches that rely on dense volumetric grids, triplanes, or pixel-aligned primitives, we model scenes with a compact sparse set of learned 3D anchor queries and a learned expansion operator that decodes each transformed query into a small local set of 3D Gaussian primitives. Trained under a rectified-flow reconstruction objective without 3D supervision, our model learns to allocate representation capacity where geometry and appearance matter, achieving significant reductions in memory and inference time while preserving multi-view fidelity. We introduce quantitative measures of input-view bias and utilization to show that sparse queries reduce overfitting to conditioning views while being representationally efficient. Our results argue that sparse set-latent expansion is a principled, practical alternative for efficient 3D generative modeling.
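To make the query-to-Gaussian expansion step concrete, here is a minimal sketch of how a sparse set of queries could be decoded into local groups of 3D Gaussians. It is an illustration under stated assumptions, not the authors' implementation: the 14-value Gaussian parameterization, the single linear expansion head, the `radius` bound on offsets, and all names (`expand_query`, `expand_scene`, `make_linear`) are hypothetical, and random vectors stand in for the transformed query features and trained weights.

```python
import math
import random

# Assumed per-Gaussian layout: 3 offset + 3 log-scale + 4 quaternion + 1 opacity + 3 color.
GAUSS_DIM = 14


def make_linear(rng, in_dim, out_dim, std=0.1):
    """Random weights standing in for a trained linear expansion head."""
    return [[rng.gauss(0.0, std) for _ in range(in_dim)] for _ in range(out_dim)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def expand_query(feature, anchor, weights, k, radius=0.05):
    """Decode one query feature into k Gaussians placed near its 3D anchor."""
    raw = [sum(w * f for w, f in zip(row, feature)) for row in weights]
    gaussians = []
    for i in range(k):
        p = raw[i * GAUSS_DIM:(i + 1) * GAUSS_DIM]
        q_norm = math.sqrt(sum(c * c for c in p[6:10])) + 1e-8
        gaussians.append({
            # tanh bounds each offset so the Gaussian stays local to its anchor.
            "mean": [a + radius * math.tanh(o) for a, o in zip(anchor, p[0:3])],
            "scale": [math.exp(s) for s in p[3:6]],        # strictly positive
            "rotation": [c / q_norm for c in p[6:10]],     # unit quaternion
            "opacity": sigmoid(p[10]),
            "color": [sigmoid(c) for c in p[11:14]],
        })
    return gaussians


def expand_scene(features, anchors, weights, k):
    """Expand every query; the scene holds len(features) * k Gaussians in total."""
    scene = []
    for feature, anchor in zip(features, anchors):
        scene.extend(expand_query(feature, anchor, weights, k))
    return scene


rng = random.Random(0)
NUM_QUERIES, FEATURE_DIM, K = 8, 16, 4
anchors = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(NUM_QUERIES)]
features = [[rng.gauss(0, 1) for _ in range(FEATURE_DIM)] for _ in range(NUM_QUERIES)]
weights = make_linear(rng, FEATURE_DIM, K * GAUSS_DIM)

scene = expand_scene(features, anchors, weights, K)
print(len(scene))  # 32 Gaussians from 8 queries
```

In this sketch the primitive count scales with the number of queries times the expansion factor, rather than with grid resolution or the pixel count of the conditioning view, which is the kind of saving the abstract attributes to sparse queries.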