arXiv:2604.13905 (Apr 15, 2026)

Rethinking Image-to-3D Generation with Sparse Queries: Efficiency, Capacity, and Input-View Bias

Zhiyuan Xu, Jiuming Liu, Masayoshi Tomizuka, Chenfeng Xu, Chensheng Peng

AI Summary

SparseGen is an image-to-3D generation framework that represents a scene with a sparse set of learned 3D anchor queries, each expanded into a small local set of 3D Gaussian primitives, and is trained with a rectified-flow reconstruction objective. Instead of dense volumetric or pixel-aligned representations, it allocates representation capacity to the regions where geometry and appearance matter, which reduces memory use and inference time. The method achieves multi-view fidelity comparable to dense approaches while showing significantly less input-view bias, as measured by newly introduced quantitative metrics.
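To make the anchor-to-Gaussian expansion concrete, here is a minimal Python sketch under stated assumptions. The summary does not specify the expansion operator's architecture or the exact Gaussian parameterization, so everything below is hypothetical: a single randomly initialized linear head (`make_linear`) stands in for the learned operator, each query emits `k` axis-aligned Gaussians (rotation omitted), and offsets are squashed with `tanh` into a hypothetical `offset_radius` so each Gaussian stays local to its anchor. It illustrates the shape of the computation, not SparseGen's implementation.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical per-Gaussian layout: 3 offset + 3 log-scale + 1 opacity logit + 3 color logits.
# Rotation is omitted (axis-aligned Gaussians) to keep the sketch short.
PARAMS_PER_GAUSSIAN = 10


@dataclass
class Gaussian:
    mean: tuple     # 3D center (x, y, z)
    scale: tuple    # per-axis standard deviation, always positive
    opacity: float  # in (0, 1)
    color: tuple    # RGB, each channel in (0, 1)


def sigmoid(x):
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def make_linear(in_dim, out_dim, rng):
    # Randomly initialized linear layer standing in for the (hypothetical) learned expansion operator
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim


def expand_queries(anchors, features, head, k, offset_radius=0.05):
    """Decode each (anchor position, query feature) pair into k local Gaussians."""
    w, b = head
    gaussians = []
    for anchor, feat in zip(anchors, features):
        out = [sum(wi * fi for wi, fi in zip(row, feat)) + bi for row, bi in zip(w, b)]
        for j in range(k):
            p = out[j * PARAMS_PER_GAUSSIAN:(j + 1) * PARAMS_PER_GAUSSIAN]
            # tanh bounds each offset, so every Gaussian stays near its anchor
            mean = tuple(a + offset_radius * math.tanh(o) for a, o in zip(anchor, p[0:3]))
            scale = tuple(0.01 * math.exp(s) for s in p[3:6])
            color = tuple(sigmoid(c) for c in p[7:10])
            gaussians.append(Gaussian(mean, scale, sigmoid(p[6]), color))
    return gaussians


# Toy usage: 4 anchor queries with 16-dim features, each expanded into k = 8 Gaussians
rng = random.Random(0)
num_queries, feat_dim, k = 4, 16, 8
anchors = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(num_queries)]
features = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)] for _ in range(num_queries)]
head = make_linear(feat_dim, k * PARAMS_PER_GAUSSIAN, rng)
gs = expand_queries(anchors, features, head, k)
print(len(gs))  # 32
```

The total primitive count is `num_queries * k`, fixed by the query budget rather than by image resolution.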

Key Contribution

Sparse queries offer a surprisingly effective and efficient alternative to dense representations for image-to-3D generation, achieving comparable fidelity with less input-view bias.
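A back-of-the-envelope count shows why a fixed query budget can be cheaper than pixel-aligned primitives. All numbers here are illustrative assumptions, not figures from the paper: 4 input views at 256×256 with one Gaussian per pixel, versus 2,048 queries expanded into 16 Gaussians each, with 14 float32 parameters per Gaussian (mean, scale, quaternion rotation, opacity, RGB). The helper names are hypothetical.

```python
# Hypothetical parameter layout: mean, scale, rotation quaternion, opacity, RGB
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 3
BYTES_PER_FLOAT = 4  # float32


def pixel_aligned_count(num_views, height, width, per_pixel=1):
    # One (or a few) Gaussians per input pixel: grows with resolution and view count
    return num_views * height * width * per_pixel


def sparse_query_count(num_queries, per_query):
    # Fixed budget: independent of image resolution and number of views
    return num_queries * per_query


def mib(count):
    # Parameter storage in mebibytes
    return count * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT / 2**20


dense = pixel_aligned_count(num_views=4, height=256, width=256)
sparse = sparse_query_count(num_queries=2048, per_query=16)
print(dense, sparse, dense / sparse)                     # 262144 32768 8.0
print(f"{mib(dense):.2f} MiB vs {mib(sparse):.2f} MiB")  # 14.00 MiB vs 1.75 MiB
```

The point is the scaling rather than the specific ratio: doubling the input resolution to 512×512 quadruples the pixel-aligned count, while the sparse budget stays at `num_queries * per_query` regardless of resolution or view count.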

Abstract

We present SparseGen, a novel framework for efficient image-to-3D generation, which exhibits low input-view bias while being significantly faster. Unlike traditional approaches that rely on dense volumetric grids, triplanes, or pixel-aligned primitives, we model scenes with a compact sparse set of learned 3D anchor queries and a learned expansion operator that decodes each transformed query into a small local set of 3D Gaussian primitives. Trained under a rectified-flow reconstruction objective without 3D supervision, our model learns to allocate representation capacity where geometry and appearance matter, achieving significant reductions in memory and inference time while preserving multi-view fidelity. We introduce quantitative measures of input-view bias and utilization to show that sparse queries reduce overfitting to conditioning views while being representationally efficient. Our results argue that sparse set-latent expansion is a principled, practical alternative for efficient 3D generative modeling.
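The abstract names a rectified-flow reconstruction objective. Below is a minimal sketch of the standard rectified-flow loss and its Euler sampler on plain Python vectors. What SparseGen actually transports along the flow, and how its velocity network is conditioned on the input image, is not described in this summary, so the toy vectors, `oracle_velocity`, and the single-point "data distribution" are assumptions for illustration only.

```python
import random


def interpolate(x0, x1, t):
    # Straight-line path: pure noise x0 at t = 0, data x1 at t = 1
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def rectified_flow_loss(velocity_fn, x1, rng):
    """Single-sample Monte Carlo estimate of the rectified-flow regression loss."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]    # noise sample
    t = rng.random()                          # t ~ U[0, 1)
    xt = interpolate(x0, x1, t)
    target = [b - a for a, b in zip(x0, x1)]  # velocity along a straight path is constant
    pred = velocity_fn(xt, t)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(x1)


def euler_sample(velocity_fn, x0, steps):
    # Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check: if the data distribution is a single point, the exact velocity field
# is (x1 - x) / (1 - t), which equals x1 - x0 everywhere on the straight path.
target_point = [1.0, 2.0, 3.0]


def oracle_velocity(x, t):
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, target_point)]


rng = random.Random(0)
loss = rectified_flow_loss(oracle_velocity, target_point, rng)  # ~0 up to rounding
sample = euler_sample(oracle_velocity, [rng.gauss(0.0, 1.0) for _ in range(3)], steps=4)
```

With the exact velocity field, four Euler steps land on the target point. A learned velocity network would replace `oracle_velocity` and be trained by minimizing `rectified_flow_loss` over data samples; straight paths are what make few-step sampling viable.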

Topics: Architecture Design (Transformers, SSMs, MoE); Computer Vision; Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
