Fudan University, Shanghai Innovation Institute
By leveraging the complementary strengths of shallow and deep vision foundation model (VFM) features, Ideal dramatically enhances image reconstruction quality and sets new benchmarks in autoregressive image generation.
One-step action generation in vision-language-action (VLA) models can outperform ten-step methods simply by biasing training towards high-noise states, challenging the need for complex iterative processes.
BiDPO boosts compositional fidelity in text-to-image generation, outperforming previous methods through preference optimization.
Forget patch-based image tokenization: channel-wise quantization unlocks better codebook utilization and text-to-image generation by representing images as discrete levels of visual detail.
Freezing your vision foundation model doesn't have to mean sacrificing fine-grained detail: DecQ unlocks improved reconstruction and faster generative convergence with just 8 extra queries and minimal overhead.