Nanyang Technological University
PermaVid achieves unprecedented long-term consistency in video generation, even after significant edits, by disentangling appearance and geometry in its memory architecture.
Prisma-World achieves unprecedented cross-view consistency in multi-agent video generation by leveraging a joint geometry-aware denoising process.
U4D shows that leveraging spatial uncertainty can substantially improve the quality of LiDAR scene synthesis, achieving unprecedented fidelity and coherence.
Ditching modular architectures unlocks surprisingly competitive vision-language performance, showing that end-to-end pixel-to-word models can rival traditional modular approaches at scale.
Spatial foundation models aren't as "all-round" as we thought: SpatialBench reveals surprising generalization gaps and the critical importance of domain alignment over naive data scaling.
LLaVA-OV-2's codec-stream tokenization lets it outperform existing video-language models, particularly on tasks requiring fine-grained temporal understanding of high-frequency motion.
A single framework now generates simulation-ready 3D assets for rigid, deformable, and articulated objects, unlocking new possibilities for embodied AI and physics-based simulation.