Seedance 2.0 leapfrogs existing models by unifying multi-modal inputs (text, image, audio, video) into a single architecture for generating high-quality, longer-duration audio-video content.
A surprisingly simple VLA model, StarVLA-$\alpha$, beats more complex systems on real-world robotics tasks, suggesting that VLM backbones are more critical than intricate architectures.