State Space Models can outperform Vision Transformers as vision encoders in VLMs, particularly when model size is a constraint.