Ditching modular architectures yields surprisingly competitive vision-language performance, indicating that end-to-end pixel-to-word models can rival traditional modular pipelines at scale.
By decoupling semantic reasoning from spatial grounding, AnchorSeg achieves unprecedented accuracy in reasoning segmentation tasks.