SenseTime Research
Ditching modular architectures unlocks surprisingly competitive vision-language performance, proving that end-to-end pixel-to-word models can rival traditional approaches at scale.
InterSketch shows that interleaving visual sketches with textual reasoning, guided by self-correction and stepwise rewards, unlocks surprisingly strong long-horizon visual reasoning, even surpassing Gemini-3-Pro.