Multimodal agents can now reason, plan, and execute actions more effectively by integrating perception as a core component, not just an auxiliary interface.
A 1.7B-parameter model can rival larger LMMs in detailed image captioning, suggesting that high-quality supervision and an efficient architecture matter more than brute-force scaling.