Current image editing models lack discipline-specific knowledge and often fail to produce edits that are both visually consistent and logically sound within academic domains.
InternVL-U, a 4B-parameter model, outperforms 14B-parameter models in multimodal generation and editing, showing that parameter count alone does not determine performance.
Sports scenarios expose surprising limitations in the spatial reasoning of vision-language models (VLMs): current models generalize poorly from existing benchmarks, even though fine-tuning on a new, large-scale dataset yields gains.