Current image editing models stumble when domain-specific knowledge is required, as revealed by a new benchmark spanning disciplines from natural science to social science.
InternVL-U, a 4B-parameter model, outperforms 14B-parameter models in multimodal generation and editing, suggesting that parameter count alone does not determine performance.
Sports expose surprising limitations in VLMs' spatial reasoning: current models struggle to generalize from existing benchmarks, even though fine-tuning on a new, large-scale dataset yields gains.