Shanghai AI Laboratory
Current multimodal large language models (MLLMs) still struggle to connect the dots when images and text are interleaved, exposing a critical gap in real-world multimodal understanding.
Open-source multimodal models just leveled up: InternVL3 rivals closed-source titans such as GPT-4o by pre-training vision and language jointly from the start.