DLUT Website, github.com, EvolvingLMMs-Lab/NEO
Ditching modular architectures unlocks surprisingly competitive vision-language performance, proving that end-to-end pixel-to-word models can rival traditional approaches at scale.
MLLMs get personality right half the time for the wrong reasons, revealing a massive "Prejudice Gap" where models fail to ground their judgments in observable behavior.
MLLMs still struggle to reason about everyday situations when they require identifying and using visual clues, despite excelling at tasks relying on pre-existing knowledge.