LLMs can match or beat human reviewers on specific aspects of peer review like novelty verification and critique prioritization, but they still exhibit critical blind spots that aggregate metrics miss.
VideoLLMs are surprisingly bad at keeping track of who did what, frequently mixing up actions across different video segments like a confused movie editor.
Current depression patient simulators are more like Pollyannas than patients, resolving negative emotions too quickly and following a predictable trajectory from negative to positive.
CLIP models suffer from a surprisingly strong "center bias," causing them to miss important objects outside the image's central region, even when those objects are crucial for accurate vision-language understanding.
Large reasoning models (LRMs) can often correct themselves even after making mistakes in their reasoning, hinting at a powerful "hidden critique ability" that targeted interventions in the latent space can unlock.
Direct Preference Optimization (DPO) can be rescued from performance collapse with a simple importance sampling fix, especially when regularization is weak.