Independent researchers *Equally contributed authors
Code agents struggle with evolving user requirements, revealing a 38-point gap in performance across leading LLMs when faced with iterative feedback.
Long-context LLM rankings reshuffle dramatically when evaluated across a range of context lengths and capabilities, showing that a single headline score is misleading.
Interactive world models still have a long way to go: a comprehensive benchmark reveals that even state-of-the-art models struggle to consistently perform well across video quality, interaction adherence, and physics compliance.