Search papers, labs, and topics across Lattice.
3
0
6
0
Long-context LLM rankings dramatically reshuffle when evaluated across a range of context lengths and capabilities, proving that a single headline score is misleading.
Interactive world models still have a long way to go: a comprehensive benchmark reveals that even state-of-the-art models struggle to consistently perform well across video quality, interaction adherence, and physics compliance.
LLMs that ace math and physics still struggle with general reasoning, achieving only 63% accuracy on a new K-12 level benchmark.