State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Current autonomous agent benchmarks miss nearly half of safety violations and over 10% of robustness failures because they only check final outputs, a problem Claw-Eval directly addresses.
1.58-bit LLMs are surprisingly more resilient to sparsity than their full-precision counterparts, opening new avenues for extreme compression.
LLMs trained with a novel "second-order rollout" that generates critiques in addition to responses learn more effectively from the same data, unlocking better reasoning.