LLM judges inflate math proof scores by up to 0.36 points, revealing a significant alignment gap with human experts and a reasoning breakdown in discrete domains.