Forget toy datasets: OpenSWE delivers 45K+ real-world, executable Python environments for leveling up your SWE agent, and it's all open-sourced.
LLM judges inflate math proof scores by up to 0.36 points, revealing a significant alignment gap with human experts and a reasoning breakdown in discrete domains.