LLM-as-a-judge can be made far more reliable by explicitly modeling the aggregation weights of sub-features in a tree structure, achieving near-human agreement on complex writing tasks.
Current judge models for instruction following are surprisingly unreliable; a new benchmark exposes their flaws and offers a path to better alignment.