University of North Carolina Chapel Hill
On-policy reward modeling with LLM judges not only yields significant gains on complex mathematical reasoning tasks but also generalizes to simpler numerical and multiple-choice benchmarks.
LLMs can now infer plausible stage layouts from unstructured text alone, opening up new possibilities for automated media production.