Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences
Forget struggling with cryptic SQL: a new LLM fine-tuned with human preferences generates SQL comments that beat those of Qwen3-14B by up to 13% on standard metrics.
On-policy reward modeling with LLM judges not only unlocks significant gains on complex mathematical reasoning tasks but also generalizes to simpler numerical and multiple-choice benchmarks.