University of Science and Technology
Single-rollout RL can rival multi-rollout performance for LLM reasoning, thanks to a new batchwise advantage estimation technique that dramatically improves value function accuracy.
Reasoning beats scale: READER, a 1.5B-parameter model, outperforms models 100 to 1,000 times larger at detecting AI-generated text by explicitly generating a rationale for its decision.
Kernel smoothing, a classic technique from nonparametric statistics, can make reinforcement learning with LLMs more sample efficient.