Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences
Forget struggling with cryptic SQL: a new LLM fine-tuned with human preferences generates comments that score up to 13% higher than Qwen3-14B's on standard metrics.
Injecting demonstrations with a carefully annealed probability can drastically improve exploration in RLVR, even for tasks requiring novel reasoning or domain-specific knowledge.