TJUNLP Lab, School of Computer Science and Technology, Tianjin University, China
RL's superior generalization comes not from brute force but from carefully sculpting a few key features while preserving the base model's knowledge, in contrast to SFT's rapid specialization.
Forget brute-force hinting: KnowRL distills knowledge into atomic units, then uses subset selection to find the *least* amount of guidance needed to supercharge LLM reasoning.