Beihang University
Is mismatched supervised fine-tuning (SFT) data hurting your LLM's reasoning? DART uses reinforcement learning (RL) to transform that data into better-aligned training examples, improving generalization and training efficiency.