Search papers, labs, and topics across Lattice.
2
0
4
GRPO's Achilles' heel in deep search is its coarse advantage assignment, but CalibAdv offers a way to surgically correct it, boosting both performance and training stability.
Parameter importance isn't forever: dynamically adapting which parameters are frozen during fine-tuning significantly improves generalization and reduces forgetting in LLMs.