Zhejiang University, The Chinese University of Hong Kong ♦, Eastern Institute of Technology
ISPO reduces critical reasoning failures in RLVR by transforming reward structures, leading to superior performance on complex reasoning tasks.
OPRD closes the performance gap between student and teacher models while training 1.44x faster and using 54% less memory than traditional methods.
SkillComposer enables language models to self-evolve skills in real time, achieving gains of up to +4.5 on agent tasks compared to larger models.