Forget training experts sequentially – Co-Evolving Policy Distillation (CoPD) unlocks all-in-one integration of diverse reasoning capabilities by training experts in parallel with mutual teaching, outperforming even domain-specific experts.
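The blurb does not give CoPD's objective, so the following is a minimal sketch of one plausible reading of "parallel training with mutual teaching": each expert minimizes its own domain loss plus a KL term pulling it toward its peers' output distributions on a shared batch. All names (`copd_step`, `experts`, `alpha`, the loss form) are hypothetical, not the paper's implementation.

```python
# Illustrative sketch only; the exact CoPD objective is an assumption here.
import torch
import torch.nn.functional as F

def copd_step(experts, optimizers, shared_inputs, task_batches, alpha=0.5):
    """One synchronous parallel update: each expert learns from its own
    domain data and from every peer's predictions on a shared batch."""
    # Snapshot each expert's logits on the shared batch before any update,
    # so all experts teach from the same (detached) state.
    with torch.no_grad():
        teacher_logits = [e(shared_inputs) for e in experts]

    for i, (expert, opt) in enumerate(zip(experts, optimizers)):
        # Domain-specific objective for this expert.
        inputs, labels = task_batches[i]
        loss = F.cross_entropy(expert(inputs), labels)

        # Mutual-teaching term: match each peer's distribution.
        student_logp = F.log_softmax(expert(shared_inputs), dim=-1)
        for j, t in enumerate(teacher_logits):
            if j != i:
                loss = loss + alpha * F.kl_div(
                    student_logp, F.softmax(t, dim=-1),
                    reduction="batchmean",
                )

        opt.zero_grad()
        loss.backward()
        opt.step()
```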
Forget external teachers – the best way to boost your RL model's performance is to learn from its future self.
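The teaser leaves the mechanism unstated, so here is one hedged sketch of what "learning from its future self" could look like: clone the policy, advance the clone a few gradient steps on the RL objective, then distill the current policy toward the clone's predictions. The lookahead scheme, `rl_loss_fn`, and all other names are assumptions for illustration.

```python
# Hedged sketch of future-self distillation; not the paper's method.
import copy
import torch
import torch.nn.functional as F

def future_self_distill(policy, opt, rollout_fn, rl_loss_fn, batch,
                        k_lookahead=4, beta=1.0):
    # 1) Build the "future self": a clone trained k steps further on the
    #    RL objective (rl_loss_fn is a placeholder, e.g. policy gradient).
    future = copy.deepcopy(policy)
    future_opt = torch.optim.SGD(future.parameters(), lr=1e-4)
    for _ in range(k_lookahead):
        rollouts = rollout_fn(future)
        loss = rl_loss_fn(future, rollouts)
        future_opt.zero_grad()
        loss.backward()
        future_opt.step()

    # 2) Distill the current policy toward the future self's distribution.
    with torch.no_grad():
        target = F.softmax(future(batch), dim=-1)
    student = F.log_softmax(policy(batch), dim=-1)
    distill = beta * F.kl_div(student, target, reduction="batchmean")

    opt.zero_grad()
    distill.backward()
    opt.step()
```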
EasyVideoR1 achieves a 1.47× throughput improvement in video understanding tasks by eliminating redundant video decoding and leveraging a comprehensive task-aware reward system.
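The blurb credits part of the speedup to removing redundant decoding. One simple way to realize that idea is to memoize decoded frames, so repeated samples of the same clip (common across RL rollouts and epochs) hit a cache instead of the decoder. The cache policy and the use of `decord` as the decoding backend are assumptions, not EasyVideoR1's actual implementation.

```python
# Hedged sketch: cache decoded frames to avoid re-decoding the same clip.
from functools import lru_cache

@lru_cache(maxsize=256)
def decoded_frames(video_path: str, frame_indices: tuple):
    """Decode once per (video, frame set); later identical calls are free."""
    import decord  # assumed video decoding backend
    reader = decord.VideoReader(video_path)
    return reader.get_batch(list(frame_indices))

# Usage: pass frame indices as a tuple so the arguments are hashable.
# frames = decoded_frames("clip_0001.mp4", (0, 8, 16, 24))
```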