Forget external teachers – the best way to boost your RL policy might be learning from its future self.
EasyVideoR1 achieves a 1.47x throughput improvement on video understanding tasks by eliminating redundant video decoding and leveraging a comprehensive task-aware reward system.
Self-distillation in LLMs can leak information and destabilize training, but combining it with verifiable rewards yields a sweet spot for improved convergence and stability.