Search papers, labs, and topics across Lattice.
4
0
5
2
By co-evolving experts through bidirectional policy distillation, CoPD achieves all-in-one integration of text, image, and video reasoning, outperforming domain-specific experts and suggesting a new training paradigm.
Forget external teachers – the best way to boost your RL model's performance is to learn from its future self.
Self-distillation in LLMs can leak information and destabilize training, but combining it with verifiable rewards yields a sweet spot for improved convergence and stability.
Robot control gets a whole lot faster: ProbeFlow slashes action decoding latency by 14.8x in Vision-Language-Action models, all without retraining.