By co-evolving experts through bidirectional policy distillation, CoPD achieves all-in-one integration of text, image, and video reasoning, outperforming domain-specific experts and suggesting a new training paradigm.
Forget external teachers – the best way to boost your RL model's performance is to learn from its future self.
EasyVideoR1 achieves a 1.47× throughput improvement in video understanding tasks by eliminating redundant video decoding and leveraging a comprehensive task-aware reward system.
Forget brute-force hinting: KnowRL distills knowledge into atomic units, then uses subset selection to find the *least* amount of guidance needed to supercharge LLM reasoning.
Self-distillation in LLMs can leak information and destabilize training, but combining it with verifiable rewards yields a sweet spot for improved convergence and stability.
LLMs can fail to generalize knowledge edits to instruction-following scenarios due to a "Covariance Trap," but RoSE unlocks robust interactive parametric memory by aligning representations and smoothing the optimization landscape.
Forget scaling depth and width: MOUE unlocks a new "virtual width" dimension for Mixture-of-Experts by cleverly reusing a single expert pool across layers.