Forget external teachers: the best way to boost your RL model's performance is to let it learn from its own future self.
EasyVideoR1 achieves a 1.47× throughput improvement on video understanding tasks by eliminating redundant video decoding and leveraging a comprehensive task-aware reward system.
Forget brute-force hinting: KnowRL distills knowledge into atomic units, then uses subset selection to find the *minimum* guidance needed to supercharge LLM reasoning.
Self-distillation in LLMs can leak information and destabilize training, but combining it with verifiable rewards yields a sweet spot for improved convergence and stability.
LLMs can fail to generalize knowledge edits to instruction-following scenarios due to a "Covariance Trap," but RoSE unlocks robust interactive parametric memory by aligning representations and smoothing the optimization landscape.
Forget scaling depth and width: MOUE unlocks a new "virtual width" dimension for Mixture-of-Experts by cleverly reusing a single expert pool across layers.