LRMs can slash up to 40% of reasoning tokens without sacrificing accuracy by dynamically adjusting their "thinking speed" at each step.
Tool-integrated reasoning models often stubbornly stick to their own (wrong) answers, even when a tool provides the correct solution.
Instead of static attention allocation, Flux Attention dynamically routes layers between full and sparse attention based on context, delivering significant speedups without sacrificing performance in long-context LLMs.