Harbin Institute of Technology (Shenzhen)
Large reasoning models (LRMs) can slash up to 40% of reasoning tokens without sacrificing accuracy by dynamically adjusting their "thinking speed" at each step.
LLM serving can achieve 5.6x higher throughput without sacrificing latency by decoupling preemption granularity from scheduling frequency.