Ningbo Institute of Digital Twin, Eastern Institute of Technology, The Hong Kong Polytechnic University
Static depth pruning emerges as the most effective strategy for LLM acceleration, achieving near-theoretical speedup limits in memory-bounded contexts.
Low-KL agreement can trap models in ineffective training regimes, but KAT offers a dynamic solution that boosts accuracy while slashing rollout lengths.