Ningbo Institute of Digital Twin, Eastern Institute of Technology
Static depth pruning emerges as the most effective strategy for LLM acceleration, achieving near-theoretical speedup limits in memory-bound settings.
Untangling the mess of "streaming LLMs," this paper delivers a clear taxonomy that distinguishes between streaming generation, streaming inputs, and interactive architectures.