Search papers, labs, and topics across Lattice.
DSA, HKUST(GZ)
2
0
3
Dynamic allocation of compute resources based on attention entropy can yield significant speedups in long-context LLM inference without sacrificing quality.
TimeBlocks outperforms traditional time-series models by dynamically constructing lightweight, adaptable models that excel in real-time processing.