School of Electronics Engineering and Computer Science, Peking University
Unlocking the potential of compute-in-memory accelerators for LLMs requires carefully navigating a complex dataflow design space, and AccelCIM provides the first systematic framework to do so.
Helios, a hybrid-bonding-based LLM serving accelerator, tackles the dynamic nature of KV cache management in LLM serving and achieves significant speedup and energy-efficiency gains over existing GPU/NMP designs.