Facebook AI, MILA, Montreal
Forget exotic attention mechanisms: MobileLLM-Flash achieves up to 1.8x faster LLM prefill on mobile CPUs by pruning and adapting existing architectures for on-device use.