On-device LLM performance is heavily influenced by sequence length and model depth. Hardware heterogeneity creates efficiency traps, which architectural refinements such as Multi-head Latent Attention (MLA) can mitigate.
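One reason MLA helps on memory-constrained devices is that it caches a single low-rank latent per token instead of full per-head keys and values, shrinking the KV cache that grows with sequence length. The sketch below illustrates that idea in NumPy; all dimensions, weight names, and the random weights are hypothetical choices for illustration, not values from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the source): model dim 64, 4 heads,
# head dim 16, and a latent KV width of 8 (the compressed cache).
d_model, n_heads, d_head, d_latent = 64, 4, 16, 8
seq_len = 5

# Random matrices standing in for trained parameters.
W_q   = rng.standard_normal((d_model, n_heads * d_head)) / np.sqrt(d_model)
W_dkv = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)      # down-projection
W_uk  = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)  # up-project to K
W_uv  = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)  # up-project to V

x = rng.standard_normal((seq_len, d_model))

# MLA core idea: cache only the small latent c_kv per token; expand to
# per-head K and V on the fly via the up-projections.
c_kv = x @ W_dkv                                     # (seq_len, d_latent) -- what gets cached
q = (x @ W_q).reshape(seq_len, n_heads, d_head)
k = (c_kv @ W_uk).reshape(seq_len, n_heads, d_head)
v = (c_kv @ W_uv).reshape(seq_len, n_heads, d_head)

# Standard scaled dot-product attention with a causal mask.
scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(d_head)
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores = np.where(mask, -np.inf, scores)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = np.einsum("hqk,khd->qhd", weights, v).reshape(seq_len, n_heads * d_head)

# Cache footprint in floats: full K+V per head vs. the shared latent.
full_kv_cache = seq_len * n_heads * d_head * 2   # 640 for standard MHA
mla_cache = seq_len * d_latent                   # 40 with the latent cache
print(out.shape, full_kv_cache, mla_cache)
```

Here the cache shrinks by a factor of `n_heads * d_head * 2 / d_latent` (16x under these made-up dimensions); a real MLA implementation also handles positional encodings separately, which this sketch omits.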