Shrinking LLM reasoning to fit mobile devices is now practical: LoRA adapters, RL-based budget forcing, and KV-cache tricks let Qwen2.5-7B reason efficiently on-device.
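One of the KV-cache tricks mentioned above, bounding cache memory with a sliding attention window, can be sketched in plain Python. The class name and the toy string keys/values are illustrative assumptions, not part of the article; a real implementation would store per-layer key/value tensors.

```python
from collections import deque


class SlidingWindowKVCache:
    """Minimal sketch of a sliding-window KV cache: keep only the most
    recent `window` key/value pairs so memory stays bounded on-device."""

    def __init__(self, window: int):
        self.window = window
        # deque with maxlen evicts the oldest entry automatically on append
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)


# Simulate 10 decoding steps with a window of 4: memory stays at 4 entries,
# holding only the most recent positions.
cache = SlidingWindowKVCache(window=4)
for t in range(10):
    cache.append(f"k{t}", f"v{t}")

print(len(cache))        # → 4
print(list(cache.keys))  # → ['k6', 'k7', 'k8', 'k9']
```

The design choice is the usual on-device trade-off: attention can no longer see tokens older than the window, but cache memory is constant instead of growing linearly with sequence length.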