Mixture-of-experts (MoE) models can be pruned more effectively by accounting for cross-layer redundancy, yielding significant performance gains over uniform pruning strategies.
AgentOS reimagines LLMs as reasoning kernels within a structured OS, offering a blueprint for more robust and scalable AI agents.
Forget full-cache rollouts: this parameter-efficient fine-tuning method lets large reasoning models maintain accuracy while slashing memory usage during RL training.