MiniMax-M2 shows that massive parameter counts don't always translate into better agentic performance: strategically activating a smaller subset of parameters can unlock frontier-level intelligence.
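The summary does not spell out how the smaller subset is chosen. A common way to activate only part of a model per token is mixture-of-experts (MoE) routing, where a router scores every expert and only the top-k actually run. The sketch assumes that design. The expert count, `k`, the random linear "experts", and all function names are hypothetical illustrations, not MiniMax-M2's actual implementation.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(dim, seed):
    # Hypothetical expert: a random dim x dim linear map standing in for an FFN.
    rng = random.Random(seed)
    w = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
    return lambda x: [sum(w[i][j] * x[j] for j in range(dim)) for i in range(dim)]

def moe_forward(x, router_w, experts, k):
    """Route token x to its top-k experts; the other experts never run."""
    logits = [sum(wi * xi for wi, xi in zip(row, x)) for row in router_w]
    top = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    gates = softmax([logits[e] for e in top])  # renormalize over the chosen experts only
    out = [0.0] * len(x)
    for g, e in zip(gates, top):
        y = experts[e](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top

dim, n_experts, k = 4, 8, 2
rng = random.Random(0)
experts = [make_expert(dim, seed=s) for s in range(n_experts)]
router_w = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
x = [1.0, -0.5, 0.25, 2.0]
y, chosen = moe_forward(x, router_w, experts, k)
print(f"ran {len(chosen)} of {n_experts} experts: {chosen}")
```

Only `k / n_experts` of the expert parameters do work for this token, even though the full set is available to the router across tokens.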
Forget fixed residual connections: Attention Residuals let each layer selectively attend to the outputs of previous layers, improving both performance and gradient flow in deep LLMs.
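A fixed residual connection always computes `h = h + f(h)`, so every earlier layer contributes with weight 1. The idea summarized above replaces that running sum with a softmax-weighted mix over earlier outputs. The sketch assumes a hypothetical learned query vector per layer, scored against each earlier output with a scaled dot product. The published method's normalization and parameterization may differ, and `make_layer`, `queries`, and the other names are illustrative only.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def make_layer(dim, seed):
    # Hypothetical sublayer f_l: tanh(W x) with a random W (stands in for attention/MLP).
    rng = random.Random(seed)
    w = [[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]
    return lambda x: [math.tanh(dot(row, x)) for row in w]

def attention_residual_forward(x0, layers, queries):
    """Each layer reads a softmax-weighted mix of the embedding and every
    earlier layer's output, instead of the fixed running sum h = h + f(h)."""
    dim = len(x0)
    history = [x0]  # history[0] = embedding, history[l] = output of layer l
    weights_per_layer = []
    for f, q in zip(layers, queries):
        scores = [dot(q, v) / math.sqrt(dim) for v in history]
        alpha = softmax(scores)  # how strongly this layer attends to each earlier output
        h = [sum(a * v[j] for a, v in zip(alpha, history)) for j in range(dim)]
        history.append(f(h))
        weights_per_layer.append(alpha)
    return history, weights_per_layer

dim, n_layers = 4, 3
rng = random.Random(0)
layers = [make_layer(dim, seed=s) for s in range(n_layers)]
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_layers)]  # learned in practice
x0 = [0.5, -1.0, 0.25, 1.5]
history, alphas = attention_residual_forward(x0, layers, queries)
for l, a in enumerate(alphas, start=1):
    print(f"layer {l} mixes {len(a)} earlier outputs with weights {[round(w, 3) for w in a]}")
```

With all-zero queries every layer weights its predecessors uniformly. Learned queries let a layer emphasize or nearly skip specific earlier outputs, which a fixed residual sum cannot do.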