Ditch SwiGLU's quadratic instability: PowLU offers a rational power function that stabilizes LLM pre-training without sacrificing performance.