Search papers, labs, and topics across Lattice.
2
0
4
4
Subtracting the mean from activations unlocks stable FP4 training for LLMs, closing the performance gap with BF16 without complex spectral methods.
Unified multimodal models often *hurt* performance on multimodal understanding tasks, except for spatial reasoning, visual illusions, and multi-round reasoning, challenging the assumption that generation universally improves understanding.