Forget turn-based interactions: MiniCPM-o 4.5 achieves real-time, full-duplex omni-modal interaction, letting it see, listen, speak, and even proactively comment on its environment, all at Gemini-level performance at a fraction of the size.
Multimodal models forget how to see and reason after SFT, but PRISM realigns them before RL, boosting performance by up to 6%.
OPD's "free lunch" of dense token-level rewards may be an illusion: teacher novelty, not just higher scores, drives successful distillation.
Forget full attention: a hybrid sparse-linear attention model, MiniCPM-SALA, achieves 3.5x faster inference and supports 1M context length on a single GPU, all while maintaining comparable performance.