Department of Foundation Model
Ditch hard clipping: ratio-variance regularization offers a principled, "soft brake" approach to trust region policy optimization, unlocking substantial gains in sample efficiency and performance, especially for smaller LLMs.
Forget specialized architectures: StepAudio 2.5 proves a single audio-language foundation, shaped by RLHF, can dominate ASR, TTS, and real-time dialogue simultaneously.
Tool-integrated reasoning models often stubbornly stick to their own (wrong) answers, even when a tool provides the correct solution.