By harnessing the internal states of LLMs, SIREN outperforms traditional guard models at harmful content detection while using a fraction of the parameters.
Rollout design in LLM reinforcement learning is more than just sampling trajectories – it's a modular pipeline you can optimize for reliability, coverage, and cost.
Jointly training LLMs to reason and refine their answers unlocks significant performance gains, outperforming standard policy optimization by up to 11.5 points on AIME.