Turns out, the terminal feedback your CLI agent throws away is a goldmine of dense supervision, enabling significant performance gains and even self-improvement.
A 4B-parameter small language model (SLM) can now rival frontier agents in complex tool-use environments, thanks to a novel reinforcement finetuning framework that teaches it to strategically acquire context and execute actions.
Agentic LLMs can be taught to refuse harmful actions with up to 50% greater success, generalizing zero-shot across diverse models and tasks, by explicitly learning when *not* to act.