Stop wasting tokens on irrelevant questions: reward models that ask about task relevance and user answerability can slash question count by 41% while matching GPT-5's issue resolution rate.
On-policy reward modeling with LLM judges not only unlocks significant performance gains on complex mathematical reasoning tasks, but also generalizes to improve performance on simpler numerical and multiple-choice benchmarks.
Forget specialized tools: a standard Unix terminal and clever RL are all you need to beat much larger LLMs at code search.
Stop guessing when humans want to take over: modeling user intervention styles in web agents boosts their usefulness by 26.5%.