Unlock 2x faster reinforcement learning by distilling group feedback into actionable language refinements that guide exploration.
Unlock SOTA performance in long-horizon search tasks with REDSearcher, a framework that slashes the cost of training by strategically synthesizing complex tasks and boosting core LLM capabilities *before* reinforcement learning.
VESPO stabilizes off-policy RL training for LLMs by directly reshaping sequence-level importance weights, tolerating 64x policy staleness and asynchronous execution without collapse.