Forget toy problems: Gym-Anything lets you turn *any* software into an agent environment, unlocking a world of 10K+ real-world tasks spanning medicine, engineering, and more.
On-policy reward modeling with LLM judges not only unlocks significant performance gains on complex mathematical reasoning tasks, but also generalizes to improve performance on simpler numerical and multiple-choice benchmarks.
Training LLMs to reconstruct arguments boosts their critical thinking abilities across diverse tasks, suggesting a promising new direction for instilling reasoning skills.
Forget manual curation: aligning policy gradients with a validation set adaptively selects RL training data, leading to more stable LLM training and improved performance.