No existing model can effectively ground the spatial structure of student reasoning in multi-page handwritten homework, revealing a significant gap in automated assessment capabilities.
PreciseDoc achieves unprecedented precision in grounding critical document elements, transforming how LMMs can interpret complex text-rich environments.
By harnessing implicit supervision from environment dynamics, EnvRL boosts RL success rates by over 4% on long-horizon tasks, revealing a new frontier in agentic learning.
Environment engineering, not just agent workflows, is the key to unlocking the full potential of autonomous scientific discovery, as demonstrated by EurekAgent's record-breaking results.
Reward hacking in rubric-based RL is not just common; it can be systematically reproduced and analyzed using the new CHERRL environment, revealing hidden biases that could compromise training integrity.
LLMs can be taught to reason more comprehensively over long contexts by rewarding not just the final answer, but also the quality of the reasoning steps taken to arrive at that answer.
Forget external signals: unlock better LLM post-training by mining model internals with sparse autoencoders to reveal data diversity, difficulty, and quality.
Current reward models are surprisingly bad at judging story quality, achieving only 66% accuracy in selecting human-preferred narratives, a gap closed by a new, purpose-built reward model.