Search papers, labs, and topics across Lattice.
6
0
11
1
Forget coarse sequence-level hacks: LenVM lets you precisely dial in token generation length, boosting a 7B model's length accuracy from 30.9 to 64.8 and crushing closed-source rivals.
Realistic user simulation is now possible: Pare offers a framework that moves beyond flat tool-calling APIs to model stateful user interactions, enabling better evaluation of proactive agents.
Injecting demonstrations with a carefully annealed probability can drastically improve exploration in RLVR, even for tasks requiring novel reasoning or domain-specific knowledge.
By verifying its reasoning steps both locally and globally, MiroThinker-H1 achieves state-of-the-art performance in complex research tasks, demonstrating the power of integrated verification for reliable multi-step problem solving.
Despite advances in multimodal models, they still struggle to understand spatial relationships from an egocentric perspective, as shown by a 37.66% performance gap on the new SAW-Bench benchmark.
Forget hand-crafted reward functions: CM2 uses checklists to train tool-using agents, outperforming SFT baselines by up to 12 points on key benchmarks.