Reward hacking, from sycophancy to deception, isn't just a bug but a feature arising from the fundamental mismatch between complex human goals and the compressed reward signals used to train LLMs.
Multi-turn reinforcement learning gets a boost: weighting trajectories by semantic similarity dramatically improves baseline estimation and agent performance in long-document visual QA.
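The blurb doesn't reproduce the paper's estimator, so here is a minimal NumPy sketch of one way similarity-weighted baselines could work in a group-sampling (GRPO-style) setup: each sampled trajectory's baseline is the similarity-weighted mean reward of its group peers. The cosine-similarity choice, function names, and toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def similarity_weighted_baselines(embeddings: np.ndarray, rewards: np.ndarray) -> np.ndarray:
    """For each trajectory, estimate a baseline as the similarity-weighted
    mean reward of the *other* trajectories sampled for the same prompt."""
    # Cosine similarity between trajectory embeddings.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, 0.0)          # a trajectory never contributes to its own baseline
    weights = np.clip(sim, 0.0, None)   # keep only non-negative similarities as weights
    weights /= weights.sum(axis=1, keepdims=True) + 1e-8
    return weights @ rewards

# Toy group of 4 sampled trajectories: random embeddings, scalar rewards.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 16))
r = np.array([1.0, 0.0, 0.5, 1.0])
advantages = r - similarity_weighted_baselines(emb, r)  # advantage for the policy update
print(advantages)
```

Intuitively, semantically similar trajectories make better reward comparators than an unweighted group mean, which lowers the variance of the advantage estimate.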
Forget end-to-end training: breaking down long-context reasoning into atomic skills and training on targeted pseudo-data unlocks a 7.7% performance boost.
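The blurb doesn't spell out the skill taxonomy or data pipeline, so as a rough illustration only, here is a minimal Python sketch of targeted pseudo-data for one hypothetical atomic skill, retrieval from long context: plant a known fact at a random depth inside filler text and ask about it. The skill choice, templates, and all names are assumptions.

```python
import random

# Synthetic filler that carries no task-relevant information.
FILLER = [f"Background sentence number {i} with no useful information." for i in range(1000)]

def make_retrieval_example(fact: str, question: str, answer: str, n_sentences: int = 200) -> dict:
    """Pseudo-training example for an atomic 'retrieval' skill: the answer is
    recoverable only by locating the planted fact inside a long context."""
    context = random.sample(FILLER, n_sentences)
    context.insert(random.randrange(len(context)), fact)  # random insertion depth
    return {"context": " ".join(context), "question": question, "answer": answer}

example = make_retrieval_example(
    fact="The launch code for the toy system is 4-1-7-7.",
    question="What is the launch code for the toy system?",
    answer="4-1-7-7",
)
print(len(example["context"]), "|", example["question"], "->", example["answer"])
```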
GPT-5's scientific reasoning performance plummets by nearly 50% when tackling multi-step workflows, revealing a critical gap in current LLM agents' ability to orchestrate complex tool use.
Finally, a fully open-source, reproducible system for long-form song generation is here, complete with licensed data, code, and a Qwen-based model that rivals closed-source systems.