Reward hacking isn't just a bug; it's a feature arising from the fundamental mismatch between complex human goals and the compressed reward signals used to train LLMs.
Offloading memory and computation to a copilot lets a 7B-parameter GUI agent outperform larger models on long-horizon tasks, suggesting a path to more efficient and capable GUI automation.
LLMs can learn to anticipate their opponents' moves and make better decisions in strategic games by explicitly modeling the other player's behavior during training.