Agentic search gets a meta-RL boost: MR-Search learns to self-reflect and adapt search strategies across episodes, significantly outperforming standard RL baselines.
RewardBench 2 delivers a stark reality check for reward models: they struggle significantly on its new, human-generated prompts, yet their scores on the benchmark are surprisingly predictive of their actual usefulness in downstream tasks.