Injecting demonstrations with a carefully annealed probability can drastically improve exploration in RLVR, even for tasks requiring novel reasoning or domain-specific knowledge.
Stop wasting compute: an RL-trained orchestration policy adaptively decides when an embodied agent should invoke an LLM for reasoning, cutting latency and improving task success compared to fixed invocation strategies.
LLMs struggle to use tools consistently in dynamic environments, but a simple input reformulation strategy can boost performance by over 16% compared to standard methods like ReAct.