Stop wasting compute: a learned policy can intelligently allocate LLM inference budgets, boosting accuracy by up to 12.8% compared to uniform allocation.
RL unlocks genuinely new tool-use capabilities in LLMs: it enables compositional strategies beyond what re-sampling alone can achieve, challenging the notion that RL merely improves reliability.