Beijing Institute of Technology
LLM agents get 18% better at tasks when their skills and tools co-evolve together, instead of being learned separately.
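A minimal sketch of the co-evolution idea, assuming a simple scored library of skills and tools; the `Skill`/`Tool`/`Agent` classes, the stand-in executor, and the score-update rule are illustrative assumptions, not the paper's method.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    score: float = 0.0


@dataclass
class Tool:
    name: str
    score: float = 0.0


@dataclass
class Agent:
    skills: list = field(default_factory=list)
    tools: list = field(default_factory=list)

    def attempt(self, task: str) -> float:
        # Pick the current best skill and tool, run the task, then apply the
        # same reward to BOTH, so they improve as a compatible pair rather
        # than being optimized in isolation.
        skill = max(self.skills, key=lambda s: s.score)
        tool = max(self.tools, key=lambda t: t.score)
        reward = self._run(task, skill, tool)
        skill.score += reward
        tool.score += reward
        return reward

    def _run(self, task: str, skill: Skill, tool: Tool) -> float:
        # Stand-in for executing the task with an LLM; returns a random
        # reward purely to keep the sketch self-contained and runnable.
        return random.random()


agent = Agent(skills=[Skill("plan"), Skill("reflect")],
              tools=[Tool("search"), Tool("calculator")])
for _ in range(10):
    agent.attempt("example task")
```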
Forget monolithic policies: splitting your LLM's RL policy into an accuracy-focused mode and an exploration-driven mode unlocks both better performance and greater output diversity.
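A minimal sketch of a two-mode decoding policy, assuming the modes differ only in sampling temperature; the temperature values and the toy logits are illustrative assumptions, not the paper's formulation.

```python
import math
import random


def sample(logits: dict, temperature: float) -> str:
    # Temperature-scaled softmax sampling over a {token: logit} dict.
    scaled = {tok: l / temperature for tok, l in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    r, acc = random.random(), 0.0
    for tok, v in scaled.items():
        acc += math.exp(v) / z
        if r <= acc:
            return tok
    return tok  # numerical edge case: return the last token


def generate(logits: dict, mode: str) -> str:
    # "accuracy" mode decodes conservatively (low temperature), while
    # "exploration" mode samples broadly to keep rollouts diverse.
    temperature = 0.2 if mode == "accuracy" else 1.2
    return sample(logits, temperature)


toy_logits = {"A": 2.0, "B": 1.0, "C": 0.1}
print(generate(toy_logits, "accuracy"), generate(toy_logits, "exploration"))
```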
Open-source 7B LLMs can now rival GPT-4o performance on validation tasks, thanks to a novel reinforcement learning approach that leverages calibrated self-evaluation as a dense reward signal.
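A minimal sketch of turning calibrated self-evaluation into a dense reward, assuming the model emits a per-step confidence in (0, 1); the temperature-scaling calibration used here is an illustrative assumption, not the paper's exact formulation.

```python
import math


def calibrate(p: float, temp: float = 1.5) -> float:
    # Temperature scaling in logit space: flattens over-confident self-scores.
    logit = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-logit / temp))


def dense_rewards(step_confidences: list) -> list:
    # Each reasoning step gets its own reward from the calibrated self-score,
    # instead of a single sparse correctness signal at the end of the episode.
    return [calibrate(p) for p in step_confidences]


print(dense_rewards([0.9, 0.55, 0.99]))
```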