Beijing Institute of Technology
Forget monolithic policies – splitting your LLM's RL policy into accuracy-focused and exploration-driven modes unlocks better performance and diversity.
Open-source 7B LLMs can now rival GPT-4o performance on validation tasks, thanks to a novel reinforcement learning approach that leverages calibrated self-evaluation as a dense reward signal.
Stop wasting precious GPU memory: this new cache-semantic hash table library achieves up to 3.9 billion key-value lookups per second, outperforming standard approaches by up to 9.4x.