On-policy RL for machine learning engineering agents is now practical, thanks to a synthetic sandbox that cuts execution time by a factor of 13 while boosting performance by up to 67%.
LLMs still struggle to accurately infer user interests from interaction histories, especially when dealing with diverse engagement signals – a critical gap for effective personalization.
Forget imbalanced LoRA usage: ReMix leverages reinforcement learning to route effectively among LoRAs, boosting performance in parameter-efficient fine-tuning.