Search papers, labs, and topics across Lattice.
Forget static relevance labels: RRPO uses LLM feedback to train RAG rerankers, boosting generation quality without expensive human annotations.
Ditch costly process supervision: EVOM trains LLMs to generate optimization code by treating solvers as deterministic reward functions, enabling zero-shot solver transfer and efficient adaptation.