Get RL-level multi-turn LLM performance with SFT-level efficiency by decoupling trajectory generation from optimization via importance weighting.