Freezing most of your critic network and only training a tiny LoRA adapter can dramatically improve off-policy RL performance and stability.
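
To make the idea concrete, here is a minimal sketch of a single critic layer with a frozen weight matrix and a small trainable low-rank (LoRA) update. It is an illustration under stated assumptions, not the method from the summarized work: the class name `LoRALinear`, the layer sizes, the rank, the scaling factor `alpha`, and the initialization scheme are all hypothetical, and the sketch uses plain Python lists rather than a deep learning framework.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

class LoRALinear:
    """Linear layer y = W x + (alpha / rank) * B (A x), with W frozen.

    Hypothetical illustration: only A and B would be handed to the optimizer.
    """

    def __init__(self, in_dim, out_dim, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        # Frozen base weight (stands in for one layer of the critic).
        self.W = [[rng.gauss(0, 0.1) for _ in range(in_dim)]
                  for _ in range(out_dim)]
        # Trainable low-rank factors. A is small and random, B starts at
        # zero, so the adapter contributes nothing at initialization.
        self.A = [[rng.gauss(0, 0.01) for _ in range(in_dim)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        delta = matvec(self.B, matvec(self.A, x))   # low-rank trainable path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_frozen(self):
        return len(self.W) * len(self.W[0])

    def num_trainable(self):
        return (len(self.A) * len(self.A[0])
                + len(self.B) * len(self.B[0]))

layer = LoRALinear(in_dim=64, out_dim=64, rank=2)
x = [1.0] * 64
y = layer.forward(x)
```

With these assumed sizes, the layer holds 4096 frozen weights and 256 trainable ones, which gives a sense of how small the adapter is relative to the frozen part of the network.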