TRQAM stabilizes off-policy reinforcement learning by precisely controlling deviations from pretrained policies, leading to a 68% success rate—22% higher than the best prior method.