1
0
3
4
PRIMT tackles the data inefficiency of preference-based reinforcement learning (RL) by using foundation models (FMs) to generate synthetic multimodal feedback and to synthesize trajectories, significantly outperforming existing FM-based and scripted baselines.