Search papers, labs, and topics across Lattice.
Heterogeneous agents can boost each other's RL performance without coordinated deployment, achieving better results with less data than traditional methods.
Achieve precise, coherent, and mask-free 3D editing from text prompts by having a multimodal LLM decompose the prompt into structural and appearance-level guidance for a rectified-flow inpainting pipeline.
LRMs already know when to stop reasoning, but current sampling methods are holding them back.
Stop overfitting your reward model: R2M leverages real-time policy feedback to dynamically align the reward model with the evolving policy distribution, reducing reward overoptimization in RLHF.