Forget hand-crafted KG traversal policies: GraphWalker uses automatically synthesized trajectories to train agents that achieve SOTA performance and generalize to unseen reasoning paths.
Reinforcement learning for multimodal agents doesn't have to collapse into uselessness: PyVision-RL shows how to stabilize training and encourage multi-turn tool use.
HarmoniDPO integrates preference-based optimization into diffusion-based video-to-audio (V2A) generation. It outperforms state-of-the-art methods in audio-video synchronization and subjective audio quality, offering a robust way to generate realistic, human-preferred audio from video.