LLMs can slash the cost of reward function design in RL while simultaneously boosting performance, thanks to a novel framework that reuses and optimizes reward components.
Forget auxiliary encoders and handcrafted losses: LVRPO uses reinforcement learning to directly align language and vision, boosting performance across a range of multimodal tasks.
Foley-Flow achieves state-of-the-art video-to-audio generation by aligning audio-visual representations through masked modeling, enabling the rhythmic synchronization that prior methods lacked.
Forget monolithic models: pMoE shows that ensembling diverse expert prompts within a single model yields surprisingly large gains in visual adaptation across a wide range of tasks.
Stop treating generated images like real ones: GMAIL aligns them as separate modalities in a shared latent space, unlocking significant gains in vision-language tasks.