Search papers, labs, and topics across Lattice.
3
0
4
High artifact detection rates in VLMs mask significant failures in contextual understanding, with top models misidentifying visual cues in over 46% of cases.
ELBO-based reinforcement learning, previously dismissed for visual generation, can actually outperform MDP-based methods for aligning denoising generative models with human preferences.
MLLMs are surprisingly robust to catastrophic forgetting during fine-tuning, needing only simple regularization or data-hybrid training to maintain performance.