Adobe Research
MLLMs that ace standard Referring Expression Comprehension benchmarks still stumble when faced with images designed to eliminate shortcuts, revealing a surprising lack of robust visual reasoning.
You can drastically improve text-to-image retrieval from short, ambiguous queries by using a language model to generate richer, quality-aware descriptions.
Forget handcrafted metrics: RetouchIQ uses an RL-tuned MLLM to generate its own reward signals for instruction-based image editing, leading to more semantically consistent and perceptually pleasing results.