Search papers, labs, and topics across Lattice.
2
0
4
Current image difference captioning benchmarks fail to capture semantic consistency and penalize hallucinations, but DiffCap-Bench offers a robust alternative that aligns with human expert judgments and predicts downstream utility for image editing.
Current video generation benchmarks overlook crucial aspects of physical plausibility and temporal coherence, highlighting the need for holistic evaluation metrics like PhyScore.