Ditch BLEU and ROUGE: ViSIL offers a unified metric for multimodal video captioning that measures information loss via VLM inference and actually correlates with VQA performance and human judgment.