May 11 – May 18, 2026

Computer Vision - Weekly Roundup

2 papers published across 1 lab.

11400% acceleration

Selected Labs publishing this week

DAMO · 1

Top Papers

May 18, 2026

1w ago · also Tencent AI, UT Austin

OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding

Current video understanding models struggle with long-horizon robustness and non-speech audio, as revealed by the new OmniPro benchmark designed for comprehensive omni-modal proactive evaluation.

Ruixiang Zhao, Jie Yang, Zijie Xin +4

Computer Vision · Eval Frameworks & Benchmarks · Multimodal Models · +1

DAMO · 1w ago

See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding

Multimodal LLMs struggle to pinpoint objects from nouns alone, but SWIM training realigns vision and language to outperform visual-prompt methods.

Computer Vision · Multimodal Models · Natural Language Processing
