Mar 3, 2026 | arXiv:2603.03447

Proact-VL: A Proactive VideoLLM for Real-Time AI Companions

Weicai Yan, Yuhong Dai, Qimu Ran, Qi Ran, Haodong Li, Wang Lin, Haodong Liao, Hao Liao, Xing Xie, Tao Jin, Jianxun Lian

AI Summary

Proact-VL is a framework for building proactive, real-time interactive AI companions from multimodal language models. It targets three challenges: low-latency inference, autonomous response timing, and controlled content generation. The framework is evaluated on the newly introduced Live Gaming Benchmark, a dataset covering solo commentary, co-commentary, and user guidance scenarios. The authors report that Proact-VL achieves superior response latency and quality while maintaining strong video understanding, which they present as evidence of its suitability for real-time interactive applications.

Key Contribution

Proact-VL is a framework that lets real-time AI companions interact with users proactively while balancing response latency, content quality, and video understanding.

Abstract

Proactive and real-time interactive experiences are essential for human-like AI companions, yet face three key challenges: (1) achieving low-latency inference under continuous streaming inputs, (2) autonomously deciding when to respond, and (3) controlling both quality and quantity of generated content to meet real-time constraints. In this work, we instantiate AI companions through two gaming scenarios, commentator and guide, selected for their suitability for automatic evaluation. We introduce the Live Gaming Benchmark, a large-scale dataset with three representative scenarios: solo commentary, co-commentary, and user guidance, and present Proact-VL, a general framework that shapes multimodal language models into proactive, real-time interactive agents capable of human-like environment perception and interaction. Extensive experiments show Proact-VL achieves superior response latency and quality while maintaining strong video understanding capabilities, demonstrating its practicality for real-time interactive applications.

Eval Frameworks & Benchmarks | Multimodal Models | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 25

Year: 2026

Venue: N/A
