Forget brute-force scaling: intelligently selecting just 1% of video frames can actually *improve* video QA accuracy while cutting compute by 93%.
Vision-language models can now explicitly reason about object trajectories in videos, thanks to a simple yet effective method that augments training data with discrete motion tags.
MLLMs can gain surprisingly strong 3D spatial reasoning abilities simply by tapping into the latent knowledge already present in video generation models.