Mahdi Rad

Papers on Lattice

Total citations

Topics

h-index

Research focus

Computer Vision (3) · Multimodal Models (3) · Natural Language Processing (1) · Reasoning & Chain-of-Thought (1) · Training Efficiency & Optimization (1)

Frequent co-authors

Kevin Qu (2) · Mihai Dusmanu (2) · Haozhe Qi (1) · Kevin Qu (1)

Papers (3)

Mar 30, 2026

Haozhe Qi +9 · Mar 30, 2026 · also SJTU

AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding

MLLMs can now efficiently process 10K-frame videos without training, by adaptively selecting tokens based on the model's own uncertainty about the content.

Haozhe Qi, Kevin Qu, Kevin Qu +7

Computer Vision · Multimodal Models · Natural Language Processing

Mar 18, 2026

Kevin Qu +5 · Mar 18, 2026

Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models

Forget expensive 3D training data: Loc3R-VLM shows how to give 2D vision-language models strong 3D spatial reasoning by distilling knowledge from a pretrained 3D foundation model using only monocular video.

Kevin Qu, Haozhe Qi, Mihai Dusmanu +3

Computer Vision · Multimodal Models · Reasoning & Chain-of-Thought

Feb 13, 2026

Stanford HAI · Feb 13, 2026

CoPE-VideoLM: Codec Primitives For Efficient Video Language Models

Video Language Models can achieve up to 86% faster time-to-first-token and 93% token reduction by ditching full-image encoding in favor of motion vectors and residuals from video codecs.

Sayan Deb Sarkar, Rémi Pautrat, Ondrej Miksik +3

Computer Vision · Multimodal Models · Training Efficiency & Optimization
