Hesong Wang

Zhejiang University, Westlake University


Publication activity (papers/week, last 8 weeks)

Research focus

Multimodal Models (2), Computer Vision (1), Inference & Quantization (1), Eval Frameworks & Benchmarks (1)

Frequent co-authors

Xin Jin (1), Lu Lu (1), Chenhao Li (1), Jian Chen (1)

Papers (2)

May 28, 2026

2w ago · also Westlake

EarlyTom: Early Token Compression Completes Fast Video Understanding

Video-LLMs can achieve up to 2.65x faster time-to-first-token and a 61% reduction in FLOPs by compressing visual tokens *inside* the vision encoder, not only after it.

Hesong Wang, Xin Jin, Lu Lu +4

Computer Vision, Inference & Quantization, Multimodal Models

Mar 19, 2026

Keda Tao +21 · also Westlake, ZJU

LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs

Current OmniLLMs stumble when processing real-world, long-form audio-visual content, achieving only ~35-65% accuracy on a new benchmark designed to test long-term memory and fine-grained understanding.

Keda Tao, Yuhua Zheng +19

Eval Frameworks & Benchmarks, Multimodal Models, Speech & Audio
