Yang Shi

Research focus

Multimodal Models (4), Speech & Audio (2), Computer Vision (2), Natural Language Processing (2)

Frequent co-authors

Bozhou Li (2), Bohan Zeng (2), Yue Ding (2), Yuchen Huang (1)

Papers (5)

Jun 11, 2026

1w ago·also Shanghai Jiaotong University

The Clustering Strikes Back: Building Cost-Effective and High-Performance ANNS at Scale with Helmsman

HELMSMAN slashes hardware costs by over 90% while enabling billion-scale index rebuilds in mere hours, revolutionizing ANNS for large-scale applications.

Yuchen Huang, Baiteng Ma, Yiping Sun +7

Distributed Systems & Hardware, Recommendation & Information Retrieval

May 25, 2026

3w ago·also Tsinghua AI, HKUST, Kuaishou, NJU +3

LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV

Current audio-visual generation models struggle to maintain coherence and alignment when scaling to minute-long content, a problem exposed by the new LongAV-Compass benchmark.

Tengfei Liu, Yang Shi, Xuanyu Zhu +14

Eval Frameworks & Benchmarks, Multimodal Models, Speech & Audio

May 21, 2026

also Tsinghua AI, CAS, Guilin University of Electronic, HKUST +7

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

Moving beyond text-only chain-of-thought unlocks better audio-visual reasoning by interleaving textual steps with a unified latent space that preserves dense sensory information.

Yifan Dai, Zhenhua Wu, Bohan Zeng +16

Multimodal Models, Reasoning & Chain-of-Thought, Speech & Audio

Apr 27, 2026

Guangdong University of Technology·also SYSU

Majorization-Guided Test-Time Adaptation for Vision-Language Models under Modality-Specific Shift

Test-time adaptation of vision-language models can actually *hurt* performance when modalities shift asymmetrically; MG-MTTA fixes this by explicitly modeling modality reliability.

Lixian Chen, Mingxuan Huang, Yan-Hong Chen +2

Computer Vision, Multimodal Models, Natural Language Processing

Apr 21, 2026

Guangdong University of Technology·also PKU

TransSplat: Unbalanced Semantic Transport for Language-Driven 3DGS Editing

Forget view consistency tricks – language-driven 3D editing leaps forward by explicitly modeling semantic relationships between 2D edits and 3D Gaussians.

Yanhui Chen, Jingchao Wang, Zixin Zeng +1