Noise-robust visual prompts can improve model performance by over 11% without increasing inference costs.
Forget noisy pseudo-labels: SpatialEvo unlocks self-supervised 3D spatial reasoning by generating perfectly accurate training data directly from scene geometry.
Fusing video with audio tokenizers doesn't have to trash reconstruction quality: timing-aware fusion *before* quantization unlocks better audio understanding without sacrificing fidelity.
Current LLM agent safety benchmarks miss over 20% of unsafe behaviors: agents that pass the benchmark still act unsafely in ways it never detects.
Coding agents struggle to stay faithful to specifications that emerge gradually over long interactions, losing significant implementation fidelity compared to receiving the full specification in a single shot.
Despite advances elsewhere, multimodal LLMs still struggle to faithfully recreate webpages from videos, particularly when capturing fine-grained style and motion.
Invariant models can match the accuracy of equivariant machine learning interatomic potentials at a fraction of the computational cost, thanks to a novel attention mechanism.
By dynamically orchestrating tools and recalling past reasoning, an LLM agent can boost phishing detection recall by 20% on real-world social media URLs.
Forget fine-tuning: DM0 shows that pretraining a VLA model from scratch on diverse embodied and non-embodied data leads to SOTA performance in physical AI tasks.