Wanlong Fang

Papers on Lattice

Total citations

127

Research focus

Multimodal Models (6), Computer Vision (5), Natural Language Processing (3), Recommendation & Information Retrieval (2)

Frequent co-authors

Xiang Fang (4), Changshuo Wang (2), Xiaoye Qu (2)

Papers (7)

May 28, 2026

Xiang Fang +2·May 28, 2026

CogniVerse: Revolutionizing Multi-Modal Retrieval-Augmented Generation with Cognitive Reflection and Geometric Reasoning

MMRAG gets a human-like reasoning upgrade: CogniVerse uses cognitive reflection and information geometry to filter noise, align modalities, and generate coherent responses, outperforming existing systems.

Xiang Fang, Wanlong Fang, Changshuo Wang·8 citations

Multimodal Models, Reasoning & Chain-of-Thought, Recommendation & Information Retrieval

May 26, 2026

Wanlong Fang +1·May 26, 2026

Unveiling the Fragility of Vision-Language Models: Multi-Modal Adversarial Synergy via Texture-Constrained Perturbations and Cross-Modal Optimization

LVLMs are surprisingly susceptible to universal, black-box adversarial attacks that synergistically combine imperceptible image perturbations with subtle text prompts.

Wanlong Fang, Changshuo Wang

Computer Vision, Multimodal Models, Red-Teaming & Adversarial Robustness

Mar 14, 2026

Mar 14, 2026·also NTU

Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security

LLM defenses don't have to sacrifice performance: APD disentangles adversarial prompts to slash harmful outputs by 85% while maintaining model utility.

Xiang Fang, Wanlong Fang·15 citations

Natural Language Processing, Red-Teaming & Adversarial Robustness

Xiang Fang +4·Mar 14, 2026·also HUST, UCL, WHU

Rethinking Video-Language Model from the Language Input Perspective

VLMs can be significantly improved by reasoning over diverse, generated text inputs, rather than relying on restrictive, predefined templates.

Xiang Fang, Wanlong Fang, Changshuo Wang +2·10 citations

Computer Vision, Multimodal Models, Natural Language Processing

Mar 14, 2026·also Guangzhou University, NJU, NTU, UCL +1

Towards Unified Vision-Language Models with Incomplete Multi-Modal Inputs

VLMs can now handle real-world sensor failures and data privacy constraints without catastrophic performance drops, thanks to a new plug-and-play module for incomplete multi-modal inputs.

Xiang Fang, Wanlong Fang, Changshuo Wang +4·11 citations

Computer Vision, Multimodal Models

Oct 28, 2024

Oct 28, 2024·also CUHK, DUT, PKU, Shanghai Jiao Tong University +2

Not All Inputs Are Valid: Towards Open-Set Video Moment Retrieval using Language

Current video moment retrieval systems fail catastrophically when given irrelevant queries, but this work introduces a method to detect and reject such queries, preventing potentially dangerous false retrievals.

Xiang Fang, Wanlong Fang, Daizong Liu +8·39 citations

Computer Vision, Multimodal Models, Recommendation & Information Retrieval

Mar 24, 2024

Mar 24, 2024·also DUT, PKU, SCU, Shenzhen University

Fewer Steps, Better Performance: Efficient Cross-Modal Clip Trimming for Video Moment Retrieval Using Language

Stop wasting compute on irrelevant video clips: SpotVMR trims videos to only the query-relevant moments, boosting retrieval performance while slashing computational cost.

Xiang Fang, Daizong Liu, Wanlong Fang +5·44 citations