Shuai Wang

Nanjing University

Research focus

Multimodal Models (3) · Speech & Audio (3) · Computer Vision (2) · Tool Use & Agents (2)

Frequent co-authors

Liang Li (1) · Yang Chen (1) · Ruopeng Gao (1) · Yao Teng (1)

Papers (8)

Jun 15, 2026

Affiliations: ByteDance, github, HKU

UniDDT: Unifying Multimodal Understanding and Generation with Decoupled Diffusion Transformer

UniDDT balances multimodal understanding and generation, outperforming existing models on both tasks with improved semantic coherence.

Shuai Wang, Liang Li, Yang Chen +3

Computer Vision · Multimodal Models

Jun 14, 2026

Affiliations: NJU, Northwestern, WHU

Geometrically Constrained Decentralized Independent Vector Analysis for Distributed Microphone Arrays

By incorporating direction-of-arrival (DOA) information, GC-Dec-IVA substantially improves source separation in distributed microphone arrays and overcomes key limitations of previous methods.

Changda Chen, Yichen Yang, Wei Liu +4

Distributed Systems & Hardware · Speech & Audio

Jun 12, 2026

Affiliations: Tsinghua AI, NJU, Waterloo

From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI

LLMs are evolving from reactive chatbots into proactive digital colleagues, fundamentally changing how AI can assist with complex tasks.

Yongheng Zhang, Ziang Liu, Jiaxuan Zhu +17

Reasoning & Chain-of-Thought · Scalable Oversight & Alignment Theory · Tool Use & Agents

Jun 9, 2026

Affiliations: NJU

Planar-Sector LOS Guidance for Interception of Agile Targets with Lifting-Wing Quadcopters

Aggressive pursuit strategies can yield nearly 50% more thrust for quadcopters by relaxing traditional visibility constraints during interception.

Linkai Liu, Kun Yang, Han Zou +4

Computer Vision · Robotics & Embodied AI

Jun 7, 2026

Affiliations: D Unreal OpenDRIVE, NJU, UMacau, UMich

IR-SIM: A Lightweight Skill-Native Simulator for Navigation, Learning, and Benchmarking

Rapidly prototype and benchmark robotic navigation scenarios using simple YAML configurations, eliminating the coding barrier in simulation.

Ruihua Han, Shuai Wang, Chengyang Li +7

Robotics & Embodied AI · World Models & Planning

Jun 2, 2026

Affiliations: BJTU, SJTU, Video Rebirth

Foley-Omni: A Unified Multimodal Generation Model from Task-Level Audio Synthesis to Complete Video Soundtrack Generation

Foley-Omni achieves expert-level performance in audio synthesis while generating cohesive soundtracks for video, enhancing both intelligibility and quality.

Ye Tao, Lu Liu, Xuenan Xu +6

Multimodal Models · Speech & Audio

May 29, 2026

Affiliations: Tsinghua AI, ByteDance, CUHK, HKU +1

Representation Forcing for Bottleneck-Free Unified Multimodal Models

Ditch the VAE bottleneck: Representation Forcing lets you train unified multimodal models to generate high-quality images directly from pixels, rivaling VAE-based approaches without the architectural constraint.

Yuqing Wang, Zhijie Lin, Ceyuan Yang +9

Architecture Design (Transformers, SSMs, MoE) · Multimodal Models · Training Efficiency & Optimization

May 27, 2026

Affiliations: ETH, Hunyuan Team, NJU, Northwestern, NTU +4

Audio-Mind: An Auditable Agentic Framework for Audio Understanding

Over-reliance on agentic decomposition can actually *hurt* audio understanding when a strong audio frontend already provides sufficient information, highlighting the importance of conditional evidence acquisition.

Yucheng Wang, Jing Peng, Hanqi Li +6

Interpretability & Mechanistic Interp · Speech & Audio · Tool Use & Agents
