Kai Wang

Papers on Lattice

Total citations

Topics

Publication activity (papers/week, last 8 weeks)

Research focus

Natural Language Processing (5) · Computer Vision (3) · Multimodal Models (3) · Architecture Design (Transformers, SSMs, MoE) (2)

Frequent co-authors

Hezhen Hu (1) · Wangbo Zhao (1) · Lanqing Guo (1) · Hanwen Jiang (1)

Papers (7)

Jun 1, 2026

NUS · 2w ago · also Texas A&M, UT Austin

HumanNOVA: Photorealistic, Universal and Rapid 3D Human Avatar Modeling from a Single Image

Achieving photorealistic 3D human avatars from a single image in under a second could revolutionize virtual reality and gaming applications.

Hezhen Hu, Wangbo Zhao, Lanqing Guo +6

Computer Vision · Data Curation & Synthetic Data · Multimodal Models

May 29, 2026

2w ago · also Brown

Function2Scene: 3D Indoor Scene Layout from Functional Specifications

Forget object-centric prompts: Function2Scene designs 3D indoor scenes directly from natural language descriptions of *how* the space will be used, not just *what* furniture to put there.

Ruiqi Wang, Qimin Chen, Daniel Ritchie +3

Computer Vision · Natural Language Processing · Robotics & Embodied AI

May 21, 2026

3w ago · also Cambridge, Fudan, HKUST, HUST +8

Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild

A 440MB multilingual translation model now rivals commercial APIs, opening the door for performant on-device translation.

Mao Zheng, Zheng Li, Tao Chen +44

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Natural Language Processing

Apr 22, 2026

Apr 22, 2026 · also SJTU, UTS

Multi-Perspective Evidence Synthesis and Reasoning for Unsupervised Multimodal Entity Linking

LLMs can achieve state-of-the-art unsupervised multimodal entity linking by reasoning over diverse evidence types, including graph-based neighborhood information.

Mo Zhou, Jianwei Wang, Kai Wang +2

Multimodal Models · Natural Language Processing · Reasoning & Chain-of-Thought

Apr 16, 2026

MiLM Plus · Apr 16, 2026 · also Xiaomi AI Lab

ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling

ControlFoley lets you generate audio from video with unprecedented control through text descriptions and reference audio, even when those inputs conflict.

Jianxuan Yang, Zhi Cheng, Kai Wang +10

Computer Vision · Multimodal Models · Speech & Audio

Apr 13, 2026

NUS · Apr 13, 2026 · also HKUST

Back to Basics: Let Conversational Agents Remember with Just Retrieval and Generation

Forget complex memory architectures: simple retrieval and generation, when carefully tuned for signal density, can outperform sophisticated methods in conversational agents.

Yuqian Wu, Zhengjun Huang, Junle Chen +3

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Recommendation & Information Retrieval

Yihao Zhang +9 · Apr 13, 2026

The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems

LLMs can be jailbroken with 90% success by subtly "salami slicing" harmful intent across multiple turns, even against state-of-the-art models like GPT-4o and Gemini.

Yihao Zhang, Kai Wang, Jiangrong Wu +7

Constitutional AI & AI Ethics · Natural Language Processing · Red-Teaming & Adversarial Robustness
