Zhifei Xie

Real-time audio interaction is now possible with a unified model that not only performs traditional tasks but also proactively responds to audio stimuli.

Zhifei Xie, Zihang Liu, Ze An +8

Multimodal Models · Speech & Audio

NUS · Apr 12, 2026 · also BIT, Edinburgh, NTU, SJTU

Deep-Reporter: Deep Research for Grounded Multimodal Long-Form Generation

Moving beyond text-centric agentic search, Deep-Reporter shows how to build multimodal agents that draw on both text and visuals for grounded long-form generation.

Fangda Ye, Zhifei Xie, Yuxin Hu +5

Multimodal Models · Recommendation & Information Retrieval · Tool Use & Agents

Apr 9, 2026 · also Anhui University, iFlytek, NTU

PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory

Real-world proactive agents can now infer latent user needs and act on them in real time, rivaling state-of-the-art models in intent detection while maintaining low latency.

Zhifei Xie, Zongzheng Hu, Fangda Ye +12

Reasoning & Chain-of-Thought · Tool Use & Agents · World Models & Planning


Papers (4)