Zhizheng Wu

The Chinese University of Hong Kong, Shenzhen

Research focus

Speech & Audio (8) · Multimodal Models (4) · Architecture Design (Transformers, SSMs, MoE) (1) · Constitutional AI & AI Ethics (1)

Frequent co-authors

Junwen Qiu (2) · Junan Zhang (2) · Dekun Chen (2) · Jiaqi Li (1)

Papers (8)

Jun 30, 2026

1w ago·also ByteDance

FlexiSLM: A Dynamic and Controllable Frame Rate Spoken Language Model

FlexiSLM can operate at frame rates as low as 4.0 Hz while maintaining high-quality speech, effectively halving inference time compared to traditional models.

Jiaqi Li, Chaoren Wang, Xiaohai Tian +8

Multimodal Models · Speech & Audio

Jun 24, 2026

Yicheng Gu +3·2w ago·also CUHK

Frequency-Aware Self-Supervised Music Representation Learning

PupuJEPA reveals that leveraging 2D spectrograms can dramatically enhance music representation learning, outperforming traditional 1D models across multiple tasks.

Yicheng Gu, Junan Zhang, Zhizheng Wu +1

Speech & Audio

Jun 18, 2026

3w ago·also Shenzhen Loop Area Institute, Shenzhen Transsion Holdings Co.

Zero-VC: Zero-Lookahead Streaming Voice Conversion via Speaker Anonymization

Speaker anonymization enables real-time streaming voice conversion without the latency penalty of buffering future context.

Yudong Li, Zihao Fang, Junwen Qiu +4

Speech & Audio

Jun 9, 2026

Jun 9, 2026·also Tsinghua AI, CUHK

ParaBridge: Bridging Paralinguistic Perception and Dialogue Behavior in Speech Language Models

Paralinguistic cues can be effectively harnessed in dialogue systems, leading to a 175% improvement in safety response accuracy without compromising overall model performance.

Yuxiang Wang, Qinke Ni, Shengbo Cai +3

Multimodal Models · Speech & Audio

Jun 8, 2026

Ming-Hao Hsu +1·Jun 8, 2026·also CUHK

Is Text All You Need? Text as a Universal Information Bottleneck for Speech LLMs

Geometry, not token discreteness, is the key to unlocking superior performance in speech-to-LLM integration.

Ming-Hao Hsu, Zhizheng Wu

Multimodal Models · Speech & Audio

May 27, 2026

Chong Jing +3·May 27, 2026·also CUHK

EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction

Achieve state-of-the-art few-shot RIR prediction by explicitly modeling the connection between geometric features and the RIR power spectrum.

Chong Jing, Zitong Lan, Junan Zhang +1

Architecture Design (Transformers, SSMs, MoE) · Multimodal Models · Speech & Audio

Apr 16, 2026

Yuxiang Wang +10·Apr 16, 2026·also CUHK

VoxSafeBench: Not Just What Is Said, but Who, How, and Where

SLMs that seem safe with text inputs can completely fail when the same content is spoken, revealing a critical "speech grounding gap" in current models.

Yuxiang Wang, HongYu Liu, Yijiang Xu +8

Constitutional AI & AI Ethics · Eval Frameworks & Benchmarks · Speech & Audio

Apr 13, 2026

Tsinghua AI·Apr 13, 2026·also CUHK

MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora

Forget complex disentanglement architectures or low-quality synthetic targets: MimicLM achieves superior voice imitation by cleverly using synthetic speech as the *source* and real speech as the *target* in a pseudo-parallel training setup.

Tao Feng, Yuancheng Wang, Xueyao Zhang +4

Data Curation & Synthetic Data · Natural Language Processing · Speech & Audio
