Lattice: the structure behind the noise

Created by Flynn Lachendro

Zhaoye Fei | Lattice

Papers on Lattice: 4

Total citations: 0

Topics: 6

h-index: 11

Research focus

Speech & Audio (4)·Architecture Design (Transformers, SSMs, MoE) (2)·Multimodal Models (2)·Natural Language Processing (1)·Open-Source Models & Weights (1)

Frequent co-authors

Xiaogui Yang (3)·Qinyuan Cheng (3)·Mingshu Chen (3)·Ruixiao Li (3)

Papers (4)

Mar 30, 2026·also Shanghai Innovation

MOSS-VoiceGenerator: Create Realistic Voices with Natural Language Descriptions

Cinematic speech data unlocks more realistic and controllable voice generation from natural language descriptions.

Kexin Huang, Liwei Fan, Botian Jiang +9

Natural Language Processing·Speech & Audio

Mar 18, 2026·also Fudan

MOSS-TTS Technical Report

Achieve controllable and scalable speech generation with MOSS-TTS, enabling zero-shot voice cloning and long-form synthesis.

Yitian Gong, Y. Gong, Botian Jiang +28

Architecture Design (Transformers, SSMs, MoE)·Open-Source Models & Weights·Speech & Audio

Yitian Gong +11·Feb 11, 2026

MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models

A purely Transformer-based audio tokenizer, pre-trained on 3M hours of data, leapfrogs existing codecs and even enables a fully autoregressive TTS model to outperform cascaded systems.

Yitian Gong, Y. Gong, Kuangwei Chen +9

Architecture Design (Transformers, SSMs, MoE)·Multimodal Models·Speech & Audio

Tsinghua AI·Feb 9, 2026·also Fudan, TU Darmstadt, UQ

MOVA: Towards Scalable and Synchronized Video-Audio Generation

Open-source MOVA lets you generate synchronized, high-quality video and audio—including realistic lip sync—without relying on closed-source systems.

SII-OpenMOSS Team, Donghua Yu, Mingshu Chen, Qi Chen +33

Computer Vision·Multimodal Models·Speech & Audio

Computer Vision (1)

Shimin Li (3)

Yaozhou Jiang (2)

Yitian Gong (2)