Minh Khoa Le

1 Applied Artificial Intelligence Initiative, Deakin University, Australia
2 FPT Smart Cloud, Vietnam
3 Deakin University, Australia
1,3 {minh.le, duc.nguyen, truyen.tran}@deakin.edu.au
2 kiendd6@fpt.com

Abstract

High-fidelity video generation remains challenging for diffusion models due to the difficulty of modeling complex spatio-temporal dynamics efficiently. Recent video diffusion methods typically represent a video as a sequence of spatio-temporal tokens which can be modeled using Diffusion Transformers (DiTs). However, this approach faces a trade-off between the strong but expensive Full

Research focus

Architecture Design (Transformers, SSMs, MoE) (1); Computer Vision (1); Training Efficiency & Optimization (1)

Frequent co-authors

Kien Do (1), Duc Thanh Nguyen (1), Truyen Tran (1)

Papers (1)

Mar 10, 2026

FrameDiT: Diffusion Transformer with Frame-Level Matrix Attention for Efficient Video Generation

FrameDiT achieves state-of-the-art video generation by replacing token-level attention with a novel matrix-based attention that operates directly on entire frames.

Minh Khoa Le, Kien Do, Duc Thanh Nguyen, Truyen Tran

Architecture Design (Transformers, SSMs, MoE); Computer Vision; Training Efficiency & Optimization
