arXiv:2605.15042 (May 14, 2026)

EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration

Wuyang Li, Yang Gao, Mariam Hassan, Lang Feng, Wentao Pan, Po-Chien Luan, Alexandre Alahi

AI Summary

EverAnimate is a post-training method for long-horizon animated video generation that mitigates drift in both visual quality and character identity. It combines Persistent Latent Propagation, which maintains a context memory across chunks, with Restorative Flow Matching, which implicitly restores flow trajectories during sampling through velocity adjustment. With lightweight LoRA tuning, it outperforms state-of-the-art long-animation methods: at a 90-second horizon, PSNR/SSIM improve by 15%/15% and LPIPS/FID drop by 32%/27%.

Key Contribution

Generates minute-long human animations with reduced visual degradation and character drift by restoring drifted flow trajectories against a persistent latent context memory, using only lightweight LoRA tuning.

Abstract

We propose EverAnimate, an efficient post-training method for long-horizon animated video generation that preserves visual quality and character identity. Long-form animation remains challenging because highly dynamic human motion must be synthesized against relatively static environments, making chunk-based generation prone to accumulated drift: (i) low-level quality drift, such as progressive degradation of static backgrounds, and (ii) high-level semantic drift, such as inconsistent character identity and view-dependent attributes. To address this issue, EverAnimate restores drifted flow trajectories by anchoring generation to a persistent latent context memory, consisting of two complementary mechanisms. (i) Persistent Latent Propagation maintains a context memory across chunks to propagate identity and motion in latent space while mitigating temporal forgetting. (ii) Restorative Flow Matching introduces an implicit restoration objective during sampling through velocity adjustment, improving within-chunk fidelity. With only lightweight LoRA tuning, EverAnimate outperforms state-of-the-art long-animation methods in both short- and long-horizon settings: at 10 seconds, it improves PSNR/SSIM by 8%/7% and reduces LPIPS/FID by 22%/11%; at 90 seconds, the gains increase to 15%/15% and 32%/27%, respectively.
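The drift argument in the abstract can be illustrated with a toy scalar sketch. It is not the paper's implementation: `model_velocity`, `sample_chunk`, `generate`, the `drift` constant, and the blending weight `lam` are all hypothetical names and assumptions chosen for illustration. Each chunk is sampled by Euler integration of a flow ODE, conditioned on the previous chunk's output, and the velocity is adjusted so that the endpoint it implies is pulled toward a persistent anchor.

```python
def model_velocity(x, t, context, drift):
    """Hypothetical stand-in for a learned velocity field.

    It steers the sample toward `context + drift`, i.e. the previous chunk's
    latent plus a small systematic error, which is how drift enters this toy.
    """
    target = context + drift
    return (target - x) / (1.0 - t)


def sample_chunk(context, anchor, drift, lam, steps=20, x0=0.0):
    """Euler-integrate dx/dt = v from t=0 to t=1 with an adjusted velocity."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = model_velocity(x, t, context, drift)
        x1_hat = x + (1.0 - t) * v                 # endpoint implied by v
        x1_fix = x1_hat + lam * (anchor - x1_hat)  # pull endpoint toward anchor
        v_adj = (x1_fix - x) / (1.0 - t)           # velocity that reaches x1_fix
        x = x + dt * v_adj
    return x


def generate(anchor, n_chunks, drift, lam):
    """Chain chunks; each chunk is conditioned on the previous chunk's output."""
    memory, outputs = anchor, []
    for _ in range(n_chunks):
        memory = sample_chunk(memory, anchor, drift, lam)
        outputs.append(memory)
    return outputs
```

In this toy, the per-chunk error follows e_next = (1 - lam) * (e + drift). With `lam = 0` the error grows linearly with the number of chunks, while any `lam` in (0, 1] makes it converge to the bounded value (1 - lam) * drift / lam. This mirrors, under strong simplifying assumptions, why anchoring to a persistent context would matter more at longer horizons.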

Computer Vision, Multimodal Models

Citation Metrics

Citations: 1

Influential citations: 0

References: 56

Year: 2026

Venue: N/A
