Mar 18, 2026 · arXiv:2603.17944

TransText: Transparency Aware Image-to-Video Typography Animation

Fei Zhang, Zijian Zhou, Bohao Tang, Sen He, Zhe Wang, Soubhik Sanyal, Pengfei Liu, Viktar Atliha, Tao Xiang, Frost Xu, Semih Gunel

AI Summary

TransText is a framework that adapts image-to-video models to layer-aware text animation. It models the alpha channel as an RGB-compatible visual signal and joins it to the RGB stream through latent spatial concatenation. This avoids retraining the VAE on scarce transparent glyph data, a step that can erode the pre-trained semantic priors and cause latent pattern mixing. Experiments show that TransText generates coherent, high-fidelity transparent animations with diverse effects and outperforms the compared baselines.
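The Alpha-as-RGB idea and the latent spatial concatenation can be illustrated with a toy NumPy sketch. It is not the authors' code: it assumes the alpha matte is simply replicated into three channels so a frozen RGB encoder can consume it, and it uses a hypothetical `toy_encode` (average pooling) as a stand-in for the real pre-trained VAE. The two latents are then placed side by side along the width axis so that a single denoiser could process them jointly; this is one plausible reading of "latent spatial concatenation", not a confirmed detail of the paper.

```python
import numpy as np


def alpha_to_rgb(alpha: np.ndarray) -> np.ndarray:
    """Replicate a (T, H, W) alpha matte in [0, 1] into a (T, H, W, 3) RGB-like video."""
    return np.repeat(alpha[..., None], 3, axis=-1)


def rgb_to_alpha(rgb_like: np.ndarray) -> np.ndarray:
    """Recover a single-channel matte from an RGB-like video by averaging its channels."""
    return rgb_like.mean(axis=-1)


def toy_encode(video: np.ndarray, factor: int = 8) -> np.ndarray:
    """HYPOTHETICAL stand-in for a frozen VAE encoder: spatial average pooling.

    A real video VAE would also change the channel count and scale inputs to [-1, 1].
    """
    t, h, w, c = video.shape
    blocks = video.reshape(t, h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(2, 4))


def concat_latents(rgb_latent: np.ndarray, alpha_latent: np.ndarray) -> np.ndarray:
    """Place RGB and alpha latents side by side along the width axis (axis 2)."""
    return np.concatenate([rgb_latent, alpha_latent], axis=2)


def split_latents(joint: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Undo concat_latents: left half is the RGB latent, right half is the alpha latent."""
    half = joint.shape[2] // 2
    return joint[:, :, :half], joint[:, :, half:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.random((4, 64, 64, 3))      # 4 frames of glyph appearance
    alpha = rng.random((4, 64, 64))       # matching transparency matte
    z_rgb = toy_encode(rgb)
    z_alpha = toy_encode(alpha_to_rgb(alpha))
    joint = concat_latents(z_rgb, z_alpha)
    print(joint.shape)  # (4, 8, 16, 3)
```

Because the alpha matte is presented to the encoder as an ordinary three-channel image, the encoder's weights are left untouched in this sketch, which mirrors the motivation of not retraining the VAE.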

Key Contribution

Achieves high-fidelity transparent text animation with image-to-video models without retraining the VAE, sidestepping both the scarcity of transparent glyph data and latent pattern mixing.

Abstract

We introduce the first method, to the best of our knowledge, for adapting image-to-video models to layer-aware text (glyph) animation, a capability critical for practical dynamic visual design. Existing approaches predominantly treat the transparency (alpha) channel as an extra latent dimension appended to the RGB space, which requires retraining the underlying RGB-centric variational autoencoder (VAE). However, retraining the VAE is computationally expensive and, given the scarcity of high-quality transparent glyph data, may erode the robust semantic priors learned from massive RGB corpora, potentially leading to latent pattern mixing. To mitigate these limitations, we propose TransText, a framework based on a novel Alpha-as-RGB paradigm that jointly models appearance and transparency without modifying the pre-trained generative manifold. TransText embeds the alpha channel as an RGB-compatible visual signal through latent spatial concatenation, explicitly ensuring strict cross-modal (RGB-and-Alpha) consistency while preventing feature entanglement. Our experiments demonstrate that TransText significantly outperforms baselines, generating coherent, high-fidelity transparent animations with diverse, fine-grained effects.
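Layer-aware output matters because a separate alpha matte lets the same animated glyph layer be placed over any background. The sketch below shows standard straight-alpha "over" compositing, C = αF + (1 − α)B, applied frame by frame. It assumes the model's outputs are a non-premultiplied RGB video and an alpha video in [0, 1]; the function name `composite_over` and the array layout are illustrative choices, not part of the paper.

```python
import numpy as np


def composite_over(fg_rgb: np.ndarray, alpha: np.ndarray, bg_rgb: np.ndarray) -> np.ndarray:
    """Straight (non-premultiplied) alpha 'over' compositing, applied per frame.

    fg_rgb, bg_rgb: (T, H, W, 3) float arrays in [0, 1].
    alpha:          (T, H, W) float array; clipped to [0, 1] before blending.
    """
    a = np.clip(alpha, 0.0, 1.0)[..., None]  # add a channel axis to broadcast over RGB
    return a * fg_rgb + (1.0 - a) * bg_rgb


if __name__ == "__main__":
    glyph = np.ones((2, 4, 4, 3))            # white text layer
    backdrop = np.zeros((2, 4, 4, 3))        # black background video
    matte = np.full((2, 4, 4), 0.25)         # 25% opacity everywhere
    frame = composite_over(glyph, matte, backdrop)
    print(frame[0, 0, 0])  # [0.25 0.25 0.25]
```

Clipping the matte is a defensive choice: a decoded alpha signal from a generative model may drift slightly outside [0, 1].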

Computer Vision · Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
