ByteDance | Jun 9, 2026 | arXiv:2606.10988

AnimaSpark: A Feed-Forward Method for Animating Arbitrary 3D Objects

AI Summary

This paper introduces AnimaSpark, a novel feed-forward pipeline designed to generate category-agnostic 3D animations, addressing significant limitations in current methods related to inference speed, motion quality, and text prompt adherence. By leveraging a two-dimensional subspace to model joint transformations, AnimaSpark efficiently converts rigged static 3D models into animated sequences through a multi-layered image representation and a video generation model. Comprehensive evaluations demonstrate that AnimaSpark outperforms existing state-of-the-art techniques in text-motion alignment, motion quality, and computational efficiency.

Key Contribution

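AnimaSpark generates category-agnostic 3D animations faster and with higher motion quality than prior methods by modeling joint transformations within a 2D subspace: it tracks joint motion in a generated video and lifts the resulting planar translations and rotations back into 3D.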

Abstract

While recent advancements in generative AI have substantially accelerated static 3D model creation workflows, the synthesis of category-agnostic 3D animations remains a significant bottleneck in 3D asset production. Current methods for category-agnostic animation generation exhibit critical limitations in inference speed, motion quality, and adherence to textual prompts, thereby leaving the process dependent on labor-intensive manual artistry. To address these challenges, this paper introduces AnimaSpark, a novel pipeline for category-agnostic 3D animation generation. Our approach is motivated by the key insight that for many fundamental motions in the 3D world, the corresponding joint transformations can often be effectively modeled within a two-dimensional subspace. The pipeline begins by rendering a rigged static 3D model into multi-layered image representations of its mesh and skeleton, which are subsequently fed into a video generation model. We then employ a keypoint tracking algorithm on the generated video to capture the motion of the skeletal joints projected onto the camera's viewing plane. In the final stage, we distill the planar translations and rotations from these tracked keypoints and lift them from the 2D domain into 3D space to animate the character. Comprehensive evaluations reveal that our method achieves superior performance over existing state-of-the-art techniques across key metrics, including text-motion alignment, quality of motion, and computational efficiency.
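The final stage described in the abstract (distilling a planar rotation from tracked keypoints and lifting it into 3D) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `planar_angle`, `rotation_about_axis`, and `apply_rotation` are hypothetical, and the sketch assumes an orthographic camera looking along the +z axis with the image plane aligned to the world x-y plane, so that an in-plane rotation lifts to a 3D rotation about the viewing axis. The sign convention would flip for an image coordinate system whose y-axis points down.

```python
import math

def planar_angle(parent_rest, child_rest, parent_now, child_now):
    """Signed angle (radians) rotating the rest-pose 2D bone onto the tracked 2D bone."""
    ux, uy = child_rest[0] - parent_rest[0], child_rest[1] - parent_rest[1]
    vx, vy = child_now[0] - parent_now[0], child_now[1] - parent_now[1]
    cross = ux * vy - uy * vx   # proportional to sin(angle)
    dot = ux * vx + uy * vy     # proportional to cos(angle)
    return math.atan2(cross, dot)

def rotation_about_axis(axis, angle):
    """Rodrigues' formula: 3x3 rotation matrix about a non-zero 3D axis."""
    norm = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / norm for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [c + x * x * t,     x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t,     y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]

def apply_rotation(R, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

# Assumed setup: the camera looks along +z, so the image plane is the x-y plane.
view_axis = (0.0, 0.0, 1.0)

# Hypothetical tracked keypoints: the bone points along +x at rest
# and along +y in the current video frame.
angle = planar_angle((0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (0.0, 1.0))

# Lift the planar rotation to 3D as a rotation about the viewing axis.
R = rotation_about_axis(view_axis, angle)
rotated_bone = apply_rotation(R, [1.0, 0.0, 0.0])
```

In this toy case the recovered angle is a quarter turn, and the lifted rotation carries the 3D rest-pose bone direction from +x onto +y. A planar translation could be lifted in the same spirit by expressing the 2D offset in the camera's right and up vectors, again under the orthographic assumption.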

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
