UofT | May 6, 2026 | arXiv:2605.04527

Velox: Learning Representations of 4D Geometry and Appearance

Anagh Malik, Dorian Chan, Xiaoming Zhao, David B. Lindell, Oncel Tuzel, Jen-Hao Rick Chang

AI Summary

Velox learns latent representations of 4D objects from unstructured dynamic point clouds by encoding spatiotemporal color point clouds into dynamic shape tokens. These tokens are supervised by a 4D surface decoder for geometry and a Gaussian decoder for appearance. The resulting representation shows strong performance in video-to-4D generation, 3D tracking, and cloth simulation via image-to-4D generation, demonstrating its utility for downstream tasks.

Key Contribution

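Velox is a latent representation of 4D objects, learned from unstructured dynamic point clouds, that is descriptive (captures geometry and appearance), compressive (efficient for downstream use), and accessible (requires only a point cloud as input).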

Abstract

We introduce a framework for learning latent representations of 4D objects that are descriptive, faithfully capturing object geometry and appearance; compressive, aiding downstream efficiency; and accessible, requiring only minimal input, i.e., an unstructured dynamic point cloud, to construct. Specifically, Velox trains an encoder to compress spatiotemporal color point clouds into a set of dynamic shape tokens. These tokens are supervised using two complementary decoders: a 4D surface decoder, which models the time-varying surface distribution and thereby captures geometry; and a Gaussian decoder, which maps the tokens to 3D Gaussians and helps learn appearance. To demonstrate the utility of our representation, we evaluate it on three downstream tasks -- video-to-4D generation, 3D tracking, and cloth simulation via image-to-4D generation -- and observe strong performance in all settings.
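The encoding step described in the abstract can be pictured with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that each point is a 7-vector (x, y, z, t, r, g, b) and that a fixed number of query vectors pool the points through a single cross-attention step. The function name `encode_to_tokens`, the token count, the feature width, and the random weights standing in for learned parameters are all hypothetical. The two decoders are omitted.

```python
import numpy as np

def encode_to_tokens(points, num_tokens=32, dim=64, seed=0):
    """Pool an (N, 7) spatiotemporal color point cloud into (num_tokens, dim) tokens.

    Assumed column layout: x, y, z, t, r, g, b.
    All weights are random stand-ins for parameters a real model would learn.
    """
    rng = np.random.default_rng(seed)
    n, c = points.shape
    w_in = rng.normal(scale=c ** -0.5, size=(c, dim))  # per-point embedding
    queries = rng.normal(size=(num_tokens, dim))       # hypothetical token queries

    feats = points @ w_in                              # (N, dim) point features
    logits = queries @ feats.T / np.sqrt(dim)          # (num_tokens, N) attention scores
    logits -= logits.max(axis=1, keepdims=True)        # stabilize the softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)            # softmax over the points
    return attn @ feats                                # (num_tokens, dim) tokens

# Usage: 500 random points in, a fixed-size token set out.
pts = np.random.default_rng(1).random((500, 7))
tokens = encode_to_tokens(pts)
```

Because the pooling is a weighted sum over points, the output size does not depend on the number of input points and is unaffected by their order, which is one plausible way to accept an unstructured point cloud as input.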

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 124

Year: 2026

Venue: N/A
