This paper investigates the geometric properties of natural images and finds that semantic information is encoded primarily in their directional components. Based on this observation, the authors propose Spherical Optimal Transport Flow Matching (SOT-CFM), which uses angular distance, and Spherical Flow Matching (SFM), which constrains the dynamics to the manifold. Experiments show that these geometry-aware flow matching methods outperform Euclidean baselines in image generation.
Natural images aren't Euclidean: modeling them on a hypersphere unlocks better generative performance.
Recent advances in generative models highlight the power of geometry-aware modeling in manifold-constrained settings. Yet, for natural images, the field remains confined to Euclidean assumptions, failing to exploit the potential of intrinsic geometric structures within the data. In this work, we investigate the geometry of natural images and observe that semantic information is predominantly encoded in directional components, while norm components can be approximated by the global average. This property holds across both RGB and latent spaces, suggesting that natural images can be effectively modeled on a hypersphere. Building on this finding, we introduce Spherical Optimal Transport Flow Matching (SOT-CFM), which utilizes angular distance, and Spherical Flow Matching (SFM), which constrains dynamics directly on the manifold. Our experiments demonstrate that these geometry-aware methods achieve superior performance against Euclidean baselines. Ultimately, this work provides a novel perspective that bridges the gap between Riemannian manifold-based modeling and natural image generation.
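To make the geometric ingredients named in the abstract concrete, here is a minimal NumPy sketch of three of them: splitting samples into direction and norm, computing pairwise angular distances between directions, and interpolating along a great circle. It is an illustration under stated assumptions, not the authors' implementation. The function names (`split_direction_norm`, `angular_distance`, `slerp`) are hypothetical, random Gaussian arrays stand in for noise and flattened images, and the abstract does not say how SOT-CFM or SFM actually use these quantities.

```python
import numpy as np


def split_direction_norm(x, eps=1e-12):
    """Split each row of x into a unit direction and a scalar norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    directions = x / np.maximum(norms, eps)  # guard against zero vectors
    return directions, norms


def angular_distance(u, v):
    """Pairwise angular (geodesic) distance between rows of unit vectors."""
    cos = np.clip(u @ v.T, -1.0, 1.0)  # clip to keep arccos in its domain
    return np.arccos(cos)


def slerp(u0, u1, t, eps=1e-7):
    """Great-circle interpolation between paired unit vectors (rows).

    Assumes no pair is antipodal; the geodesic is not unique in that case.
    """
    cos = np.clip(np.sum(u0 * u1, axis=1, keepdims=True), -1.0, 1.0)
    theta = np.arccos(cos)
    sin_theta = np.sin(theta)
    # Fall back to linear weights when the two directions nearly coincide.
    near = sin_theta < eps
    safe = np.where(near, 1.0, sin_theta)
    w0 = np.where(near, 1.0 - t, np.sin((1.0 - t) * theta) / safe)
    w1 = np.where(near, t, np.sin(t * theta) / safe)
    out = w0 * u0 + w1 * u1
    return out / np.linalg.norm(out, axis=1, keepdims=True)


# Hypothetical stand-ins: 4 noise samples and 4 flattened 32x32x3 "images".
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 3072))
x1 = rng.standard_normal((4, 3072))

u0, _ = split_direction_norm(x0)
u1, r1 = split_direction_norm(x1)

# Approximate each sample by its direction scaled with the global average norm.
x1_approx = r1.mean() * u1

# Angular cost matrix of the kind an optimal-transport pairing could use.
cost = angular_distance(u0, u1)

# Midpoint of each geodesic path; it stays on the unit sphere.
mid = slerp(u0, u1, 0.5)
```

The angular cost matrix replaces the squared Euclidean cost that a standard optimal-transport pairing would use, and the great-circle path replaces the straight-line interpolant of Euclidean flow matching. Both substitutions are assumptions about how the geometry could enter, chosen to match the abstract's description of angular distance and on-manifold dynamics.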