This paper introduces Compositional and Interpretable Intrinsic Concept Extraction (CI-ICE), a new task focused on extracting composable object-level and attribute-level concepts from a single image using diffusion-based text-to-image models. To address this task, the authors propose HyperExpress, a method that uses hyperbolic space for concept disentanglement and a concept-wise optimization method to maintain inter-concept relationships and composability. Experiments demonstrate that HyperExpress effectively extracts compositionally interpretable intrinsic concepts.
Unlocking interpretable AI just got easier: HyperExpress disentangles image concepts into composable parts using hyperbolic space, letting you reconstruct visuals from their semantic building blocks.
Unsupervised Concept Extraction aims to extract concepts from a single image; however, existing methods cannot extract composable intrinsic concepts. To address this, this paper introduces a new task called Compositional and Interpretable Intrinsic Concept Extraction (CI-ICE). The CI-ICE task aims to use diffusion-based text-to-image models to extract composable object-level and attribute-level concepts from a single image, such that the original concept can be reconstructed by combining these concepts. To achieve this goal, we propose a method called HyperExpress, which addresses the CI-ICE task through two core components. First, we propose a concept learning approach that leverages the inherent hierarchical modeling capability of hyperbolic space to achieve accurate concept disentanglement while preserving the hierarchical structure and relational dependencies among concepts. Second, we introduce a concept-wise optimization method that maps the concept embedding space so as to maintain complex inter-concept relationships while ensuring concept composability. Our method demonstrates outstanding performance in extracting compositionally interpretable intrinsic concepts from a single image.
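The abstract does not say which model of hyperbolic space HyperExpress uses, so the following is only a minimal sketch of the hierarchical property the method relies on. It assumes the Poincaré ball, a common choice in hyperbolic representation learning. It maps Euclidean vectors (for example, concept embeddings from a text encoder) onto the ball with the exponential map at the origin and measures geodesic distance there. The function names, the curvature parameter `c`, and the clipping constant `eps` are illustrative assumptions, not the paper's implementation.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def expmap0(v, c=1.0, eps=1e-5):
    """Map a Euclidean tangent vector at the origin onto the Poincare ball
    of curvature -c (assumed model; not taken from the paper)."""
    n = norm(v)
    if n == 0.0:
        return [0.0 for _ in v]
    sc = math.sqrt(c)
    out = [math.tanh(sc * n) / (sc * n) * x for x in v]
    # Clip slightly inside the boundary: tanh saturates to 1.0 in floating
    # point, and points on the boundary would make distances infinite.
    max_norm = (1.0 - eps) / sc
    on = norm(out)
    if on > max_norm:
        out = [x * max_norm / on for x in out]
    return out

def poincare_dist(u, v, c=1.0):
    """Geodesic distance between two points inside the Poincare ball."""
    diff2 = sum((a - b) ** 2 for a, b in zip(u, v))
    nu2 = sum(a * a for a in u)
    nv2 = sum(b * b for b in v)
    arg = 1.0 + 2.0 * c * diff2 / ((1.0 - c * nu2) * (1.0 - c * nv2))
    return math.acosh(arg) / math.sqrt(c)

if __name__ == "__main__":
    # Two pairs with the same Euclidean gap (0.1): one near the origin,
    # one near the boundary of the ball.
    near_origin = poincare_dist([0.0, 0.0], [0.1, 0.0])
    near_boundary = poincare_dist([0.8, 0.0], [0.9, 0.0])
    print(near_origin, near_boundary)
    # A large Euclidean vector still lands strictly inside the unit ball.
    print(norm(expmap0([50.0, 50.0])))
```

Because hyperbolic distance grows rapidly toward the boundary, one common way to encode a hierarchy is to place general concepts (such as an object) near the origin and more specific ones (such as its attributes) farther out. This is the kind of structure the abstract attributes to hyperbolic space; how HyperExpress actually positions object-level and attribute-level concepts is not specified here.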