This paper introduces a cascaded discrete diffusion framework for CAD generation, addressing a limitation of continuous diffusion models, which struggle with the discrete and heterogeneous nature of CAD tokens. The framework consists of a command diffusion model and a parameter diffusion model, each using transition matrices tailored to the characteristics of CAD commands and parameters. Experiments on the DeepCAD dataset show that this approach outperforms existing autoregressive and continuous diffusion models in unconditional generation and offers effective controllability in conditional generation tasks.
Discrete diffusion, with carefully designed transition matrices for commands and parameters, unlocks superior CAD generation compared to continuous diffusion baselines.
Recent deep learning approaches seek to automate CAD creation by representing a model as a sequence of discrete commands and parameters, and then generating that sequence with autoregressive models or with continuous diffusion operating in a Euclidean embedding space. However, continuous diffusion perturbs representations in a continuous Euclidean domain that does not reflect the inherently discrete and heterogeneous nature of CAD tokens, and the perturbed representations often map to semantically invalid symbols. To overcome this limitation, we propose a cascaded discrete diffusion framework for CAD generation, which consists of a command diffusion model that generates CAD commands and a parameter diffusion model conditioned on those commands. Unlike isotropic Gaussian perturbation, the forward process of our approach operates directly over categorical token distributions using carefully designed transition matrices. For commands, we adopt an absorbing-state transition matrix that progressively corrupts tokens to a designated symbol; for parameters, we introduce transition matrices tailored to heterogeneous attributes: a Gaussian kernel for coordinate continuity, a scale-invariant kernel for dimensional values, and a prior-preserving kernel for boolean attributes. The reverse process is carried out by two denoising networks: a Transformer-based encoder for command recovery, and a parameter network with additional local self-attention for command-level interaction and cross-attention for conditional injection. Experiments on the DeepCAD dataset show that the proposed approach surpasses existing autoregressive and continuous diffusion models on unconditional generation metrics, while qualitative results validate effective controllability in conditional generation tasks. Source code will be released.
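To make the forward process more concrete, the following is a minimal sketch of two of the transition matrices named in the abstract: an absorbing-state matrix for command tokens and a Gaussian kernel for quantized coordinates. It is not the authors' implementation. The function names, the vocabulary sizes (7 command tokens including a mask, 256 coordinate bins), the per-step corruption rate `beta`, the kernel width `sigma`, and the simple row normalization of the Gaussian kernel are all assumptions chosen for illustration.

```python
import math
import random


def absorbing_matrix(num_tokens, beta, mask_id):
    """Row-stochastic matrix: each token keeps its value with probability
    1 - beta or jumps to the absorbing mask token with probability beta.
    The mask row maps to itself with probability 1."""
    q = [[0.0] * num_tokens for _ in range(num_tokens)]
    for i in range(num_tokens):
        q[i][i] = 1.0 - beta
        q[i][mask_id] += beta
    return q


def gaussian_matrix(num_bins, sigma):
    """Row-stochastic matrix where bin i moves to bin j with probability
    proportional to exp(-(i - j)^2 / (2 * sigma^2)), so nearby quantized
    coordinates are more likely targets than distant ones.
    Row normalization is a simplification assumed for this sketch."""
    q = []
    for i in range(num_bins):
        weights = [math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))
                   for j in range(num_bins)]
        total = sum(weights)
        q.append([w / total for w in weights])
    return q


def corrupt(tokens, q, rng):
    """One forward step: replace each token x with a sample from row q[x]."""
    return [rng.choices(range(len(q)), weights=q[x])[0] for x in tokens]


# Hypothetical sizes: 6 command types plus one mask token, 256 coordinate bins.
rng = random.Random(0)
q_cmd = absorbing_matrix(num_tokens=7, beta=0.3, mask_id=6)
q_coord = gaussian_matrix(num_bins=256, sigma=2.0)

noisy_cmds = corrupt([0, 1, 2, 3], q_cmd, rng)       # each token kept or masked
noisy_coords = corrupt([10, 128, 200], q_coord, rng)  # each bin drifts locally
```

Under these assumptions, a command token can only stay unchanged or collapse to the mask symbol, whereas a coordinate token tends to move to a neighbouring bin. That contrast is the motivation the abstract gives for using different matrices for commands and for coordinates. The scale-invariant and prior-preserving kernels are not sketched here because the abstract does not specify their form.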