This paper introduces Retrieval-Augmented Discrete Diffusion (RADD), a framework for multi-modal knowledge graph completion that decouples the retrieval and reranking stages. RADD uses a relation-aware multimodal KGE retriever for initial candidate selection and a conditional discrete denoiser for fine-grained entity disambiguation. Experiments on three MMKGC benchmarks demonstrate state-of-the-art performance, highlighting the benefits of decoupling global retrieval and local disambiguation.
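The retrieve-then-rerank split can be illustrated with a minimal Python sketch. It assumes the retriever has already produced one score per entity for a single query $(h, r, ?)$ and treats the denoiser as a black-box function that scores a shortlist. The function name `diff_rerank`, its parameters, and the toy scores are hypothetical and do not reproduce the paper's models.

```python
import heapq
from typing import Callable, List, Sequence


def diff_rerank(
    retriever_scores: Sequence[float],
    denoiser_score: Callable[[List[int]], List[float]],
    k: int,
) -> List[int]:
    """Hypothetical two-stage ranking: retriever top-K, then denoiser rerank.

    retriever_scores[e] is an assumed KGE score of entity e for one query;
    denoiser_score maps a shortlist of entity ids to one score per entry.
    """
    # Stage 1: high-recall global search over the full entity set.
    shortlist = heapq.nlargest(
        k, range(len(retriever_scores)), key=retriever_scores.__getitem__
    )
    # Stage 2: fine-grained disambiguation, restricted to the shortlist.
    scores = denoiser_score(shortlist)
    order = sorted(range(len(shortlist)), key=lambda i: scores[i], reverse=True)
    return [shortlist[i] for i in order]


# Toy usage: six entities, shortlist of size 3.
retriever = [0.1, 0.9, 0.5, 0.8, 0.2, 0.7]
preference = {0: 0.0, 1: 0.6, 2: 0.99, 3: 0.1, 4: 0.0, 5: 0.9}
print(diff_rerank(retriever, lambda sl: [preference[e] for e in sl], k=3))  # [5, 1, 3]
```

Because the second stage only sees the shortlist, an entity that the retriever ranks outside the top-$K$ (entity 2 in the toy example) can never be recovered by the reranker, however highly the denoiser would score it.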
Decoupling retrieval and reranking with a discrete diffusion model outperforms monolithic embedding scorers for multi-modal knowledge graph completion.
Most multi-modal knowledge graph completion (MMKGC) models use a single embedding scorer both to retrieve over the full entity set and to make the final prediction. We argue that this coupling is a core bottleneck: global high-recall search and local fine-grained disambiguation require different inductive biases. We therefore propose a Retrieval-Augmented Discrete Diffusion (RADD) framework that decouples retrieval and reranking for MMKGC. A relation-aware multimodal KGE retriever serves as both the global retriever and a distillation teacher, while a conditional discrete denoiser performs shortlist-level entity-identity generation for reranking. Training combines KGE supervision, denoising cross-entropy, and temperature-scaled distillation from the retriever to the denoiser. At inference, the proposed Diff-Rerank procedure first forms a top-$K$ shortlist with the retriever and then reranks it with the denoiser, making recall a strict prerequisite for precision. Experiments on three MMKGC benchmarks show that RADD achieves the best performance, with consistent gains over strong unimodal, multimodal, and LLM-based baselines, while ablations verify the contribution of each component.
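The three training terms can be sketched for a single query as $\mathcal{L} = \mathcal{L}_{\text{KGE}} + \lambda_{\text{ce}}\,\mathcal{L}_{\text{CE}} + \lambda_{\text{kd}}\,T^2\,\mathrm{KL}\big(p^{\text{ret}}_T \,\|\, p^{\text{den}}_T\big)$. The following Python sketch rests on several assumptions that the summary does not state: the denoiser emits one logit per shortlisted entity, the retriever's teacher logits are its scores on the same shortlist, the gold entity lies in the shortlist, and distillation uses the standard temperature-softened KL scaled by $T^2$. The weights `lambda_ce` and `lambda_kd`, the default temperature, and all function names are hypothetical. The sketch does not model the discrete diffusion noising process, and `kge_loss` is taken as a precomputed scalar.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_z for z in scaled]


def denoising_ce(denoiser_logits: Sequence[float], gold_index: int) -> float:
    """Cross-entropy of the denoiser's shortlist distribution against the gold entity."""
    return -log_softmax(denoiser_logits)[gold_index]


def distillation_kl(teacher_logits: Sequence[float],
                    student_logits: Sequence[float],
                    temperature: float) -> float:
    """T^2 * KL(p_teacher || p_student), both softened with temperature T."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    return temperature ** 2 * kl


def radd_loss(kge_loss: float,
              denoiser_logits: Sequence[float],
              retriever_logits: Sequence[float],
              gold_index: int,
              temperature: float = 2.0,
              lambda_ce: float = 1.0,
              lambda_kd: float = 0.5) -> float:
    """Hypothetical weighted sum of KGE, denoising CE, and distillation terms."""
    l_ce = denoising_ce(denoiser_logits, gold_index)
    l_kd = distillation_kl(retriever_logits, denoiser_logits, temperature)
    return kge_loss + lambda_ce * l_ce + lambda_kd * l_kd


# Toy usage on a shortlist of three entities, gold entity at position 1.
loss = radd_loss(kge_loss=0.8,
                 denoiser_logits=[1.5, 2.5, 0.0],
                 retriever_logits=[2.0, 1.0, 0.5],
                 gold_index=1)
print(loss)
```

The KL is taken under the retriever's softened distribution because the retriever acts as the teacher; the $T^2$ factor is the usual way to keep the distillation gradients on a comparable scale as the temperature changes.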