This paper investigates structural hallucinations in diffusion models by framing them as instabilities on the model-induced manifold. The authors show that a hallucination filter based on these instabilities matches or exceeds the recently proposed temporal filter, and they identify local intrinsic dimension (LID) as the primary driver of the instabilities. To mitigate them, they introduce Intrinsic Quenching (IQ), a corrective mechanism that reduces LID. IQ consistently outperforms standard hallucination reduction baselines across a range of benchmarks and shows promise for enforcing anatomical consistency in medical imaging.
Hallucinations in diffusion models are not only mode interpolation gone wrong: they can also be read as instabilities on the model-induced manifold, and reducing the local intrinsic dimension there alleviates them.
Diffusion models are prone to generating structural hallucinations: samples that match the statistical properties of the training data yet defy underlying structural rules, resulting in anomalies like hands with more than five fingers. Recent research has studied this failure mode from several viewpoints, offering partial explanations for its occurrence, such as mode interpolation. In this work, we propose a complementary perspective that treats hallucinations as instabilities on the model-induced manifold. We begin by showing that a hallucination filter based on such instabilities matches or exceeds the performance of the recently proposed temporal filter. By tracing the source of these instabilities, we identify local intrinsic dimension (LID) as their primary driver and propose Intrinsic Quenching (IQ), a direct corrective mechanism that deflates LID to alleviate hallucinations. IQ consistently outperforms standard hallucination reduction baselines across a wide array of benchmarks and offers a highly promising solution for enforcing anatomical consistency in downstream medical imaging tasks.
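To make the notion of local intrinsic dimension concrete, here is a minimal sketch of one generic way to estimate it from a point cloud: the nearest-neighbor maximum-likelihood estimator of Levina and Bickel. The abstract does not say how the authors estimate LID or how IQ reduces it, so this is an illustration of the quantity only, not the paper's method. The function name `lid_mle`, the neighbor count `k`, and the toy data are assumptions made for this example.

```python
import numpy as np


def lid_mle(points, k=20):
    """Per-point LID estimate via the Levina-Bickel k-NN maximum-likelihood formula.

    Illustrative only: the estimator used in the paper is not stated in the abstract.
    Assumes no duplicate points (a zero neighbor distance would divide by zero).
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    if not 2 <= k < n:
        raise ValueError("k must satisfy 2 <= k < number of points")

    # Dense pairwise Euclidean distances (fine for a few hundred points).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))

    # Sort each row; column 0 is the point itself (distance 0), so skip it.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]

    # Sum of log(T_k / T_j) over j = 1..k-1, where T_j is the j-th neighbor distance.
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])
    return (k - 1) / log_ratios.sum(axis=1)


# Toy data: a 2-D plane randomly rotated inside a 10-D ambient space.
rng = np.random.default_rng(0)
plane = np.zeros((400, 10))
plane[:, :2] = rng.uniform(size=(400, 2))
rotation, _ = np.linalg.qr(rng.normal(size=(10, 10)))
estimates = lid_mle(plane @ rotation, k=20)
print("mean LID estimate:", estimates.mean())
```

On data like this the estimate should land near the true dimension of 2 rather than the ambient dimension of 10, which is the sense in which LID measures how many directions a sample can locally vary in.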