DiffuMask is introduced as a diffusion-based prompt pruning framework that iteratively masks tokens in parallel, guided by hierarchical shot-level and token-level pruning signals. By masking multiple tokens per denoising step, it accelerates prompt compression, achieving up to 80% prompt length reduction while preserving or improving accuracy across various settings. The method offers a generalizable and controllable framework for prompt compression, enabling faster and more reliable in-context reasoning.
Reclaim up to 80% of your prompt length without sacrificing accuracy, using a diffusion-based pruning method that masks multiple tokens at once.
In-Context Learning and Chain-of-Thought prompting improve reasoning in large language models (LLMs), but they typically come at the cost of longer, more expensive prompts that may contain redundant information. Pruning-based prompt compression offers a practical solution, yet existing methods rely on sequential token removal, which is computationally intensive. We present DiffuMask, a diffusion-based framework that integrates hierarchical shot-level and token-level pruning signals to enable rapid, parallel prompt pruning via iterative mask prediction. DiffuMask substantially accelerates compression by masking multiple tokens in each denoising step. It offers tunable control over retained content, preserving essential reasoning context while achieving up to 80% prompt length reduction, and it maintains or improves accuracy across in-domain, out-of-domain, and cross-model settings. Our results show that DiffuMask provides a generalizable and controllable framework for prompt compression, enabling faster and more reliable in-context reasoning in LLMs.
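The abstract describes a loop that prunes several tokens per step under a hierarchical (shot-level plus token-level) signal, with a tunable compression target. As a rough illustration only, here is a minimal sketch of such a parallel, score-guided masking loop. All names here (`diffumask_style_prune`, `token_scores`, `shot_scores`, `alpha`, `tokens_per_step`) are hypothetical, and the fixed precomputed scores stand in for the paper's iterative mask prediction, where a diffusion model would re-predict the mask at each denoising step.

```python
import numpy as np

def diffumask_style_prune(tokens, token_scores, shot_ids, shot_scores,
                          keep_ratio=0.2, tokens_per_step=8, alpha=0.5):
    """Illustrative parallel pruning loop (not the authors' implementation).

    Blends a shot-level score with a token-level score, then removes the
    lowest-scoring tokens several at a time, mimicking the
    multiple-tokens-per-step behavior described in the abstract.
    """
    n = len(tokens)
    target = max(1, int(keep_ratio * n))  # tunable compression target
    # Hierarchical signal: combine each token's own score with its shot's score.
    combined = np.array([
        alpha * token_scores[i] + (1 - alpha) * shot_scores[shot_ids[i]]
        for i in range(n)
    ])
    keep = np.ones(n, dtype=bool)
    while keep.sum() > target:
        # Prune up to tokens_per_step tokens in this step, in parallel.
        k = int(min(tokens_per_step, keep.sum() - target))
        candidates = np.where(keep)[0]
        worst = candidates[np.argsort(combined[candidates])[:k]]
        keep[worst] = False
        # The actual method would re-run mask prediction here each step;
        # this sketch keeps the scores fixed for simplicity.
    return [tok for tok, kept in zip(tokens, keep) if kept]
```

In this sketch, `keep_ratio` plays the role of the tunable control over retained content: lowering it toward 0.2 corresponds to the up-to-80% length reduction reported in the abstract.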