The paper introduces region-specific image refinement, a task focused on restoring fine-grained details within a user-defined region while preserving the rest of the image. The authors propose RefineAnything, a multimodal diffusion model employing a "Focus-and-Refine" strategy that crops and resizes the region of interest to improve local reconstruction, combined with a blended-mask paste-back for background preservation. Experiments on the newly introduced RefineEval benchmark demonstrate RefineAnything's superior performance in detail restoration and background consistency compared to existing methods.
Counterintuitively, cropping and resizing a region of interest before refinement dramatically improves the fidelity of local detail restoration in diffusion models, while a blended-mask paste-back keeps the background nearly untouched.
We introduce region-specific image refinement as a dedicated problem setting: given an input image and a user-specified region (e.g., a scribble mask or a bounding box), the goal is to restore fine-grained details while keeping all non-edited pixels strictly unchanged. Despite rapid progress in image generation, modern models still frequently suffer from local detail collapse (e.g., distorted text, logos, and thin structures). Existing instruction-driven editing models emphasize coarse-grained semantic edits and often either overlook subtle local defects or inadvertently change the background, especially when the region of interest occupies only a small portion of a fixed-resolution input. We present RefineAnything, a multimodal diffusion-based refinement model that supports both reference-based and reference-free refinement. Building on the counter-intuitive observation that crop-and-resize can substantially improve local reconstruction under a fixed VAE input resolution, we propose Focus-and-Refine, a region-focused refinement-and-paste-back strategy that improves refinement effectiveness and efficiency by reallocating the resolution budget to the target region, while a blended-mask paste-back guarantees strict background preservation. We further introduce a Boundary Consistency Loss to reduce seam artifacts and improve paste-back naturalness. To support this new setting, we construct Refine-30K (20K reference-based and 10K reference-free samples) and introduce RefineEval, a benchmark that evaluates both edited-region fidelity and background consistency. On RefineEval, RefineAnything achieves strong improvements over competitive baselines and near-perfect background preservation, establishing a practical solution for high-precision local refinement. Project Page: https://limuloo.github.io/RefineAnything/.
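The crop-refine-paste pipeline described in the abstract can be sketched in a few lines. The code below is an illustrative mock-up, not the paper's implementation: `refine_fn` stands in for the diffusion refiner, resizing uses nearest-neighbor indexing instead of a proper resampler, and the feathered mask is a simple iterated 5-point blur. All helper names (`focus_and_refine`, `nn_resize`, `feather_mask`) are hypothetical.

```python
import numpy as np

def nn_resize(img, h, w):
    """Nearest-neighbor resize (stand-in for a real resampler)."""
    H, W = img.shape[:2]
    rows = np.arange(h) * H // h
    cols = np.arange(w) * W // w
    return img[rows][:, cols]

def feather_mask(mask, iters=5):
    """Soften a binary mask with a repeated 5-point average so the
    paste-back blends smoothly across the region boundary."""
    soft = mask.astype(np.float64)
    for _ in range(iters):
        p = np.pad(soft, 1, mode='edge')
        soft = (p[:-2, 1:-1] + p[2:, 1:-1] +
                p[1:-1, :-2] + p[1:-1, 2:] + p[1:-1, 1:-1]) / 5.0
    return soft

def focus_and_refine(image, mask, refine_fn, target=64):
    """Focus-and-Refine sketch: crop the masked region, resize it up so the
    refiner's fixed resolution budget is spent on the target, refine, resize
    back, and paste with a blended mask. Pixels outside the crop's bounding
    box are never touched."""
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    crop = image[y0:y1, x0:x1]
    up = nn_resize(crop, target, target)            # reallocate resolution to ROI
    refined = nn_resize(refine_fn(up), y1 - y0, x1 - x0)
    soft = feather_mask(mask)[y0:y1, x0:x1]
    out = image.astype(np.float64).copy()
    out[y0:y1, x0:x1] = soft * refined + (1.0 - soft) * crop
    return out
```

With a dummy refiner such as `lambda patch: patch + 1.0`, pixels far from the mask come back bit-exact while the mask interior is replaced by the refined content, illustrating how the paste-back preserves the background.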