HUST · VIVO AI Lab · Jun 17, 2026 · arXiv:2606.19195

Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance

Kangsheng Duan, Ziyang Xu, Wenyu Liu, Xiaohu Ruan, Xiaoxin Chen, Xinggang Wang

AI Summary

This paper introduces Moebius, a lightweight image inpainting framework that achieves performance comparable to 10B-level models with only about 0.2B parameters. It combines a new Local-λ Mix Interaction (LλMI) block, which captures complex latent interactions in a compact form, with an adaptive multi-granularity distillation strategy that preserves output fidelity. The reported evaluations show Moebius matching, and in some cases exceeding, the generation quality of the 10B-level FLUX.1-Fill-Dev while running inference more than 15 times faster.

Key Contribution

Moebius achieves high-fidelity image inpainting with less than 2% of the parameters of FLUX.1-Fill-Dev (0.22B vs. 11.9B), setting a new benchmark for efficiency.
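The "less than 2%" figure can be checked directly from the parameter counts reported in the abstract (0.22B for Moebius, 11.9B for FLUX.1-Fill-Dev). The short sketch below only redoes that arithmetic; the variable names are illustrative and not taken from the paper.

```python
# Parameter counts as reported in the abstract (in raw parameters).
moebius_params = 0.22e9      # Moebius: 0.22B
flux_fill_params = 11.9e9    # FLUX.1-Fill-Dev: 11.9B

# Fraction of the large model's parameters that Moebius uses.
ratio = moebius_params / flux_fill_params

# Equivalent compression factor (how many times smaller Moebius is).
compression = flux_fill_params / moebius_params

print(f"parameter ratio: {ratio:.2%}")          # prints "parameter ratio: 1.85%"
print(f"compression:     {compression:.1f}x")   # prints "compression:     54.1x"
```

So the headline claim corresponds to roughly a 54-fold reduction in parameter count.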

Abstract

While 10B-level industrial foundation models have pushed the boundaries of image inpainting, their prohibitive computational costs severely hinder practical deployment. Constructing a highly optimized task-specific specialist offers a promising solution; however, extreme structural compression inevitably triggers a severe representation bottleneck. To conquer this, we propose Moebius, a highly efficient lightweight inpainting framework. We systematically reconstruct the diffusion backbone by introducing the Local-λ Mix Interaction (LλMI) block. Comprising Local-λ and Interactive-λ modules, it elegantly summarizes spatial contexts and global semantic priors into fixed-size linear matrices, preserving complex latent interactions while drastically shedding parameters. Furthermore, to unlock the full representational capacity of this highly compact architecture, we synergistically pair it with an adaptive multi-granularity distillation strategy. Operating strictly within the latent space to avoid expensive pixel-space decoding, this strategy dynamically balances multiple gradient-based losses to achieve high-fidelity alignment. Extensive experiments across natural and portrait benchmarks demonstrate that this optimal synergy enables Moebius to rival or even surpass the generation quality of the 10B-level industrial generalist FLUX.1-Fill-Dev. Remarkably, Moebius achieves this using less than 2% of the parameters (0.22B vs. 11.9B) while delivering a >15× acceleration in total inference time, setting a new efficiency standard for high-fidelity inpainting. Project page at https://hustvl.github.io/Moebius.
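To make "summarizes spatial contexts ... into fixed-size linear matrices" concrete, here is a minimal NumPy sketch of the content path of a generic lambda layer, the general technique the λ in the block names suggests. It is an assumption-laden illustration, not the paper's Local-λ or Interactive-λ module: the function names, the dimensions `d`, `k`, `v`, and the random projection weights are all hypothetical. The point it shows is that the summary matrix has shape `(k, v)` no matter how many context tokens are fed in.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def content_lambda(context, w_k, w_v):
    """Summarize m context tokens into a single (k, v) linear map.

    Generic lambda-layer content path (an assumption, not the paper's design).
    """
    keys = softmax(context @ w_k, axis=0)  # (m, k), normalized over context positions
    values = context @ w_v                 # (m, v)
    return keys.T @ values                 # (k, v): size does not depend on m

def apply_lambda(queries, lam):
    """Apply the summarized linear map to each query: (n, k) @ (k, v) -> (n, v)."""
    return queries @ lam

rng = np.random.default_rng(0)
d, k, v = 16, 8, 16  # hypothetical feature, key, and value dimensions
w_q, w_k, w_v = (rng.standard_normal((d, s)) * 0.1 for s in (k, k, v))

for m in (64, 1024):  # two different numbers of latent tokens
    context = rng.standard_normal((m, d))
    lam = content_lambda(context, w_k, w_v)
    out = apply_lambda(context @ w_q, lam)
    print(m, lam.shape, out.shape)
```

Because the context is folded into a `(k, v)` matrix before it meets the queries, the cost of the interaction grows linearly with the number of tokens rather than quadratically as in standard attention, which is one plausible reading of how such a block can stay small and fast.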

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Inference & Quantization · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
