This paper introduces a quantized single-image super-resolution (SISR) framework optimized for INT8 deployment on mobile devices, using an extract-refine-upsample architecture. The authors employ a three-stage training pipeline: spatial supervision, fidelity refinement via Charbonnier loss and distillation from a Mamba-based teacher, and quantization-aware training on the fused deployment graph. Results on the MAI 2026 challenge show that teacher-guided supervision during quantization-aware training improves PSNR and SSIM for INT8 TFLite reconstruction.
Distilling knowledge from a Mamba-based teacher network improves the reconstruction quality of quantized INT8 super-resolution models (about 0.09 dB PSNR in the reported Stage 3 ablation), supporting high-quality image enhancement on resource-constrained mobile devices.
Efficient single-image super-resolution (SISR) requires balancing reconstruction fidelity, model compactness, and robustness under low-bit deployment, which is especially challenging for x3 SR. We present a deployment-oriented quantized SISR framework based on an extract-refine-upsample design. The student performs most computation in the low-resolution space and uses a lightweight re-parameterizable backbone with PixelShuffle reconstruction, yielding a compact inference graph. To improve quality without significantly increasing complexity, we adopt a three-stage training pipeline: Stage 1 learns a basic reconstruction mapping with spatial supervision; Stage 2 refines fidelity using Charbonnier loss, DCT-domain supervision, and confidence-weighted output-level distillation from a Mamba-based teacher; and Stage 3 applies quantization-aware training directly on the fused deploy graph. We further use weight clipping and BatchNorm recalibration to improve quantization stability. On the MAI 2026 Quantized 4K Image Super-Resolution Challenge test set, our final AIO MAI submission achieves 29.79 dB PSNR and 0.8634 SSIM, obtaining a final score of 1.8 under the target mobile INT8 deployment setting. Ablation on Stage 3 optimization shows that teacher-guided supervision improves the dynamic INT8 TFLite reconstruction from 29.91 dB/0.853 to 30.0003 dB/0.856, while the fixed-shape deployable INT8 TFLite artifact attains 30.006 dB/0.857.
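As a rough illustration of two ingredients named in the abstract, the sketch below shows a Charbonnier loss and the effect of weight clipping on a symmetric INT8 quantize-dequantize step. It is not the authors' implementation: the function names, the `eps` value, the clipping limit, and the per-tensor symmetric quantization scheme are all assumptions chosen for the example, and plain Python lists stand in for tensors.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean of sqrt((p - t)^2 + eps^2) over two flat sequences.

    eps is an assumed smoothing constant; the abstract does not give one.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)


def clip_weights(weights, limit):
    """Clamp every weight to [-limit, limit] (limit is a hypothetical setting)."""
    return [max(-limit, min(limit, w)) for w in weights]


def fake_quant_int8(weights):
    """Symmetric per-tensor INT8 quantize-dequantize (an assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / 127.0  # one float step per integer level
    return [max(-127, min(127, round(w / scale))) * scale for w in weights]


# One outlier weight stretches the quantization range for the whole tensor,
# so the small weights collapse onto very few integer levels.
weights = [0.01, -0.02, 0.03, 5.0]
raw = fake_quant_int8(weights)
clipped = fake_quant_int8(clip_weights(weights, limit=0.05))

raw_err = abs(raw[0] - weights[0])
clipped_err = abs(clipped[0] - weights[0])
print(f"error on first weight without clipping: {raw_err:.6f}")
print(f"error on first weight with clipping:    {clipped_err:.6f}")
```

The Charbonnier penalty behaves like an L1 loss for large residuals while staying smooth near zero. The second half shows one plausible reason clipping can stabilize quantization: bounding outliers shrinks the quantization step, at the cost of distorting the clipped weights themselves.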