This paper presents a systematic review of quantization and other model compression techniques applied to lightweight object detection architectures, particularly transformers, for fruit ripeness detection on edge devices. It analyzes studies from Scopus, IEEE Xplore, and ScienceDirect, focusing on the evolution from CNNs to transformer-based models like RT-DETR and Q-DETR. The review highlights the accuracy degradation introduced by aggressive low-bit quantization, especially in transformer attention, and demonstrates the superiority of Quantization-Aware Training (QAT) over Post-Training Quantization (PTQ) for performance preservation.
Quantizing transformers for edge deployment can theoretically reduce operations by up to 16×, but the resulting accuracy degradation, especially in attention mechanisms, demands careful Quantization-Aware Training.
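The 16× figure follows directly from bit-width arithmetic: moving from 32-bit floating point to 2-bit integers cuts the bits per operand by a factor of 16. A quick sketch of that calculation (the bit-widths are illustrative; the actual attainable speedup depends on hardware support):

```python
# Theoretical operation reduction from lowering numeric precision.
# 32-bit float -> b-bit integer shrinks bits per operand by 32/b;
# real speedups depend on whether the hardware has low-bit kernels.
FULL_PRECISION_BITS = 32

for low_bits in (8, 4, 2):
    reduction = FULL_PRECISION_BITS / low_bits
    print(f"{FULL_PRECISION_BITS}-bit -> {low_bits}-bit: {reduction:.0f}x fewer bits per operand")
```

The 2-bit case yields the 16× upper bound cited in the review.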
Real-time visual inference on resource-constrained hardware remains a core challenge for edge computing and embedded artificial intelligence systems. Recent deep learning architectures, particularly Vision Transformers (ViTs) and Detection Transformers (DETRs), achieve high detection accuracy but impose substantial computational and memory demands that limit their deployment on low-power edge platforms such as NVIDIA Jetson and Raspberry Pi devices. This paper presents a systematic review of model compression and optimization strategies (specifically quantization, pruning, and knowledge distillation) applied to lightweight object detection architectures for edge deployment. Following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, peer-reviewed studies from Scopus, IEEE Xplore, and ScienceDirect were analyzed to examine the evolution of efficient detectors from convolutional neural networks to transformer-based models. The synthesis reveals a growing focus on real-time transformer variants, including Real-Time DETR (RT-DETR) and low-bit quantized approaches such as Q-DETR, alongside optimized YOLO-based architectures. While quantization enables substantial theoretical acceleration (e.g., up to 16× operation reduction), aggressive low-bit precision introduces accuracy degradation, particularly in transformer attention mechanisms, exposing a critical efficiency–accuracy tradeoff. The review further shows that Quantization-Aware Training (QAT) consistently outperforms Post-Training Quantization (PTQ) in preserving performance under low-precision constraints. Finally, open research challenges are identified, with emphasis on the efficiency–accuracy tradeoff and the high computational demands of transformer architectures.
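The accuracy degradation at aggressive bit-widths can be seen even in a toy setting. Below is a minimal sketch of uniform symmetric quantization, the scheme most PTQ pipelines start from, applied to a made-up random weight tensor (the tensor and bit-widths are illustrative, not drawn from any reviewed model). QAT mitigates this by simulating the same round-and-clip inside the training loop, so the network learns weights that survive it:

```python
import numpy as np

def quantize_symmetric(x, num_bits):
    """Uniform symmetric quantization: round floats onto a signed
    integer grid in [-(2^(b-1)-1), 2^(b-1)-1], then dequantize back."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax           # one scale per tensor
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                            # dequantized approximation

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)  # toy "weight" tensor

# Reconstruction error grows sharply as precision drops, which is the
# low-bit degradation the review observes in transformer attention.
for bits in (8, 4, 2):
    mse = np.mean((w - quantize_symmetric(w, bits)) ** 2)
    print(f"{bits}-bit: mean-squared quantization error = {mse:.5f}")
```

At 8 bits the error is negligible, but at 2 bits the grid has only three levels, which illustrates why PTQ alone breaks down and QAT becomes necessary at low precision.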
Future directions are proposed, including hardware-aware optimization, robustness to imbalanced datasets, and multimodal sensing integration, to ensure reliable real-time inference in practical agricultural edge computing environments.