This paper introduces a model-hardware co-design framework for CNN-based SAR ATR that jointly optimizes adversarial robustness, model compression, and FPGA accelerator design. The framework uses hardware-guided structured pruning, informed by a hardware performance model, to explore robustness-efficiency trade-offs. Experiments on MSTAR and FUSAR-Ship datasets show the framework produces models up to 18.3× smaller with 3.1× fewer MACs while preserving robustness, and the FPGA implementation achieves significant latency and energy efficiency improvements compared to CPU/GPU baselines.
Achieve up to 68× lower latency and 170× better energy efficiency for SAR ATR by jointly optimizing CNN model compression and FPGA accelerator design with a hardware-aware pruning strategy.
Convolutional Neural Networks (CNNs) have achieved state-of-the-art accuracy in Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR), but their high computational cost, latency, and memory usage make deployment challenging on resource-constrained platforms such as small satellites. While adversarial robustness is critical for real-world SAR ATR, it is often overlooked in system-level optimizations. Addressing both challenges requires more than model compression or accelerator design alone, demanding joint optimization in a unified framework. In this paper, we present a model-hardware co-design framework that unifies robustness-aware model compression and FPGA accelerator design for CNN-based SAR ATR. The framework consists of four key components: robust model training and compression, a hardware performance model, hardware design, and hardware implementation. Given a user-specified CNN architecture, a SAR dataset, and target FPGA metadata, our framework first applies adversarial training to obtain a robust baseline model. It then performs hardware-guided structured pruning, where pruning decisions are informed by both saliency scores and an analytical hardware performance model derived from our FPGA accelerator design. The hardware performance model estimates the cost of each channel in terms of MACs, latency, and FPGA resources, enabling the pruning algorithm to explore robustness-efficiency trade-offs under user-specified objectives and constraints. We design a fully pipelined streaming dataflow accelerator with channel-aware Processing Element (PE) allocation and develop an automated design generation flow to efficiently map the compressed models to optimized FPGA implementations. The pruning process produces a set of Pareto-optimal candidate models, from which the user can select based on their application requirements.
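The interplay between the analytical hardware performance model and hardware-guided pruning can be illustrated with a minimal sketch. The code below is a simplified illustration under assumed conventions, not the paper's actual implementation: `ConvLayer`, `layer_latency_cycles`, and `channel_prune_scores` are hypothetical names, and the cost model reduces "MACs, latency, and FPGA resources" to a single MACs-per-PE stage-time estimate, with a linear combination of saliency and saved hardware cost standing in for the paper's trade-off exploration.

```python
from dataclasses import dataclass

@dataclass
class ConvLayer:
    """Per-layer shape info for a CNN (hypothetical fields for illustration)."""
    out_channels: int
    macs_per_channel: int  # MACs contributed by one output channel
    pes: int               # processing elements allocated to this layer's stage

def layer_latency_cycles(layer: ConvLayer) -> float:
    """Analytical stage-latency estimate: in a fully pipelined streaming
    accelerator, a layer's stage time scales with its MAC workload divided
    by the PEs allocated to it (channel-aware PE allocation)."""
    return layer.out_channels * layer.macs_per_channel / layer.pes

def pipeline_latency_cycles(layers: list[ConvLayer]) -> float:
    """A streaming dataflow pipeline is throughput-bound by its slowest stage."""
    return max(layer_latency_cycles(l) for l in layers)

def channel_prune_scores(saliency: list[float], layer: ConvLayer,
                         hw_weight: float = 0.01) -> list[float]:
    """Combine per-channel saliency with the hardware cost a channel's
    removal would save (here: its share of the stage latency).
    Lower score => prune first; hw_weight trades robustness vs. efficiency."""
    cost_saved = layer.macs_per_channel / layer.pes
    return [s - hw_weight * cost_saved for s in saliency]
```

Sweeping `hw_weight` (or enforcing a latency constraint via `pipeline_latency_cycles`) across pruning runs is one way such a framework can expose a set of Pareto-optimal accuracy/latency candidates for the user to choose from.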
The selected model is further quantized and implemented on FPGA using the automated design generation flow and parameterized high-level synthesis (HLS) templates. Overall, the framework provides an end-to-end co-design flow from robust and efficient model generation to optimized accelerator implementation. Experiments on the widely used MSTAR and FUSAR-Ship datasets across three CNN architectures show that our framework produces models up to 18.3× smaller with 3.1× fewer MACs while preserving robustness. Our FPGA implementation achieves up to 68.1×/6.4× lower inference latency and up to 169.7×/33.2× better energy efficiency compared to CPU/GPU baselines, demonstrating the effectiveness of our co-design framework in delivering robust and efficient SAR ATR.
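The quantization step mentioned above can be sketched as follows. This is a generic symmetric per-tensor int8 post-training quantizer, a common choice for FPGA deployment; the paper's actual quantization scheme (bit width, granularity, calibration) is not specified here, so treat `quantize_int8` and `dequantize` as illustrative stand-ins.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization sketch: map the largest
    absolute weight to +/-127 and round everything else to that grid."""
    scale = float(np.max(np.abs(weights))) / 127.0
    scale = max(scale, 1e-12)  # guard against an all-zero tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for accuracy/robustness checks."""
    return q.astype(np.float32) * scale
```

Integer weights and activations like these shrink on-chip memory and let the HLS-generated datapath use narrow fixed-point multipliers instead of floating-point units, which is what makes the fully pipelined streaming design fit on the FPGA.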