Apr 27, 2026 · arXiv:2604.24649

Déjà Vu Packing: Optimizing FPGA Logic Clustering Runtime via Pattern Memoization

Milo Liebster, Amin Mohaghegh, Andrew Boutros

AI Summary

This paper addresses a computational bottleneck in FPGA logic clustering: the time-consuming intracluster routing legality checks performed during the packing stage. The authors observe that many attempted cluster packings are repetitions of a much smaller set of patterns, so many of these legality checks are redundant. They introduce "Déjà Vu packing," a technique that uses a packing signature tree to efficiently identify recurring packing patterns and memoize the outcomes of their legality checks. The approach speeds up the packing stage by up to 29.3x and reduces end-to-end VPR runtime by 5.3x on average on the Stratix-10-like architecture (1.6x on the 7-series-like architecture), while maintaining quality of results.

Key Contribution

FPGA CAD tools spend a large share of packing time re-checking the legality of cluster patterns they have already seen; memoizing those legality-check outcomes cuts packing runtime by up to 29.3x.

Abstract

Implementing a digital circuit on an FPGA fabric requires clustering technology-mapped netlist primitives into coarser-granularity blocks that can be directly mapped to the physical resources available on the FPGA. As the architecture of FPGA logic blocks (LBs) has grown in complexity, with sophisticated logic elements (LEs) and highly irregular local interconnect, this packing problem has become more challenging. To ensure the feasibility of intracluster routing, the computer-aided design (CAD) tools must solve a costly multi-source multi-sink routing problem for each candidate cluster. In this paper, we first show that such packing legality checks consume a significant portion of the CAD flow runtime for LB architectures with complex LEs and local routing structures resembling modern commercial FPGAs. We demonstrate that the packing stage constitutes 58% and 94% of the entire Versatile Place and Route (VPR) flow runtime on average when mapping a wide variety of benchmarks to the AMD 7-series-like and Altera Stratix-10-like VTR architecture captures, respectively. By analyzing the packing algorithm behavior, we observe that a significant fraction of the attempted packed clusters are repetitions of a much smaller number of packing patterns, and therefore many of the packing legality checks are redundant and could be skipped. To this end, we introduce our Déjà Vu packing approach, which leverages a novel packing signature tree data structure that enables efficient identification of recurring packing patterns and memoization of their legality check outcomes. Our approach speeds up the packing by up to 13.4x and 29.3x, with an average of 3.7x and 6.9x, across the evaluated benchmarks on the 7-series and Stratix 10 architectures. These packing runtime gains result in a significant 1.6x and 5.3x average reduction in end-to-end VPR runtime, while maintaining quality of results.
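The memoization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how a packing signature is built, so the sketch assumes a hypothetical signature made of the ordered sequence of (primitive type, site index) placement steps, and it stores one cached legality outcome per node of a trie keyed by those steps. The names `SignatureTree`, `SignatureNode`, `Step`, and `toy_route_check` are all invented for illustration, and `toy_route_check` is only a stand-in for the costly intracluster routing check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical signature element: (primitive type, site index inside the LB).
# A real signature would presumably also have to capture connectivity.
Step = Tuple[str, int]


@dataclass
class SignatureNode:
    """One node of the trie; the path from the root spells out a pattern."""
    children: Dict[Step, "SignatureNode"] = field(default_factory=dict)
    legal: Optional[bool] = None  # None means this pattern was never checked


class SignatureTree:
    """Caches legality-check outcomes, keyed by the sequence of packing steps."""

    def __init__(self, check: Callable[[Sequence[Step]], bool]) -> None:
        self.root = SignatureNode()
        self.check = check  # the expensive legality check being memoized
        self.hits = 0
        self.misses = 0

    def is_legal(self, steps: Sequence[Step]) -> bool:
        # Walk (and extend, if needed) the path that matches this pattern.
        node = self.root
        for step in steps:
            node = node.children.setdefault(step, SignatureNode())
        if node.legal is None:
            # First time this pattern is seen: run the real check once.
            self.misses += 1
            node.legal = self.check(tuple(steps))
        else:
            # Recurring pattern: reuse the stored outcome.
            self.hits += 1
        return node.legal


def toy_route_check(steps: Sequence[Step]) -> bool:
    """Stand-in for the routing check (assumption, not VPR's algorithm):
    legal if no site is used twice and at most two flip-flops are packed."""
    sites = [site for _, site in steps]
    flip_flops = sum(1 for kind, _ in steps if kind == "ff")
    return len(set(sites)) == len(sites) and flip_flops <= 2
```

With this sketch, calling `SignatureTree(toy_route_check).is_legal(pattern)` twice on the same pattern runs the stand-in check only once; the second call is answered from the tree. Any speedup in practice depends on how often patterns recur and on how cheaply a signature can be computed compared with the routing check itself.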


Citation Metrics

Citations: 0

Influential citations: 0

References: 31

Year: 2026

Venue: N/A
