This paper introduces a method for reducing control overhead in Tightly Coupled Processor Arrays (TCPAs) by deriving control conditions from a polyhedral representation of the iteration space and then aggressively minimizing them. The authors report a 15x to 45x reduction in control signals by representing conditions as unions of polyhedra and evaluating them with bounded evaluation units in a lightweight global controller (GC). Evaluation on PolyBench kernels shows that the proposed control flow requires less than 10% of the total array resources, enabling near-zero-overhead loop control.
Squeezing loop control down to <10% of array resources unlocks near-zero-overhead parallel loop acceleration on Tightly Coupled Processor Arrays.
Multidimensional loop kernels often suffer from control overhead that can dominate execution time on parallel loop accelerators. Tightly Coupled Processor Arrays (TCPAs) offload loop control to a global controller (GC), but existing approaches still require hundreds of control signals. We propose a method to derive and aggressively reduce these control conditions from a polyhedral representation of the iteration space, achieving reductions of 15x to 45x in control signals across several benchmarks. We introduce a lightweight GC architecture that evaluates conditions as unions of polyhedra using bounded evaluation units, requiring hardware comparable to a single processing element. Control signals are distributed throughout the array with a minimal number of delay elements, resulting in zero-overhead loop control. Our evaluation on PolyBench kernels shows that the entire control flow requires less than 10% of the total array resources.
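To make the phrase "conditions as unions of polyhedra" concrete, the following is a minimal software sketch, not the paper's GC hardware or its reduction algorithm. It assumes a condition is stored as a list of polyhedra, each polyhedron being a list of half-spaces of the form a·x <= b; the names `in_polyhedron`, `condition_holds`, and the 4x4 iteration space are hypothetical and chosen only for illustration.

```python
from itertools import product

# Assumed encoding (illustrative only):
#   half-space = (coeffs, bound), meaning sum(c * x) <= bound
#   polyhedron = list of half-spaces (their intersection)
#   condition  = list of polyhedra (their union)

def in_polyhedron(point, polyhedron):
    """True if the point satisfies every half-space of the polyhedron."""
    return all(
        sum(c * x for c, x in zip(coeffs, point)) <= bound
        for coeffs, bound in polyhedron
    )

def condition_holds(point, union):
    """True if the point lies in at least one polyhedron of the union."""
    return any(in_polyhedron(point, p) for p in union)

# Hypothetical 2D iteration space: 0 <= i < N, 0 <= j < N.
N = 4

# A control condition that is active on the first row or the first column.
# Within the iteration space, i <= 0 means i == 0 (likewise for j).
first_row = [((1, 0), 0)]   # i <= 0
first_col = [((0, 1), 0)]   # j <= 0
boundary = [first_row, first_col]

# Iteration points at which the control signal would be asserted.
active = [p for p in product(range(N), repeat=2) if condition_holds(p, boundary)]
```

In this encoding, the cost of evaluating a condition grows with the number of polyhedra and half-spaces it contains, which gives some intuition for why reducing the number and size of conditions matters for a controller with bounded evaluation units.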