The paper introduces SKYLIGHT, a 3D photonic in-memory tensor core architecture designed for real-time AI inference, addressing scalability and reliability issues in existing photonic computing systems. SKYLIGHT employs a low-loss 3D Si/SiN crossbar topology, thermally robust WDM components, hierarchical signal accumulation using multi-port photodetectors, and optically programmed non-volatile PCM weights, enabling in-situ weight updates for layer-local learning. System-level modeling demonstrates that a single SKYLIGHT core achieves 342.1 TOPS at 23.7 TOPS/W, enabling ResNet-50 inference at 1212 FPS with 27 mJ per image, outperforming an NVIDIA RTX PRO 6000 Blackwell GPU in FPS/W.
A 3D photonic architecture achieves 1.61× the end-to-end FPS/W of an NVIDIA RTX PRO 6000 Blackwell GPU on ResNet-50 inference, pointing toward energy-efficient, real-time AI.
The growing computational demands of artificial intelligence (AI) are challenging conventional electronics, making photonic computing a promising alternative. However, existing photonic architectures face fundamental scalability and reliability barriers. This paper introduces SKYLIGHT, a scalable 3D photonic in-memory tensor core architecture designed for real-time AI inference. By co-designing its topology, wavelength routing, accumulation, and programming in a 3D stack, SKYLIGHT overcomes key limitations. Its innovations include a low-loss 3D Si/SiN crossbar topology, a thermally robust wavelength-division multiplexing (WDM) component that does not rely on micro-ring resonators (MRRs), hierarchical signal accumulation using a multi-port photodetector (PD), and optically programmed non-volatile phase-change material (PCM) weights. Importantly, SKYLIGHT enables in-situ weight updates that support label-free, layer-local learning (e.g., forward-forward local updates) in addition to inference. Using SimPhony for system-level modeling, we show that a single 144 × 256 SKYLIGHT core is feasible within a single reticle and delivers 342.1 TOPS at 23.7 TOPS/W, enabling ResNet-50 inference at 1212 FPS with 27 mJ per image. It achieves 84.17 FPS/W end-to-end, 1.61× higher than an NVIDIA RTX PRO 6000 Blackwell GPU under the same workload in real-time measurements. System-level evaluations on four representative machine learning tasks, including unsupervised local self-learning, demonstrate SKYLIGHT's robustness to realistic hardware non-idealities: low-bit quantization and signal-proportional analog noise that captures modulation, PCM programming, and readout variations. With noise-aware training, SKYLIGHT maintains high task accuracy, validating its potential as a comprehensive solution for energy-efficient, large-scale photonic AI accelerators.
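The two non-idealities named in the abstract, low-bit quantization and signal-proportional analog noise, can be illustrated with a small sketch. This is not the paper's noise model or the SimPhony API. The function names `quantize` and `noisy_dot`, the 4-bit default, and the single lumped `sigma` (standing in for modulation, PCM programming, and readout variations together) are all assumptions made for illustration.

```python
import random

def quantize(values, bits, max_abs=1.0):
    """Uniform symmetric quantization onto 2**(bits-1)-1 levels per sign.

    Hypothetical helper: values outside [-max_abs, max_abs] are clipped.
    """
    levels = 2 ** (bits - 1) - 1  # e.g. 7 levels per sign for 4 bits
    out = []
    for v in values:
        q = round(v * levels / max_abs)      # nearest integer level
        q = max(-levels, min(levels, q))     # clip to the representable range
        out.append(q * max_abs / levels)
    return out

def noisy_dot(weights, inputs, bits=4, sigma=0.05, rng=None):
    """Dot product with quantized operands and signal-proportional noise.

    Each product term w*x receives Gaussian noise with standard deviation
    sigma * |w*x|, so a zero signal carries zero noise (assumed model).
    """
    # Default: a fresh fixed-seed generator, so repeated calls are reproducible.
    rng = rng if rng is not None else random.Random(0)
    wq = quantize(weights, bits)
    xq = quantize(inputs, bits)
    total = 0.0
    for w, x in zip(wq, xq):
        term = w * x
        total += term + rng.gauss(0.0, sigma * abs(term))
    return total

if __name__ == "__main__":
    w = [0.5, -0.25, 0.8]
    x = [1.0, 0.4, 0.1]
    print("ideal :", sum(a * b for a, b in zip(w, x)))
    print("noisy :", noisy_dot(w, x, bits=4, sigma=0.05))
```

Noise-aware training, in general, injects perturbations of this kind into the forward pass during training so that the learned weights tolerate them at inference time. The bit widths and noise levels actually used in the paper are not given in the abstract and are not assumed here.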