April 20 – April 27, 2026

Architecture Design (Transformers, SSMs, MoE) - Weekly Roundup

100 papers published across 6 labs.

3950% acceleration

Selected Labs publishing this week

NVIDIA (2) · Tsinghua AI (2) · Microsoft Research (1) · UW (1) · Amazon Science (1)

Top Papers

Apr 27, 2026

M. Spitznagel +1 · Apr 27, 2026

A New Kind of Network? Review and Reference Implementation of Neural Cellular Automata

Neural Cellular Automata, blending Wolfram's recursive programs with neural networks, offer a fresh perspective on modeling complex, self-organizing systems.

M. Spitznagel, Janis Keuper

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design

Zhongjie Duan +2 · Apr 27, 2026

Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion

Finally, a plugin framework that lets you mix-and-match KV-Cache, LoRA, and other controls to steer diffusion models without being locked into a specific backbone.

Zhongjie Duan, Hong Zhang, Yingda Chen

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Open-Source Models & Weights

NVIDIA · Apr 27, 2026 · also Amazon Science, Microsoft Research, UW, Music X Lab +1

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

Multimodal models can now achieve state-of-the-art performance in real-world tasks like document understanding and audio-video comprehension with significantly reduced inference latency thanks to novel token-reduction techniques.

NVIDIA, Amala Sanjay Deshmukh, K. Chumachenko, Tuomas Rintamaki +209

Architecture Design (Transformers, SSMs, MoE) · Multimodal Models · Speech & Audio

Boyang Wang +4 · Apr 27, 2026

OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer

State-of-the-art shot boundary detection gets a major upgrade with a Transformer-based approach that not only improves accuracy but also offers more interpretable boundaries, thanks to a novel relational prediction framework and synthetic training data.

Boyang Wang, Guangyi Xu, Zhipeng Tang +2

Architecture Design (Transformers, SSMs, MoE) · Computer Vision

Zhiheng Liu +14 · Apr 27, 2026

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

Ditching the vision encoder actually *improves* multimodal understanding at scale, proving that pixel embeddings alone can achieve state-of-the-art results in unified multimodal models.

Zhiheng Liu, Weiming Ren, Xiaoke Huang +12

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

All Papers (100)

Apr 27, 2026

Edward Hirst +1 · Apr 27, 2026

PINNs in More General Geometry

PINNs offer a promising new approach to solving complex problems in differential geometry by directly minimizing differential functionals.

Edward Hirst +1

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design

Minkyu Kim +7 · Apr 27, 2026

Rethinking Layer Redundancy in Large Language Models: Calibration Objectives and Search for Depth Pruning

The secret to effectively pruning LLMs might not be *how* you search for redundant layers, but *what* you're optimizing for.

Minkyu Kim, Vincent-Daniel Yun, Youngrae Kim +5

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Training Efficiency & Optimization

Dmitry B. Rokhlin +3 · Apr 27, 2026

Dynamic Regret for Online Regression in RKHS via Discounted VAW and Subspace Approximation

Achieve dynamic regret bounds for online regression in RKHS by combining discounted VAW with finite-dimensional subspace approximations, offering a practical approach for time-varying comparisons.

Dmitry B. Rokhlin, Georgiy A. Karapetyants +1

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

Dongze Wu +2 · Apr 27, 2026

CoreFlow: Low-Rank Matrix Generative Models

Learning generative models for high-dimensional matrices doesn't have to be a computational nightmare: CoreFlow achieves state-of-the-art results in low-data regimes by learning shared low-rank structure.

Dongze Wu, Linglingzhi Zhu, Yao Xie

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

C. Squires +3 · Apr 27, 2026

A Unifying Framework for Unsupervised Concept Extraction

Concept extraction's identifiability problem just got a lot easier, thanks to a new framework that turns guarantee proofs into set intersection problems.

Chandler Squires, Pradeep Ravikumar +1

Architecture Design (Transformers, SSMs, MoE) · Interpretability & Mechanistic Interp

Parsa Ashrafi Fashi +19 · Apr 27, 2026

Long-Context Aware Upcycling: A New Frontier for Hybrid LLM Scaling

Forget training from scratch: HyLo lets you breathe new (long-context) life into your existing Transformer LLMs, achieving 32x context extension and 90% KV-cache reduction.

Parsa Ashrafi Fashi, Utkarsh Saxena +17

Architecture Design (Transformers, SSMs, MoE) · Scaling Laws & Emergent Abilities · Training Efficiency & Optimization

Orhan Demirci +1 · Apr 27, 2026

ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models

Multi-anchor word embeddings, previously impractical for LLMs, can now outperform standard embeddings with 98% fewer parameters and a 40x smaller embedding layer.

Orhan Demirci, Sezer Aptourachman

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization

Ruhr University Bochum · Apr 27, 2026

DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference

Not all layers are created equal: pruning the KV cache in a layer-dependent manner significantly boosts long-context LLM performance compared to uniform pruning strategies.

Zahra Dehghanighobadi, Asja Fischer

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Mila · Apr 27, 2026 · also Capital One

Learning to Route Queries to Heads for Attention-based Re-ranking with Large Language Models

LLMs re-rank documents better when you learn to route each query to the specific attention heads that matter, instead of relying on static subsets or everything at once.

Yuxing Tian, Fengran Mo, Zhiqi Huang +2

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Recommendation & Information Retrieval

Razwan Ahmed Tanvir +3 · Apr 27, 2026

A Tree-Based Repository Blockchain Framework for Shared Governance in Collaborative Fork Ecosystems

Ditch the complexity of Inter-Blockchain Communication: this tree-based blockchain framework lets you navigate hard forks like directories in a file system.

Razwan Ahmed Tanvir, Greg Speegle +1

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware

Apr 27, 2026

Network Impact of Post-Quantum Certificate Chain sizes on Time to First Byte in TLS Deployments

Quantum-safe certificates bloat TLS handshakes so much that they measurably degrade web performance, and current CDN optimizations aren't enough to fully compensate.

Matthew Chou, Phuong M Cao

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Lavi Jain +1 · Apr 27, 2026

RowHammer Vulnerability Counter (RVC): Redefining RowHammer Detection with Victim-Centric Tracking

Forget activation counts – RVC slashes RowHammer mitigation overhead by up to 99.99% by directly tracking a row's vulnerability to bit flips.

Lavi Jain, Venkata Kalyan Tavva

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware

Verdict Security · Apr 27, 2026 · also Ain Shams University

Machine-Checked Cardinality Bounds for Masked Barrett Reduction: A 1-Bit Side-Channel Leakage Barrier in Post-Quantum Cryptographic Hardware

Forget complex side-channel analysis: a single, machine-checked theorem proves that masked Barrett reduction leaks at most *one bit* of information per wire, offering a universal security guarantee for post-quantum crypto.

Ray Iskander, Khaled Kirah

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Apr 27, 2026 · also Prince of Songkla University

Resolving Conflicts Between RTOS Timekeeping and Uninterruptable Trusted Computing

Guaranteeing atomicity in secure enclaves doesn't have to break real-time OS timekeeping – a secure-driven synchronization mechanism can unobtrusively keep everything in sync.

Antonio Joia Neto, Amarin Laohajirapan, Norrathep Rattanavipanon +1

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware

E. Woo +3 · Apr 27, 2026

System-aware contextual digital twin for ICS anomaly diagnosis

LLMs can now provide interpretable anomaly diagnoses in industrial control systems by translating detection evidence into actionable hypotheses for operators.

E. Woo, Y. Kim, Wonje Heo +1

Architecture Design (Transformers, SSMs, MoE) · Red-Teaming & Adversarial Robustness

Fiza Naseer +4 · Apr 27, 2026

A systematic literature Review for Transformer-based Software Vulnerability detection

Transformer-based vulnerability detection is booming, but this review reveals critical gaps in data balance, interpretability, and cross-language generalization that could be holding back truly robust systems.

Fiza Naseer, Javed Ali Khan, Muhammad Yaqoob +2

Architecture Design (Transformers, SSMs, MoE) · Code Generation & Program Synthesis · Natural Language Processing

Apr 27, 2026 · also CUHK

Mono2Sls: Automated Monolith-to-Serverless Migration via Multi-Stage Pipeline with Static Analysis

Automating monolith-to-serverless migration is now possible with an LLM-powered pipeline that outperforms commercial tools.

Xingyan Chen, Yuxin Su, Zishan Su +2

Architecture Design (Transformers, SSMs, MoE) · Code Generation & Program Synthesis · Distributed Systems & Hardware

Yifan Zhang +2 · Apr 27, 2026

RefEvo: Agentic Design with Co-Evolutionary Verification for Agile Reference Model Generation

LLMs can now generate reliable hardware reference models with 95% accuracy thanks to a novel co-evolutionary verification mechanism that weeds out correlated hallucinations between model and testbench.

Yifan Zhang, Jianmin Ye, Jiahao Yang

Architecture Design (Transformers, SSMs, MoE) · Code Generation & Program Synthesis · Tool Use & Agents

Maitreya Patel +4 · Apr 27, 2026

VibeToken: Scaling 1D Image Tokenizers and Autoregressive Models for Dynamic Resolution Generations

Autoregressive image models can now compete with diffusion models in image quality and efficiency, thanks to a variable-length tokenization scheme that decouples compute from resolution.

Maitreya Patel, Jingtao Li, Weiming Zhuang +2

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

Yuanhao Gong +2 · Apr 27, 2026

Shared-kernel Wavelet Neural Networks for Poisson Image Reconstruction

Achieve real-time, accurate image reconstruction from sparse Laplacian fields using a wavelet neural network with only 200 parameters.

Yuanhao Gong, Tang Tang, Qianyan Liu

Architecture Design (Transformers, SSMs, MoE) · Computer Vision

Apr 27, 2026

Computational Design and Co-Robotic Fabrication for Material Reuse in Architecture

Imagine buildings that adapt to the materials available, not the other way around: this framework uses robots to make it a reality.

Arash Adel, Daniel Ruan, Ruxin Xie

Architecture Design (Transformers, SSMs, MoE) · Robotics & Embodied AI

NVIDIA · Apr 27, 2026

MotionBricks: Scalable Real-Time Motions with Modular Latent Generative Model and Smart Primitives

Forget clunky animation pipelines – MotionBricks lets you assemble real-time, high-quality character motions like LEGOs, even controlling robots.

Tingwu Wang, Olivier Dionne, Mick Ruyter +13

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Robotics & Embodied AI

Xi Shen +3 · Apr 27, 2026

Opto-Atomic Spatio-Temporal Holographic Correlators for High-Speed 3D CNNs

Ditch silicon bottlenecks: a novel optoelectronic correlator uses cold atoms to accelerate 3D CNNs by orders of magnitude.

Xi Shen, Bowen Qi, Tabassom Hamidfar +1

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Distributed Systems & Hardware

Milo Liebster +2 · Apr 27, 2026

Déjà Vu Packing: Optimizing FPGA Logic Clustering Runtime via Pattern Memoization

FPGA CAD tools waste enormous time re-checking the same cluster packings, but a simple memoization trick can slash runtime by up to 29x.

Milo Liebster, Amin Mohaghegh, Andrew Boutros

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Training Efficiency & Optimization

Wang Fan +7 · Apr 27, 2026

Salca: A Sparsity-Aware Hardware Accelerator for Efficient Long-Context Attention Decoding

Forget A100s for long-context LLMs – Salca achieves up to 74x better energy efficiency with a sparsity-aware hardware accelerator.

Wang Fan, Wei Cao, Xionghui Zha +5

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Xingshuai Lu +5 · Apr 27, 2026 · also Institute of Artificial Intelligence

Vib2Conf: AI-driven discrimination of molecular conformations from vibrational spectra

Vib2Conf achieves unprecedented accuracy in identifying 3D molecular conformations from vibrational spectra, even distinguishing between near-isomeric conformers differing by only ~1 Å RMSD.

Xingshuai Lu, Dechen Lin, T. Zhu +3

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design

Apr 27, 2026 · also NYU, UCI

Enhancing molecular dynamics with equivariant machine-learned densities

Unlock spectroscopic and electronic observables in large-scale molecular simulations by learning the electron density directly, paving the way for more comprehensive and transferable machine-learned interatomic potentials.

Mihail Bogojeski, Muhammad R. Hasyim, Leslie Vogt-Maranto +3

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design

Chenglong Chu +34 · Apr 27, 2026 · also Kuaishou

Kwai Summary Attention Technical Report

Sub-linear attention is now possible without sacrificing complete long-range dependency retention, thanks to learnable summary tokens that compress context.

Chenglong Chu, Guorui Zhou, Guowang Zhang +32

Architecture Design (Transformers, SSMs, MoE) · Recommendation & Information Retrieval · Training Efficiency & Optimization

Wenxuan Yang +5 · Apr 27, 2026

Modeling Behavioral Intensity and Transitions for Generative Recommendation

Generative recommendation gets a boost: modeling behavior intensity and transitions yields 15-23% gains in prediction accuracy.

Wenxuan Yang, Xiaoyang Xu, Hanyu Zhang +3

Architecture Design (Transformers, SSMs, MoE) · Recommendation & Information Retrieval

Apr 27, 2026 · also UC Santa Cruz, UQ

Disagreement as Signals: Dual-view Calibration for Sequential Recommendation Denoising

LLMs can denoise sequential recommendations by disagreeing with the recommendation model itself, leading to more robust performance against noisy user data.

Sijian Li, Min Gao, Zongwei Wang +3

Architecture Design (Transformers, SSMs, MoE) · Recommendation & Information Retrieval

Y. Baba +1 · Apr 27, 2026

Point-MF: One-step Point Cloud Generation from a Single Image via Mean Flows

Achieve millisecond-level 3D point cloud reconstruction from a single image without sacrificing quality, blowing past diffusion model latency.

Y. Baba, Keiji Yanai

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

Tanmoy Mukherjee +3 · Apr 27, 2026

Credal Concept Bottleneck Models for Epistemic-Aleatoric Uncertainty Decomposition

Concept bottleneck models can now distinguish between reducible model uncertainty and irreducible input ambiguity, enabling targeted interventions like data collection and human review.

Tanmoy Mukherjee, Thomas Bailleux, Pierre Marquis +1

Architecture Design (Transformers, SSMs, MoE) · Interpretability & Mechanistic Interp

Apr 24, 2026

Zhe Yu +7 · Apr 24, 2026

From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

Forget rigid multi-agent pipelines: this framework lets you build self-organizing AI "companies" that dynamically recruit talent and adapt to tasks on the fly.

Zhe Yu, YuQi Fu, Zhiyuan He +5

Architecture Design (Transformers, SSMs, MoE) · Scalable Oversight & Alignment Theory · Tool Use & Agents

Apr 23, 2026

Jialong Mai +2 · Apr 23, 2026

MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control

Finally, a TTS system that lets you control the *exact* timing and pauses of individual words, opening the door to applications like perfectly paced guided reading and accessible code narration.

Jialong Mai, Xiaofen Xing, Xiangmin Xu

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Speech & Audio

Independent Researcher · Apr 23, 2026

Soft Anisotropic Diagrams for Differentiable Image Representation

SAD offers a surprisingly fast and accurate alternative to neural implicit representations for image compression and differentiable rendering, achieving 4-19x training speedups while outperforming state-of-the-art methods like Image-GS.

Laki Iinbor, Zhi-Chao Dou, Wojciech Matusik

Architecture Design (Transformers, SSMs, MoE) · Computer Vision

Ceyuan Yang +19 · Apr 23, 2026

Context Unrolling in Omni Models

Training a single model across text, images, video, 3D geometry, and hidden representations unlocks "Context Unrolling," where the model reasons across modalities to improve reasoning fidelity.

Ceyuan Yang, Zhijie Lin, Yang Zhao +17

Architecture Design (Transformers, SSMs, MoE) · Multimodal Models · Reasoning & Chain-of-Thought

University of Milano-Bicocca · Apr 23, 2026 · also University of Genova

Neural surrogates for crystal growth dynamics with variable supersaturation: explicit vs. implicit conditioning

Explicitly conditioning neural surrogates on supersaturation dramatically improves their accuracy in simulating crystal growth dynamics compared to implicit inference, especially with limited data.

Matteo Rigoni, D. Lanzoni, F. Montalenti +1

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design

Apr 23, 2026

Low-Rank Adaptation Redux for Large Models

Signal processing offers a surprisingly effective lens for understanding and improving LoRA, the reigning champ of parameter-efficient fine-tuning.

Bingcong Li, Yilang Zhang, Georgios B. Giannakis

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization

Yixian Xu +6 · Apr 23, 2026

Quotient-Space Diffusion Models

Quotient-space diffusion elegantly sidesteps the need to learn symmetry transformations, leading to more efficient and accurate generative models for systems with inherent symmetries.

Yixian Xu, Yusong Wang, Shengjie Luo +4

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design

Jian Cheng Wong +5 · Apr 23, 2026

Transferable Physics-Informed Representations via Closed-Form Head Adaptation

Solve new PDEs 100x faster with 10x less error by learning a transferable PINN representation and adapting to new equations with a single closed-form calculation.

Jian Cheng Wong, Isaac Yin Chung Lai, P. Chiu +3

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design · Training Efficiency & Optimization

Buqiang Xu +7 · Apr 23, 2026

StructMem: Structured Memory for Long-Horizon Behavior in LLMs

LLMs can now reason across long conversations without breaking the bank: StructMem slashes token usage and API calls while boosting temporal reasoning.

Buqiang Xu, Yijun Chen, Jizhan Fang +5

Architecture Design (Transformers, SSMs, MoE) · Reasoning & Chain-of-Thought · Tool Use & Agents

Apr 23, 2026

There Will Be a Scientific Theory of Deep Learning

Forget philosophical debates: a practical "learning mechanics" is crystallizing to explain *how* deep learning works, not just *why* it should.

James B. Simon, D. Kunin, Alexander Atanasov +11

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

Isabel Kurth +2 · Apr 23, 2026

Evaluating Post-hoc Explanations of the Transformer-based Genome Language Model DNABERT-2

Despite their architectural differences, Transformer-based genome language models can provide biological insights as reliable as those from CNNs, as revealed by attention-based explainability methods.

Isabel Kurth, Paulo Yanez Sarmiento, Bernhard Y. Renard

Architecture Design (Transformers, SSMs, MoE) · Interpretability & Mechanistic Interp · Scientific Discovery & Drug Design

Benedikt Bollig +3 · Apr 23, 2026

Promoting Simple Agents: Ensemble Methods for Event-Log Prediction

N-gram models can rival neural networks in event log prediction, but the secret sauce is a smart ensemble method that dynamically promotes the best model during inference.

Benedikt Bollig, Matthias Fugger, Thomas Nowak +1

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization

E. E. Krause · Apr 23, 2026

Geometric Monomial (GEM): a family of rational 2N-differentiable activation functions

Forget ReLU's rough edges: a new family of smooth activation functions, GEM, closes the gap with GELU and even outperforms it in some cases, revealing a surprising architecture-dependent sweet spot for smoothness.

E. E. Krause

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

Eli Gildish +2 · Apr 23, 2026

Dilated CNNs for Periodic Signal Processing: A Low-Complexity Approach

Achieve state-of-the-art periodic signal denoising with a single, lightweight dilated CNN that generalizes across frequencies via resampling.

Eli Gildish, Michael Grebshtein, I. Makienko

Architecture Design (Transformers, SSMs, MoE) · Speech & Audio · Training Efficiency & Optimization

Apr 23, 2026 · also Samsung

A-THENA: Early Intrusion Detection for IoT with Time-Aware Hybrid Encoding and Network-Specific Augmentation

IoT intrusion detection gets a boost: A-THENA's time-aware encoding and network-specific augmentation beats state-of-the-art methods by up to 6.88% in accuracy, all while running on a Raspberry Pi Zero 2 W.

Ioannis Panopoulos, Maria-Lamprini A. Bartsioka, Sokratis Nikolaidis +3

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Red-Teaming & Adversarial Robustness

Rishona Daniels +4 · Apr 23, 2026

On the Role of Preprocessing and Memristor Dynamics in Reservoir Computing for Image Classification

Volatile memristors can achieve state-of-the-art image classification accuracy in reservoir computing, even with significant device variability, suggesting they are a viable alternative to traditional CMOS.

Rishona Daniels, Duna Wattad, Ronny Ronen +2

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Distributed Systems & Hardware

Arindam Sengupta +4 · Apr 23, 2026

A temporal deep learning framework for calibration of low-cost air quality sensors

LSTMs can bring low-cost air quality sensors up to regulatory compliance, unlocking dense urban monitoring networks previously limited by calibration challenges.

Arindam Sengupta, T. Bush, B. Marner +2

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design

Ji-Ying Song +4 · Apr 23, 2026

Drug Synergy Prediction via Residual Graph Isomorphism Networks and Attention Mechanisms

ResGIN-Att's cross-attention mechanism not only boosts drug synergy prediction but also offers a peek into the "why" behind drug interactions by highlighting crucial chemical substructures.

Ji-Ying Song, Wenyang Wang, Cheng Yan +2

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design

C. Schneider +2 · Apr 23, 2026

Separable Expert Architecture: Toward Privacy-Preserving LLM Personalization via Composable Adapters and Deletable User Proxies

Achieve LLM personalization with the guarantee that deleting a small user-specific proxy deterministically erases all traces of their data, sidestepping the need for computationally expensive retraining.

C. Schneider, Philipp Schoenegger, Ben Bariach

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Open-Source Models & Weights

Yixuan Zhu +8 · Apr 23, 2026

VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution

VARestorer distills a text-to-image VAR model into a one-step super-resolution network, achieving state-of-the-art image quality with a 10x speedup.

Yixuan Zhu, Shilin Ma, Haolin Wang +6

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Inference & Quantization

Apr 23, 2026 · also Basque Center for Applied Mathematics (BCAM), Ikerbasque, University of the Basque Country (UPV/EHU)

A Green-Integral-Constrained Neural Solver with Stochastic Physics-Informed Regularization

PINNs can now efficiently solve highly oscillatory wave equations in heterogeneous media, thanks to a Green's function-based integral formulation that cuts computation by 10x and avoids absorbing boundary layers.

Mohammad Mahdi Abedi, David Pardo, T. Alkhalifah

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design · Training Efficiency & Optimization

Wei Jiang +1 · Apr 23, 2026

Sub-Token Routing in LoRA for Adaptation and Query-Aware KV Compression

Forget compressing entire tokens – selectively routing *parts* of tokens based on query relevance unlocks better compression-quality tradeoffs in LoRA-adapted transformers.

Wei Jiang, Wei Wang

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Training Efficiency & Optimization

Jon-Paul Cacioli · Apr 23, 2026

Cross-Entropy Is Load-Bearing: A Pre-Registered Scope Test of the K-Way Energy Probe on Bidirectional Predictive Coding

Cross-entropy loss isn't just a detail – it's the unsung hero behind how well energy probes work in predictive coding networks, accounting for up to 66% of the probe-softmax gap.

Jon-Paul Cacioli

Architecture Design (Transformers, SSMs, MoE) · Interpretability & Mechanistic Interp

Apr 23, 2026

Channel-Free Human Activity Recognition via Inductive-Bias-Aware Fusion Design for Heterogeneous IoT Sensor Environments

Channel-free HAR is now possible: a single model can perform activity recognition across diverse IoT sensor setups without needing fixed channel arrangements, thanks to metadata-conditioned fusion.

Tatsuhito Hasegawa

Architecture Design (Transformers, SSMs, MoE) · Data Curation & Synthetic Data · Robotics & Embodied AI

Zhao Wang · Apr 23, 2026

Strategic Heterogeneous Multi-Agent Architecture for Cost-Effective Code Vulnerability Detection

A game-theory-inspired ensemble of LLMs and a lightweight verifier slashes the cost of code vulnerability detection while boosting accuracy, proving that strategic agent design can beat brute-force scaling.

Zhao Wang

Architecture Design (Transformers, SSMs, MoE) · Code Generation & Program Synthesis · Tool Use & Agents

Abbas Zeitoun +2 · Apr 23, 2026

Hyperloop Transformers

Halving the parameter count of LLMs without sacrificing performance is now possible with Hyperloop Transformers, thanks to looped layers and hyper-connected residual streams.

Abbas Zeitoun, Lucas Torroba-Hennigen, Yoon Kim

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Training Efficiency & Optimization

Neeraj Gangwar +6 · Apr 23, 2026

GiVA: Gradient-Informed Bases for Vector-Based Adaptation

Vector-based fine-tuning just got an 8x speed boost, rivaling LoRA's performance with a fraction of the parameters, thanks to a clever gradient-informed initialization.

Neeraj Gangwar, Rishabh Deshmukh, Michael Shavlovsky +4

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

Boxun Xu +9 · Apr 23, 2026

Sparse Forcing: Native Trainable Sparse Attention for Real-time Autoregressive Diffusion Video Generation

Autoregressive video diffusion models can achieve faster decoding, lower memory footprint, and higher quality long-horizon generations by learning to attend to only the most salient spatiotemporal blocks.

Boxun Xu, Yuming Du, Zichang Liu +7

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Inference & Quantization

Eleanor P. Wiesler +1 · Apr 23, 2026

Graph Neural Network-Informed Predictive Flows for Faster Ford-Fulkerson and PAC-Learnability

Forget repeatedly re-running inference on residual graphs: this GNN-guided Ford-Fulkerson algorithm learns edge importance probabilities to dramatically accelerate max-flow computation and image segmentation.

Eleanor P. Wiesler, Trace Baxley

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

Costin-Andrei Oncescu +5 · Apr 23, 2026

The Recurrent Transformer: Greater Effective Depth and Efficient Decoding

Recurrent Transformers let you trade model depth for width, slashing KV cache memory footprint and inference latency without sacrificing performance.

Costin-Andrei Oncescu, Depen Morwani, Samy Jelassi +3

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Training Efficiency & Optimization

Muhy Eddin Za’ter +3 · Apr 23, 2026

A Multi-Stage Warm-Start Deep Learning Framework for Unit Commitment

A transformer-based deep learning approach can not only drastically accelerate Unit Commitment problem-solving but also, surprisingly, find lower-cost operational schedules than traditional MILP solvers in certain instances.

Muhy Eddin Za’ter, Anna Van Boven, B. Hodge +1

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

Anuj Sadani +1 · Apr 23, 2026

Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows

LLM agents are wasting up to 60k tokens per turn on unnecessary tool schemas – Tool Attention slashes this "Tools Tax" by 95% and unlocks truly scalable agentic workflows.

Anuj Sadani, Deepak Kumar

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Tool Use & Agents

K. Fojcik · Apr 23, 2026

Efficient Logic Gate Networks for Video Copy Detection

Achieve competitive video copy detection accuracy with descriptors orders of magnitude smaller and inference speeds exceeding 11k samples per second by replacing floating-point operations with a learned Boolean circuit.

K. Fojcik

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Inference & Quantization

Nevena Lazić +3 · Apr 23, 2026

To See the Unseen: on the Generalization Ability of Transformers in Symbolic Reasoning

Unseen token generalization in transformers isn't just about copying; it's fundamentally limited by a representational collapse in the unembedding space.

Nevena Lazić, Liam H. Fowl, András György +1

Architecture Design (Transformers, SSMs, MoE)Eval Frameworks & Benchmarks Reasoning & Chain-of-Thought

Emilie Frost +1Apr 23, 2026

Architectures for Robust Self-Organizing Energy Systems under Information and Control Constraints

Securing energy grids against cyberattacks may hinge on clever observer/controller architectures that respect data privacy and regulatory constraints.

Emilie Frost, Astrid Nieße

Architecture Design (Transformers, SSMs, MoE)Distributed Systems & Hardware Red-Teaming & Adversarial Robustness

Tsinghua AIApr 23, 2026

MISTY: High-Throughput Motion Planning via Mixer-based Single-step Drifting

Autonomous vehicles can now plan trajectories 10x faster without sacrificing performance, thanks to a novel architecture that learns complex driving behaviors in latent space during training.

Yining Xing, Zehong Ke, Yiqian Tu +3

Architecture Design (Transformers, SSMs, MoE)Robotics & Embodied AI World Models & Planning

Shivam Rawat +3Apr 23, 2026

Reasoning Primitives in Hybrid and Non-Hybrid LLMs

Hybrid architectures that combine attention and recurrence can maintain reasoning performance as task complexity increases, while transformers see a sharp performance drop-off.

Shivam Rawat, Lucie Flek, Florian Mai +1

Architecture Design (Transformers, SSMs, MoE)Eval Frameworks & Benchmarks Reasoning & Chain-of-Thought

Robin Dey +1Apr 23, 2026

Spatial Metaphors for LLM Memory: A Critical Analysis of the MemPalace Architecture

MemPalace's impressive memory recall isn't due to its fancy "memory palace" spatial organization, but rather its simple "store everything verbatim" approach combined with a strong embedding model.

Robin Dey, Panyanon Viradecha

Architecture Design (Transformers, SSMs, MoE)Eval Frameworks & Benchmarks Recommendation & Information Retrieval

Jean-Philippe Bernardy +4Apr 23, 2026

Linear Constraints

Automate resource management in Linear Haskell with linear constraints, eliminating the need for explicit linear arguments and streamlining development.

Jean-Philippe Bernardy, R. Eisenberg, Csongor Kiss +2

Architecture Design (Transformers, SSMs, MoE)Natural Language Processing

Minghao Yin +4Apr 23, 2026

Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers

Forget generating static shapes – Sculpt4D now lets you efficiently sculpt dynamic 4D objects with state-of-the-art temporal coherence.

Minghao Yin, Wenbo Hu, Jiale Xu +2

Architecture Design (Transformers, SSMs, MoE)Computer Vision Training Efficiency & Optimization

Jens Kanstrup Larsen +5Apr 23, 2026

NEST: Network Enforced Session Types (Technical Report)

Guarantee application-level protocol compliance without touching application code by pushing runtime verification into the network itself.

Jens Kanstrup Larsen, A. Scalas, Guy Amir +3

Architecture Design (Transformers, SSMs, MoE)Distributed Systems & Hardware

Jingfang Li +6Apr 23, 2026

UHR-DETR: Efficient End-to-End Small Object Detection for Ultra-High-Resolution Remote Sensing Imagery

Achieve a 10x speedup in detecting tiny objects in massive satellite images without sacrificing accuracy, even on a single GPU.

Jingfang Li, Haoran Zhu, Wen Yang +4

Architecture Design (Transformers, SSMs, MoE)Computer Vision Training Efficiency & Optimization

Yuhan Luo +2Apr 23, 2026

Rethinking Cross-Domain Evaluation for Face Forgery Detection with Semantic Fine-grained Alignment and Mixture-of-Experts

Face forgery detectors crumble when evaluated on unseen data, but a new metric, Cross-AUC, finally exposes this hidden vulnerability.

Yuhan Luo, Tao Chen, Decheng Liu

Architecture Design (Transformers, SSMs, MoE)Computer Vision Eval Frameworks & Benchmarks

Wenmin Huang +3Apr 23, 2026

AttDiff-GAN: A Hybrid Diffusion-GAN Framework for Facial Attribute Editing

Achieve more precise facial attribute editing by decoupling attribute manipulation from image synthesis, sidestepping the optimization challenges of directly combining GANs and diffusion models.

Wenmin Huang, Weiqi Luo, Xiaochun Cao +1

Architecture Design (Transformers, SSMs, MoE)Computer Vision Multimodal Models

M. Kada +4Apr 23, 2026

Teacher-Guided Routing for Sparse Vision Mixture-of-Experts

Steal accuracy from dense models and stabilize MoE training with a simple teacher-guided routing scheme that combats gradient starvation.

M. Kada, Ryota Yoshihashi, Satoshi Ikehata +2

Architecture Design (Transformers, SSMs, MoE)Computer Vision Training Efficiency & Optimization

C. Mbonu +3Apr 23, 2026

An Interpretable Vision Transformer Framework for Automated Brain Tumor Classification

Achieve near-perfect brain tumor classification with a Vision Transformer, unlocking clinically interpretable insights via attention rollouts.

C. Mbonu, T. Belonwu, Okwuchukwu Ejike Chukwuogo +1

Architecture Design (Transformers, SSMs, MoE)Computer Vision Interpretability & Mechanistic Interp

Anvitha Ramachandran +2Apr 23, 2026

GraphLeap: Decoupling Graph Construction and Convolution for Vision GNN Acceleration on FPGA

Vision GNNs can achieve nearly 100x speedups on FPGAs by decoupling graph construction from feature updates, enabling concurrent execution without significant accuracy loss after fine-tuning.

Anvitha Ramachandran, Dhruv Parikh, Viktor K. Prasanna

Architecture Design (Transformers, SSMs, MoE)Computer Vision Distributed Systems & Hardware

Xiangyu Ren +6Apr 23, 2026

Suppressing the Erasure Error of Fusion Operation in Photonic Quantum Computing

By explicitly addressing often-overlooked fusion erasure errors, this new compilation scheme unlocks exponentially more robust photonic quantum computations.

Xiangyu Ren, Yuexun Huang, Zhemin Zhang +4

Architecture Design (Transformers, SSMs, MoE)Scientific Discovery & Drug Design

Apr 23, 2026·also Ritsumeikan University

WildSplatter: Feed-forward 3D Gaussian Splatting with Appearance Control from Unconstrained Images

Unlock real-time, high-quality 3D scene reconstruction from unconstrained images with varying lighting, thanks to a feed-forward Gaussian Splatting model that learns appearance embeddings.

Yuki Fujimura, Takahiro Kushida, Kazuya Kitano +2

Architecture Design (Transformers, SSMs, MoE)Computer Vision Training Efficiency & Optimization

Dachong Li +3Apr 23, 2026·also Shenzhen University

CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors

Explicitly constraining action generation with predicted spatial "corridors" boosts VLA model performance by up to 12.4% on challenging robotic manipulation tasks.

Dachong Li, Zhuangzhuang Chen, Jin Zhang +1

Architecture Design (Transformers, SSMs, MoE)Multimodal Models Robotics & Embodied AI

Apr 23, 2026·also NTU, Ripple Labs, UCL

Systematizing Blockchain Research Themes and Design Patterns: Insights from the University Blockchain Research Initiative (UBRI)

Bridging the gap between blockchain research and real-world deployment requires navigating recurring design tensions like scalability vs. security, decentralization vs. governance, and privacy vs. compliance.

Chien-Chih Chen, Yitian Wang, Emma Nasseri +2

Architecture Design (Transformers, SSMs, MoE)Constitutional AI & AI Ethics Open-Source Models & Weights

Hongyao Liu +3Apr 23, 2026

SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference

On-device LLM inference gets a massive speed and energy boost by adaptively streaming only the most expensive parts of the KV cache from the cloud.

Hongyao Liu, L. Zhai, Junyi Wang +1

Architecture Design (Transformers, SSMs, MoE)Distributed Systems & Hardware Inference & Quantization

Guoyu Li +11Apr 23, 2026

SPAC: Automating FPGA-based Network Switches with Protocol Adaptive Customization

Forget hand-tuning: SPAC automatically generates FPGA-based network switches that slash latency by up to 38% while dramatically reducing resource usage.

Guoyu Li, Yang Cao, Lucas H L Ng +9

Architecture Design (Transformers, SSMs, MoE)Distributed Systems & Hardware

Apr 23, 2026

WPGRec: Wavelet Packet Guided Graph Enhanced Sequential Recommendation

Achieve state-of-the-art sequential recommendations by aligning multi-resolution temporal dynamics with graph propagation at matching scales.

Peilin Liu, Zhiquan Ji, Gang Yan

Architecture Design (Transformers, SSMs, MoE)Recommendation & Information Retrieval

Víctor Duarte MeloApr 23, 2026

ECCFROG522PP: An Enhanced 522 bit Weierstrass Elliptic Curve

Tired of opaque elliptic curve parameters? ECCFROG522PP offers a fully transparent and reproducible 522-bit alternative, letting you independently verify its security.

Víctor Duarte Melo

Architecture Design (Transformers, SSMs, MoE)Open-Source Models & Weights

Leipzig UniversityApr 23, 2026·also Mercedes-Benz Tech Innovation GmbH

Process-Mining of Hypertraces: Enabling Scalable Formal Security Verification of (Automotive) Network Architectures

Uncover hidden attack patterns in automotive networks by combining formal verification with process mining, revealing root causes of security vulnerabilities that traditional methods miss.

Julius Figge, David Knuplesch, A. Maletti +1

Architecture Design (Transformers, SSMs, MoE)Red-Teaming & Adversarial Robustness

Apr 23, 2026

Prefix Parsing is Just Parsing

Forget specialized prefix-parsing algorithms: a simple grammar transformation lets you use standard parsers for efficient prefix parsing and next-token prediction.

Clemente Pasti, Andreas Opedal, Timothy J. O'Donnell +2

Architecture Design (Transformers, SSMs, MoE)Natural Language Processing

Mirage Mountain Technologies IncApr 23, 2026

Listen and Chant Before You Read: The Ladder of Beauty in LM Pre-Training

Forget text-only pre-training: training on music *first* can dramatically accelerate language learning in small language models.

Yoshinori Nomura

Architecture Design (Transformers, SSMs, MoE)Speech & Audio Training Efficiency & Optimization

Yilong Chen +12Apr 23, 2026·also CAS

Beyond N-gram: Data-Aware X-GRAM Extraction for Efficient Embedding Parameter Scaling

By dynamically injecting frequency-aware n-gram features, X-GRAM achieves state-of-the-art accuracy with smaller embedding tables, offering a practical path to scaling memory-augmented architectures.

Yilong Chen, Yan Xie, Zitian Gao +10

Architecture Design (Transformers, SSMs, MoE)Inference & Quantization Training Efficiency & Optimization

Yiming Zhong +7Apr 23, 2026

From Noise to Intent: Anchoring Generative VLA Policies with Residual Bridges

By spectrally decoupling robot control into intent and dynamics, ResVLA offers a more efficient and robust approach to generative VLA policies.

Yiming Zhong, Yaoyu He, Zemin Yang +5

Architecture Design (Transformers, SSMs, MoE)Multimodal Models Robotics & Embodied AI

Minping Chen +6Apr 23, 2026

Enhancing Online Recruitment with Category-Aware MoE and LLM-based Data Augmentation

LLMs can rewrite bad job descriptions and category-aware MoEs can better match candidates, leading to a 19.4% boost in recruitment click-through rates and millions saved.

Minping Chen, Bingquan Xu, Zulong Chen +4

Architecture Design (Transformers, SSMs, MoE)Data Curation & Synthetic Data Recommendation & Information Retrieval

Laura Valeria Perez-Herrera +2Apr 23, 2026

Attention-Based Multiple Instance Learning for Predominant Growth Pattern Prediction in Lung Adenocarcinoma WSI Using Foundation Models

Skip the pixel-perfect annotations: attention-based MIL with pathology foundation models can predict lung cancer growth patterns from whole slide images with surprisingly high accuracy.

Laura Valeria Perez-Herrera, M. J. García-González, Karen López-Linares

Architecture Design (Transformers, SSMs, MoE)Computer Vision Scientific Discovery & Drug Design

Tsinghua AIApr 23, 2026

UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection

By unifying generative and discriminative approaches, UniGenDet achieves superior image generation and detection, suggesting that these tasks benefit from a symbiotic relationship previously hindered by architectural divergence.

Yanran Zhang, Wenzhao Zheng, Yifei Li +5

Architecture Design (Transformers, SSMs, MoE)Computer Vision Data Curation & Synthetic Data