May 1 – May 8, 2026

Architecture Design (Transformers, SSMs, MoE) - Weekly Roundup

100 papers published across 6 labs.

Top Papers

May 6, 2026

Moshe Eliasof +4 · 2w ago

Bridging Input Feature Spaces Towards Graph Foundation Models

Graph models can now generalize to entirely new datasets with different input features, thanks to a simple projection into a shared random space.

Moshe Eliasof, Krishna Sri Ipsit Mantri, Beatrice Bevilacqua +2

Architecture Design (Transformers, SSMs, MoE) · Open-Source Models & Weights

Shitong Shao +6 · 2w ago

Lightning Unified Video Editing via In-Context Sparse Attention

Achieve near-lossless 60% attention latency reduction in video editing by exploiting query sharpness to dynamically route attention.

Shitong Shao, Zikai Zhou, Haopeng Li +4

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

Alper Yıldırım · 2w ago

Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting

Transformers may succeed at time series forecasting without relying on the complex superposition that drives their power in NLP, challenging the assumption that these models are leveraging rich compositional representations.

Alper Yıldırım

Architecture Design (Transformers, SSMs, MoE) · Interpretability & Mechanistic Interp

2w ago · also Apple ML

Taming Outlier Tokens in Diffusion Transformers

Outlier tokens in Diffusion Transformers aren't just extreme values; they corrupt local patch semantics, and can be tamed with Dual-Stage Registers to boost image generation quality.

Xiaoyu Wu, Yifei Wang, Tsu-Jui Fu +3

Architecture Design (Transformers, SSMs, MoE) · Computer Vision

Department of Mathematics · 2w ago · also Georgia Tech, Purdue, School of Mathematics

Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer

Transformers can be explicitly designed to perform nonlinear regression in-context by leveraging attention as a featurizer, offering a theoretical understanding of how these models learn complex relationships from prompts.

Alexander Hsu, Zhaiming Shen, Wenjing Liao +1

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Scaling Laws & Emergent Abilities

All Papers (100)

May 6, 2026

2w ago · also BAIR, Princeton

Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval

Forget scaling laws – the real bottleneck in associative memory isn't storage, it's retrieval: forcing a single "winner" costs you a logarithmic factor in capacity compared to allowing a ranked list.

Nicholas Barnfield, Juno Kim, Eshaan Nichani +2

Architecture Design (Transformers, SSMs, MoE) · Recommendation & Information Retrieval

2w ago

Estimating the expected output of wide random MLPs more efficiently than sampling

Skip the sampling: accurately predict the behavior of wide, random MLPs with a fraction of the compute, especially when assessing rare, high-stakes outcomes.

Wilson Wu, Victor Lecomte, Michael Winer +3

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

Arthur Gretton +5 · 2w ago

On the Wasserstein Gradient Flow Interpretation of Drifting Models

GMD algorithms, previously seen as a novel generative framework, can be understood as directly targeting fixed points of Wasserstein Gradient Flows, offering a new perspective on their optimization process.

Arthur Gretton, Li Kevin Wenliang, Alexandre Galashov +3

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization

Xiaoyu Jiang +4 · 2w ago

Transformed Latent Variable Multi-Output Gaussian Processes

Modeling 10,000+ correlated outputs is now tractable: T-LVMOGP offers a scalable alternative to restrictive low-rank MOGPs by learning a flexible deep kernel in a shared embedding space.

Xiaoyu Jiang, Xinxing Shi, Sokratia Georgaka +2

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization

Andreas Pattichis +1 · 2w ago

Continual Knowledge Updating in LLM Systems: Learning Through Multi-Timescale Memory Dynamics

Forget rigid memory structures: Memini lets your LLM's external knowledge evolve organically, learning and forgetting like a brain.

Andreas Pattichis, Constantine Dovrolis

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Recommendation & Information Retrieval

Ludwig-Maximilians-Universität München · 2w ago

How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences

Infinite-width approximations, a cornerstone of neural network theory, crumble much faster in recurrent models than previously thought, failing beyond a depth of order $\sqrt{n}$.

Mariia Seleznova

Architecture Design (Transformers, SSMs, MoE) · Scaling Laws & Emergent Abilities

Department of Mathematics · 2w ago

Proximal Projection for Doubly Sparse Regularized Models

Doubly sparse regression gets a boost: this method avoids predictor duplication, saving compute, by projecting directly onto the intersection of selected groups.

Jia Wei He, R. Ayesha Ali, Gerarda Darlington

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

2w ago

Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism

Training MoE models just got a whole lot faster: Piper achieves up to 3.5x higher MFU by intelligently scheduling pipeline parallelism and optimizing communication.

Sajal Dash, Feiyi Wang

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Training Efficiency & Optimization

Changsha University of Science and Technology · 2w ago

The Impossibility Triangle of Long-Context Modeling

Long-context models face a provable "impossibility triangle": you can't have efficiency, compactness, and unbounded recall *at the same time*.

Yan Zhou

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

Lancaster University · 2w ago

Hypergraph Generation via Structured Stochastic Diffusion

Forget trying to shoehorn hypergraphs into pairwise representations – this diffusion model directly generates them from incidence matrices, unlocking more realistic and complex structures.

Christopher Nemeth

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design

CMU ML · 2w ago

Graph-SND: Sparse Aggregation for Behavioral Diversity in Multi-Agent Reinforcement Learning

Scale multi-agent RL diversity metrics to hundreds of agents without sacrificing accuracy: Graph-SND offers a drop-in replacement for quadratic SND calculations, achieving near-identical results with order-of-magnitude speedups.

Shawn Ray

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

NUS · 2w ago · also SJTU

CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels

LLMs can now generate high-performance CUDA attention kernels that outperform hand-optimized code, thanks to a novel lift-transfer-lower approach that leverages expert knowledge.

Architecture Design (Transformers, SSMs, MoE) · Code Generation & Program Synthesis · Training Efficiency & Optimization

Antonin Berthon +2 · 2w ago

Skill Neologisms: Towards Skill-based Continual Learning

Forget fine-tuning: "skill neologisms"—new soft tokens—let you inject skills into LLMs without weight updates, composing them zero-shot for flexible knowledge expansion.

Antonin Berthon, Nicolas Astorga, Mihaela van der Schaar

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization

Khaled Ahmed +1 · 2w ago

DualTCN: A Physics-Constrained Temporal Convolutional Network for Time-Domain Marine CSEM Inversion

Inverting time-domain marine electromagnetic data, a traditionally computationally intensive task, can now be done 21,000x faster with a deep learning model that also outperforms traditional optimization methods.

Khaled Ahmed, Ghada Omar

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design

Kyungwon Jeong +2 · 2w ago

Why Geometric Continuity Emerges in Deep Neural Networks: Residual Connections and Rotational Symmetry Breaking

Geometric continuity in deep networks isn't just a byproduct of depth, but an actively sculpted property arising from the interplay of residual connections and symmetry-breaking activations.

Kyungwon Jeong, Won-Gi Paeng, Honggyo Suh

Architecture Design (Transformers, SSMs, MoE) · Interpretability & Mechanistic Interp · Training Efficiency & Optimization

ETH · 2w ago · also ELLIS, Max Planck

Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts

Granular Mixture-of-Experts can now be efficient: AIR-MoE's two-stage routing slashes routing costs without sacrificing performance.

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Xuan Qi +5 · 2w ago

Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks

Batch normalization's power comes from reshaping the geometry of neural network decision boundaries on a per-batch basis, not just from optimization benefits.

Xuan Qi, Yi Wei, Fanqi Yu +3

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

Nicolás Valenzuela +2 · 2w ago

Neural Discovery of Strichartz Extremizers

Neural networks can now discover previously unknown behavior in hard PDE problems, revealing that Strichartz extremizers for the critical Airy equation are not attained but approached by mKdV breathers.

Nicolás Valenzuela, Ricardo Freire, Claudio Muñoz

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design

2w ago

Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training

LLMs can be efficiently post-trained by only updating half the parameters, slashing memory costs without sacrificing performance.

Hengyu Shi, Peizhe Wang, Zhiling Wang +1

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

University of Würzburg · 2w ago · also Computer Vision Lab

Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs

LLMs can now generate neural architectures with 75% less code and higher accuracy by learning to write code "diffs" instead of building from scratch.

Santosh Premi Adhikari, Radu Timofte, Dmitry Ignatov

Architecture Design (Transformers, SSMs, MoE) · Code Generation & Program Synthesis · Training Efficiency & Optimization

Albert F. Modenbach · 2w ago

A geometric relation of the error introduced by sampling a language model's output distribution to its internal state

Token embedding geometry isn't just abstract math—it directly mirrors how language models internally represent and reason about the world, as shown by its alignment with board state and piece importance in chess.

Albert F. Modenbach

Architecture Design (Transformers, SSMs, MoE) · Interpretability & Mechanistic Interp · World Models & Planning

Dominik Dahlem +2 · 2w ago

Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics

Symmetric spectral analysis of attention is fundamentally blind to information flow direction, but a simple asymmetry coefficient can restore the signal.

Dominik Dahlem, Diego Maniloff, Mac Misiura

Architecture Design (Transformers, SSMs, MoE) · Interpretability & Mechanistic Interp · Natural Language Processing

2w ago

Hybrid Iterative Neural Low-Regularity Integrator for Nonlinear Dispersive Equations

Neural operators can stably and accurately correct the structured truncation errors of classical numerical solvers for dispersive PDEs, even with rough data.

Zhangyong Liang

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design · Training Efficiency & Optimization

Yizheng Wang +4 · 2w ago

Replay-Based Continual Learning for Physics-Informed Neural Operators

Physics-informed neural operators can now learn continually without forgetting, thanks to a simple replay strategy that preserves past knowledge while rapidly adapting to new out-of-distribution data.

Yizheng Wang, M. Eshaghi, Xiaoying Zhuang +2

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design · Training Efficiency & Optimization

Yifan F. Zhang +4 · 2w ago

Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models

Diffusion models' reliance on global information isn't just a quirk – it's fundamentally linked to the moment they commit to a specific semantic outcome.

Yifan F. Zhang, Fangjun Hu, Guangkuo Liu +2

Architecture Design (Transformers, SSMs, MoE) · Computer Vision

Zhenchao Sun +3 · 2w ago

Unsat Core Prediction through Polarity-Aware Representation Learning over Clause-Literal Hypergraphs

By explicitly modeling literal polarity in SAT formulas, GNNs can more accurately predict unsatisfiable cores.

Zhenchao Sun, Shuai Ma, Ping Lu +1

Architecture Design (Transformers, SSMs, MoE)

Omkar B. Shende +2 · 2w ago

AxMoE: Characterizing the Impact of Approximate Multipliers on Mixture-of-Experts DNN Architectures

Approximate computing can break MoEs in unexpected ways, with dense networks sometimes proving more robust, but careful retraining can unlock surprising efficiency gains in specific architectures.

Omkar B. Shende, Marcello Traiola, Gayathri Ananthanarayanan

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Training Efficiency & Optimization

Matan Pagi +1 · 2w ago

Bilinear Mamba-Koopman Neural MPC for Varying Dynamics

Control-dependent latent dynamics, achieved with a surprisingly small parameter increase, unlock robust MPC performance in time-varying environments where standard Koopman methods falter.

Matan Pagi, Zohar Sorek

Architecture Design (Transformers, SSMs, MoE) · Robotics & Embodied AI · World Models & Planning

Kang Liu +2 · 2w ago

Exact Dual Geometry of SOC-ICNN Value Functions

Unlock white-box inference for SOC-ICNNs by directly reading out geometric primitives like Hessians from the optimal dual variables, bypassing black-box differentiation.

Kang Liu, Jianchen Hu, Wei Peng

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

V. Srinivasan +3 · 2w ago

Gyan: An Explainable Neuro-Symbolic Language Model

Forget opaque transformers: Gyan offers SOTA language modeling with full interpretability, lower compute, and human-like compositional understanding.

V. Srinivasan, Vishaal Jatav, A. Chandrababu +1

Architecture Design (Transformers, SSMs, MoE) · Interpretability & Mechanistic Interp · Natural Language Processing

Shereen Elsayed +3 · 2w ago

Rethinking Convolutional Networks for Attribute-Aware Sequential Recommendation

Ditch the attention: ConvRec proves convolutional networks can beat Transformers in sequential recommendation while slashing compute and memory costs.

Shereen Elsayed, N. Le, Ahmed Rashed +1

Architecture Design (Transformers, SSMs, MoE) · Recommendation & Information Retrieval · Training Efficiency & Optimization

Lirui Luo +4 · 2w ago

SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning

MoEs, despite their scaling advantages, suffer from a surprising "spectral plasticity loss" in continual RL, but a simple Parseval penalty can recover performance.

Lirui Luo, Guoxi Zhang, Hongming Xu +2

Architecture Design (Transformers, SSMs, MoE) · Robotics & Embodied AI · Training Efficiency & Optimization

NUS · 2w ago

Geometry-Aware State Space Model: A New Paradigm for Whole-Slide Image Representation

By embedding whole-slide images in a hybrid hyperbolic-Euclidean space, BatMIL unlocks superior classification performance compared to traditional Euclidean-only methods, revealing the importance of geometric awareness in capturing complex tissue organization.

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Scientific Discovery & Drug Design

Warsaw University of Technology · 2w ago · also Harvard, Massachusetts General Hospital, Warsaw

Local Intrinsic Dimension Unveils Hallucinations in Diffusion Models

Hallucinations in diffusion models aren't just mode interpolation gone wrong, but instabilities on the model's manifold, and squashing its local intrinsic dimension can fix them.

Bartlomiej Sobieski, Matthew Tivnan, Dawid Płudowski +4

Architecture Design (Transformers, SSMs, MoE) · Computer Vision

2w ago

Architectural Constraints Alignment in AI-assisted, Platform-based Service Development

Stop brittle, undeployable AI-generated code: this retrieval-augmented scaffolding method bakes in architectural constraints from the start.

Julius Irion, Moritz Leugers, Paul Hartwig +5

Architecture Design (Transformers, SSMs, MoE) · Code Generation & Program Synthesis · Tool Use & Agents

Anju Rani +2 · 2w ago

DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring

A single vision-language foundation model, DART, can perform a full rope inspection workflow, including damage classification, severity estimation, and few-shot recognition, all without task-specific fine-tuning.

Anju Rani, Daniel Ortiz-Arroyo, Petar Durdevic

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

2w ago · also Shanghai Qizhi Institute, State Key Laboratory of Cryptology

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference

Shuffling activations, a popular defense in secure Transformer inference, crumbles under a new alignment attack that recovers model weights for just $1.

Zhengyi Li, Yakai Wang, Kang Yang +6

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Natural Language Processing

Joshua Adler +1 · 2w ago

Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall

Ditch the vector DB – this new agent architecture achieves SOTA memory recall by storing everything verbatim and optimizing retrieval, all in a single SQLite file.

Joshua Adler, Guy Zehavi

Architecture Design (Transformers, SSMs, MoE) · Recommendation & Information Retrieval · Tool Use & Agents

Lena Ehrmuth +1 · 2w ago

Average Attention Transformers and Arithmetic Circuits

Transformers with average attention can natively execute arithmetic circuits, suggesting a new architectural direction for reasoning and computation.

Lena Ehrmuth, Laura Strieker

Architecture Design (Transformers, SSMs, MoE) · Reasoning & Chain-of-Thought

Yukun Chen +4 · 2w ago

VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models

Unlock scalable, high-quality singing voice synthesis by directly generating structured musical scores from audio, outperforming existing systems on multiple datasets.

Yukun Chen, Tianrui Wang, Zhaoxi Mu +2

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Speech & Audio

Xinyi Li +7 · 2w ago

HeterSEED: Semantics-Structure Decoupling for Heterogeneous Graph Learning under Heterophily

HeterSEED achieves state-of-the-art performance on heterophilic heterogeneous graphs by decoupling semantic and structural information, offering a more robust approach than relying on feature similarity alone.

Xinyi Li, Ming Li, Lu Bai +5

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Recommendation & Information Retrieval

Jingtao Zhou +3 · 2w ago

SpecPL: Disentangling Spectral Granularity for Prompt Learning

Freezing your VAE and permuting high-frequency visual signals unlocks a new SOTA for VLM prompt learning, boosting harmonic-mean accuracy to 81.51%.

Jingtao Zhou, Xirui Kang, Feiyang Huang +1

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

Department of Data Science Institut · 2w ago · also Institut Teknologi Sumatera Lampung

A Comparative Study of PyCaret AutoML and CNN-BiLSTM for Binary Hate Speech Detection in Indonesian Twitter

CNN-BiLSTM beats AutoML for Indonesian hate speech detection, but the gains are modest, suggesting the dataset's limitations are a bigger bottleneck than model architecture.

Tanty Widiyastuti, Mayada, Adisty Syawalda Ariyanto +3

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing

M. Arabov · 2w ago

Benchmarking POS Tagging for the Tajik Language: A Comparative Study of Neural Architectures on the TajPersParallel Corpus

Even state-of-the-art multilingual models struggle to tag parts-of-speech in Tajik when trained on isolated words, highlighting the critical role of syntactic context.

M. Arabov

Architecture Design (Transformers, SSMs, MoE) · Eval Frameworks & Benchmarks · Natural Language Processing

Yepeng Weng +2 · 2w ago

UniVer: A Unified Perspective for Multi-step and Multi-draft Speculative Decoding

UniVer achieves state-of-the-art speculative decoding by jointly optimizing multi-step and multi-draft verification, outperforming existing methods by up to 8.5% in acceptance length.

Yepeng Weng, Qiao Hu, T. Yairi

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Natural Language Processing

Ziqi Zhu +3 · 2w ago

GEM: Graph-Enhanced Mixture-of-Experts with ReAct Agents for Dialogue State Tracking

LLMs get schooled in dialogue state tracking by a mixture-of-experts architecture that uses a graph neural network and ReAct agents to achieve state-of-the-art results with a T5-Small backbone.

Ziqi Zhu, Adithya Suresh, Tomal Deb +1

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Tool Use & Agents

J. Jung +3 · 2w ago

Fundamental Limitations of Post-Quantum Cryptographic Architectures

Lattice-based cryptography's reliance on injected noise for security is more akin to hiding secrets under a rug than truly erasing them, leaving them vulnerable to future quantum attacks.

J. Jung, Donghwa Ji, Mingyu Lee +1

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Nankai University · 2w ago · also NTU

Misrouter: Exploiting Routing Mechanisms for Input-Only Attacks on Mixture-of-Experts LLMs

Remotely hosted Mixture-of-Experts LLMs are vulnerable to input-only attacks that hijack their routing mechanisms, forcing them to generate harmful content.

Zekun Fei, Zihao Wang, Weijie Liu +4

Architecture Design (Transformers, SSMs, MoE) · Red-Teaming & Adversarial Robustness

2w ago · also Queen's

SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs

A clever routing strategy lets a tiny 3B code model outperform a massive 480B model on routine code completion tasks, slashing accelerator usage by 58%.

Kishanthan Thangarajah, Boyuan Chen, Ahmed E. Hassan

Architecture Design (Transformers, SSMs, MoE) · Code Generation & Program Synthesis · Inference & Quantization

Barkhausen Institut · 2w ago

Interaction Tree Semantics for RISC-V: Bridging Compiler and Hardware Verification

Proving semantic equivalence between LLVM IR and RISC-V code is now possible within a single framework, thanks to a new formal RISC-V semantics built on Interaction Trees.

Shuanglong Kan, Sebastian Ertel

Architecture Design (Transformers, SSMs, MoE) · Code Generation & Program Synthesis · Distributed Systems & Hardware

2w ago

CPCANet: Deep Unfolding Common Principal Component Analysis for Domain Generalization

Forget dataset-specific hacks: CPCANet achieves SOTA domain generalization by explicitly learning a structured, domain-invariant subspace with a differentiable CPCA layer.

Yu-Hsi Chen, Abd-Krim Seghouane

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

Honghu Pan +4 · 2w ago

Computer-Aided Design Generation by Cascaded Discrete Diffusion Model

Discrete diffusion, with carefully designed transition matrices for commands and parameters, unlocks superior CAD generation compared to continuous diffusion baselines.

Honghu Pan, Xiaoling Luo, Yongyong Chen +2

Architecture Design (Transformers, SSMs, MoE) · Code Generation & Program Synthesis · Computer Vision

Sapna Sachan +1 · 2w ago

3D Ultrasound-Derived Pseudo-CT Synthesis Using a Transformer-Augmented Residual Network for Real-Time Operator Guidance

Generate CT-like images from ultrasound with a transformer-augmented network, potentially reducing the need for harmful radiation exposure.

Sapna Sachan, Amulya Kumar Mahto

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Scientific Discovery & Drug Design

Kunyu Li +4 · 2w ago

GTF: Omnidirectional EPI Transformer for Light Field Super-Resolution

Overlooked diagonal epipolar geometry holds the key to boosting light field super-resolution, as demonstrated by a new omnidirectional EPI Transformer.

Kunyu Li, Fei Wang, Lichao Zhang +2

Architecture Design (Transformers, SSMs, MoE) · Computer Vision

Huimin Wang +9 · 2w ago

ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

RL fine-tuning unlocks a 6x performance gain for in-place trajectory editing in autonomous driving, demonstrating the power of aligning diffusion planners with reinforcement learning.

Huimin Wang, Yue Wang, Bihao Cui +7

Architecture Design (Transformers, SSMs, MoE) · Robotics & Embodied AI · World Models & Planning

Jacob Wahlgren +4 · 2w ago

Communication Offloading on SmartNIC DPUs: A Quantitative Approach

Offloading communication to SmartNIC DPUs can speed up host-dominated workloads by 1.55x, but the lack of Direct Cache Access creates a massive DRAM bottleneck.

Jacob Wahlgren, Andong Hu, Roger Pearce +2

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware

2w ago · also CMU ML

AGIPC: Adaptive In-Solve Algebraic Coarsening for GPU IPC

Implicit time integration on GPUs gets a 3x speed boost thanks to a novel algebraic coarsening method that avoids costly explicit remeshing.

Xuan Wang, Zhaofeng Luo, Minchen Li +2

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Training Efficiency & Optimization

Colorado State University · 2w ago

MCFlash: Bulk Bitwise Processing in 3D NAND with Dynamic Sensing and Multi-level Encoding

Run billions of bitwise operations directly in your 3D NAND flash, error-free, using just standard instructions.

Habib Ur Rahman, Tharini Suresh, Sudeep Pasricha +1

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

2w ago

Not All Faults Are Equal: Transient-Fault Sensitivity Characterization of an Open-Source RISC-V Vector Cluster

Exponent bits are the Achilles' heel of floating-point arithmetic, as corrupting them in RISC-V vector processors leads to the most severe silent data corruption.

M. Cai, Amirhossein Kiamarzi, Davide Rossi +1

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Open-Source Models & Weights

M. Zaeemi +1 · 2w ago

Ultra Low-Power SDM-based Circuit-Switching for Networks-on-Chip

Radically reduce power consumption in AI chips with a circuit-switched network-on-chip that carves out dedicated "lanes" for predictable communication flows.

M. Zaeemi, Mehdi Modarressi

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware

Hanum Ko +3 · 2w ago

RangeGuard: Efficient, Bounded Approximate Error Correction for Reliable DNNs

RangeGuard lets you tolerate 64+ flipped bits in DNN memory using just 16 bits of parity, without sacrificing accuracy.

Hanum Ko, Sang Yeon, Jong Hwan Ko +1

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Pablo del Mazo-Sevillano +3 · 2w ago

Multistate Coupled Diabatic Neural Network potential for the quantum non-adiabatic Photofragmentation of CH$_2^+$

Automating diabatization with neural networks unlocks accurate simulation of complex non-adiabatic molecular dynamics, revealing unexpected fragmentation pathways.

Pablo del Mazo-Sevillano, S. Gómez‐Carrasco, A. Aguado +1

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design

Rajeshwar Tripathi +3 · 2w ago

Hearing the Ocean: Bio-inspired Gammatone-CNN framework for Robust Underwater Acoustic Target Classification

Bio-inspired signal processing lets you hear subtle underwater sounds better than ever, achieving 98.41% accuracy in classifying targets even in noisy conditions.

Rajeshwar Tripathi, Sandeep Kumar, Monika Aggarwal +1

Architecture Design (Transformers, SSMs, MoE) · Speech & Audio

Dongheon Lee +6 · 2w ago

Spatial-Magnifier: Spatial upsampling for multichannel speech enhancement

Unlock near-oracle speech enhancement performance from compact microphone arrays by virtually expanding their spatial coverage with a novel neural network.

Dongheon Lee, Ashutosh Pandey, Sanjeel Parekh +4

Architecture Design (Transformers, SSMs, MoE) · Speech & Audio · Training Efficiency & Optimization

Wenzhuo Cheng +6 · 2w ago

CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation

Generative recommendation gets a boost: CapsID's soft-routed semantic IDs outperform hard-quantized baselines and even rival sparse-dense hybrids, all while slashing inference latency by nearly half.

Wenzhuo Cheng, Menghang Gong, Qixin Guo +4

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Recommendation & Information Retrieval

Gaolin Ge +5 · 2w ago

3D Printing of Passively Actuated Self-Folding Robots with Integrated Functional Modules

Forget complex assembly: this 3D printing technique lets you pop out functional, self-folding robots with integrated sensors and actuators directly from a flat sheet.

Gaolin Ge, Qifeng Yang, Haoran Lu +3

Architecture Design (Transformers, SSMs, MoE) · Open-Source Models & Weights · Robotics & Embodied AI

2w ago · also BIT, XJTU

Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes

By grounding temporal Gaussian aggregation in spatial voxels, Ground4D achieves state-of-the-art 4D reconstruction in challenging off-road environments where existing methods falter.

Shuo Wang, Jilin Mei, Fuyang Liu +6

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Robotics & Embodied AI

Nandkishore Mishra +4 · 2w ago

DALight-3D: A Lightweight 3D U-Net for Brain Tumor Segmentation from Multi-Modal MRI

Brain tumor segmentation gets a lightweight boost: DALight-3D achieves comparable accuracy to larger U-Nets with significantly fewer parameters.

Nandkishore Mishra, Nand Kumar Mishra, Dhruv Mishra +2

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Scientific Discovery & Drug Design

Anagh Malik +5 · 2w ago

Velox: Learning Representations of 4D Geometry and Appearance

Unlock efficient 4D object understanding from dynamic point clouds with Velox, a representation that's descriptive, compressive, and accessible.

Anagh Malik, Dorian Chan, Xiaoming Zhao +3

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

Kaili Zheng +4 · 2w ago

InterMesh: Explicit Interaction-Aware End-to-End Multi-Person Human Mesh Recovery

Explicitly modeling human-object interactions boosts multi-person human mesh recovery accuracy by up to 9.9%, showing that interaction context is key to understanding human pose and shape in complex scenes.

Kaili Zheng, Kaiwen Wang, Xun Zhu +2

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Robotics & Embodied AI

2w ago

SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression

Mamba's linear complexity meets perceptual image compression, yielding a lightweight model that rivals GANs and diffusion models in visual quality while being far more efficient.

Jiaqian Zhang, Hao Wei, Chenyang Ge +2

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Inference & Quantization

Huan Zhang +6 · 2w ago

UniPCB: A Generation-Assisted Detection Framework for PCB Defect Inspection

Generating synthetic training data with multi-modal diffusion beats hand-crafting better detection architectures for PCB defect inspection.

Huan Zhang, Lianghong Tan, Yichu Xu +4

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Data Curation & Synthetic Data

University of Campinas · 2w ago

Attention-Based Chaotic Self-Supervision for Medical Image Classification

Random masking in self-supervised learning can destroy crucial diagnostic features in medical images; instead, try inverting chaos.

Joao Batista Florindo, Amanda Pontes de Oliveira Ornelas

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Scientific Discovery & Drug Design

Keunho Byeon +1 · 2w ago

HEXST: Hexagonal Shifted-Window Transformer for Spatial Transcriptomics Gene Expression Prediction

Spatial transcriptomics predictions get a boost from HEXST, a Transformer that respects the hexagonal geometry of spot arrays and recovers gene-specific spatial heterogeneity.

Keunho Byeon, Jin Tae Kwak

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Scientific Discovery & Drug Design

Xinze Li +6 · 2w ago

QuadBox: Accelerating 3D Gaussian Splatting with Geometry-Aware Boxes

3D Gaussian Splatting gets a nearly 2x speed boost thanks to a clever bounding box strategy that drastically reduces unnecessary tile intersection checks.

Xinze Li, Bohan Yang, Pengxu Chen +4

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Inference & Quantization

Joao B Florindo · 2w ago

Chaotic Contrastive Learning for Robust Texture Classification

Forget ImageNet – pre-training with chaotic augmentations yields surprisingly robust texture features, outperforming SOTA methods across diverse texture datasets.

Joao B Florindo

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

May 5, 2026

Lin Song +18 · 2w ago

Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

Bidirectional interaction between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables a unified multimodal model to achieve spatial intelligence beyond general visual competence.

Lin Song, Wenbo Li, Guoqing Ma +16

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

Mohammed Sabry +1 · 2w ago

Budgeted LoRA: Distillation as Structured Compute Allocation for Efficient Inference

Get 4x faster LLM inference with Budgeted LoRA, which smartly redistributes compute between dense and low-rank pathways during distillation, outperforming standard LoRA in both speed and function-style in-context learning.

Mohammed Sabry, Anya Belz

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Training Efficiency & Optimization

Yaobo Zhang · 2w ago

Jordan-RoPE: Non-Semisimple Relative Positional Encoding via Complex Jordan Blocks

Forget boring rotary embeddings: Jordan-RoPE unlocks distance-modulated phase interactions in attention, letting your model learn relationships like "the further apart, the stronger the cosine similarity."

Yaobo Zhang

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing

Oona Itkonen +1 · 2w ago

The Impact of Vocabulary Overlaps on Knowledge Transfer in Multilingual Machine Translation

Domain match and language relatedness trump joint vocabularies for effective knowledge transfer in multilingual NMT.

Oona Itkonen, Jörg Tiedemann

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization

Skye Gunasekaran +3 · 2w ago

Transformers with Selective Access to Early Representations

SATFormer shows that selectively gating access to early-layer representations boosts Transformer performance, especially in retrieval tasks, without sacrificing efficiency.

Skye Gunasekaran, Téa Y. Wright, Rui-Jie Zhu +1

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

2w ago · also UChicago

Task Vector Geometry Underlies Dual Modes of Task Inference in Transformers

Transformers generalize out-of-distribution not by clever interpolation, but by learning a separate, orthogonal representation subspace for unseen tasks.

Hao Yan, Haolin Yang, Yiqiao Zhong

Architecture Design (Transformers, SSMs, MoE) · Interpretability & Mechanistic Interp

Jiachen Shen +3 · 2w ago

Will the Carbon Border Adjustment Mechanism Impact European Electricity Prices? A GNN-Based Network Analysis

CBAM could reshape Europe's electricity market, giving low-carbon countries a competitive edge while burdening high-carbon economies.

Jiachen Shen, Jian Shi, Dan Wang +1

Architecture Design (Transformers, SSMs, MoE) · Scientific Discovery & Drug Design

Chun Yin Chiu · 2w ago

Revocation-Ready CP-ABE Key Management for Blockchain-Based IoT Data Sharing

Forget trusted online policy enforcement points: this revocation-ready key management layer uses ciphertext key publication to enforce dynamic, multi-user authorization for releasing or using bulk-data decryption keys in blockchain-based IoT data sharing systems.

Chun Yin Chiu

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware

Erfan Iravani +6 · 2w ago

LIPPEN: A Lightweight In-Place Pointer Encryption Architecture for Pointer Integrity

Get strong pointer integrity and confidentiality without metadata overhead: LIPPEN encrypts pointers in-place, turning every pointer into a cryptographically protected block.

Erfan Iravani, Lalit Prasad Peri, Mohannad Ismail +4

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware

2w ago

Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions

Backdoors that are provably undetectable, even with white-box access, can be injected into pre-trained image classifiers by exploiting sparse perturbations and Gaussian dithering.

Sarthak Choudhary, Atharv Singh Patlan, Nils Palumbo +3

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Red-Teaming & Adversarial Robustness

Melki Bino · 2w ago

Probabilistic-bit Guided CDCL for SAT Solving using Ising Consensus Assumptions

Stochastic sampling from p-bit Ising models can slash the search effort of CDCL SAT solvers by over 80% on certain problem instances.

Melki Bino

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Seyed Erfan Fatemieh +2 · 2w ago

Design of Memristive Lightweight Encryption For In-Memory Image Steganography

Computation-in-memory combined with lightweight cryptography slashes energy consumption by up to 44% in steganography applications.

Seyed Erfan Fatemieh, Reza Alizadeh, E. Zarezadeh

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Inference & Quantization

2w ago

Covariance-Aware Goodness for Scalable Forward-Forward Learning

Forward-Forward learning can finally compete with backpropagation on complex image tasks, thanks to a novel covariance-aware goodness function that captures crucial second-order feature dependencies.

Xiaoyi Jiang, Bashir M. Al-Hashimi, Kai Xu

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

James Yen +7 · 2w ago

Jiao: Bridging Isolation and Customization in Mixed Criticality Robotics

Achieve a near order-of-magnitude reduction in tail timing error in mixed-criticality robotics by decoupling safety-critical control from user applications.

James Yen, Zhibai Huang, Zhixiang Wei +5

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Robotics & Embodied AI

Reza Farahani +6 · 2w ago

ClusterLess: Deadline-Aware Serverless Workflow Orchestration on Federated Edge Clusters

ClusterLess slashes workflow completion times by up to 40% and nearly doubles deadline satisfaction in federated edge environments, outperforming existing methods.

Reza Farahani, M. Colosi, Ilir Murturi +4

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware

OpenAI · 2w ago

Resilient AI Supercomputer Networking using MRC and SRv6

AI training jobs can now shrug off network failures that used to halt progress, thanks to a new resilient networking stack deployed at OpenAI and Microsoft.

Joao Araujo, Alex Chow, Mark Handley +150

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Training Efficiency & Optimization

2w ago · also TU Wien

Orchestrating Serverless Applications in the Edge Cloud Space Continuum: What Breaks and What is Next?

Serverless orchestration falls apart when you move it to space, but this paper proposes a new architecture to fix it.

H. Malazi, Reza Farahani, Nitinder Mohan +1

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware

Nick Brown +1 · 2w ago

Lifting to tensors when compiling scientific computing workloads for AI Engines

Get up to a 40% performance boost and 15% energy savings on scientific computing kernels by offloading OpenMP loops to AMD's AI Engines with minimal code changes.

Nick Brown, Gabriel Rodriguez-Canal

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Scientific Discovery & Drug Design

Aaron Jarmusch +1 · 2w ago

Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures

Forget simplistic roofline models: these analytical models nail GPU performance prediction on Blackwell and CDNA3 with under 1.5% error.

Aaron Jarmusch, Sunita Chandrasekaran

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Training Efficiency & Optimization