Google Research

Architecture Design (Transformers, SSMs, MoE)

12 papers from Google Research on Architecture Design (Transformers, SSMs, MoE)

Jun 4, 2026

Multi-ResNets for Subspace Preconditioning in Constrained Optimization

MResOpt achieves significantly fewer violations of high-priority constraints in constrained optimization tasks while remaining computationally efficient.

Merve Karakas, Christopher J. Williams, Emmanuel O. Balogun +3

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

Google Research · May 28, 2026 · also Max Planck

PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding

PARCEL redefines visual tokenization, achieving superior efficiency and performance by dynamically anchoring feature extraction to spatial pool tokens.

S. Kuzucu, Alessio Tonioni, Vasile Lup +3

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Multimodal Models

CMU ML · May 27, 2026 · also Google Research

Multi-Mixer Models: Flexible Sequence Modeling with Shared Representations

Why pick just one token mixer when you can have them all, dynamically switching between attention and linear recurrences for optimal efficiency and performance?

Kevin Y. Li, Asher Trockman, Ananda Theertha Suresh +1

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization

May 27, 2026 · also DeepMind, Google Research, AMD Research and Advanced Development, Intel Labs

How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving

Placing attention and feed-forward network (FFN) layers on separate GPUs can unlock 4x higher MoE LLM throughput, but only if the GPU partitioning strategy is carefully tuned to the workload.

Hanjiang Wu, Abhimanyu Rajeshkumar Bambhaniya, Sarbartha Banerjee +9

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

May 22, 2026 · also Google Research, Vector

Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

Pruning redundant tokens across frames and within layers can cut the compute cost of visual geometry transformers by 85% without sacrificing accuracy.

Shuhong Zheng, Michael Oechsle, Erik Sandström +2

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

Google Research · May 21, 2026 · also Courant Institute of Mathematical, Harvard, NYU, School of Engineering and Applied

Lost in Tokenization: Fundamental Trade-offs in Graph Tokenization for Transformers

Graph transformers can be fundamentally limited by their tokenization strategy, as some tokenizations provably preclude efficient learning of structural representations realizable with other tokenizations.

Maya Bechler-Speicher, Gilad Yehudai, Gil Harari +3

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing

Apr 20, 2026 · also Google Research, MIT CSAIL

Enabling AI ASICs for Zero Knowledge Proof

Zero-knowledge proof (ZKP) generation, previously bottlenecked by multi-scalar multiplication (MSM) and number-theoretic transform (NTT) operations, can now achieve up to 10x higher throughput on TPUs thanks to a novel framework that reformulates ZKP kernels for AI-ASIC execution.

Jianming Tong, Jingtian Dang, Jing Dang +8

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Apr 5, 2026 · also Google Research, Independent Researcher

NEURA: A Unified and Retargetable Compilation Framework for Coarse-Grained Reconfigurable Architectures

NEURA, a compilation framework that transforms control flow into dataflow, improves the performance of coarse-grained reconfigurable architectures (CGRAs) by 2.7x.

Shangkun Li, Jinming Ge, Diyuan Tao +3

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware

Google Research · Mar 19, 2026 · also DeepMind

Seasoning Generative Models for a Generalization Aftertaste

Refining generative models with discriminator guidance provably improves generalization, offering a theoretical justification for techniques like score-based diffusion.

Hisham Husain, Valentin De Bortoli, Richard Nock

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization

Mar 9, 2026 · also Google Research

Grow, Don't Overwrite: Fine-tuning Without Forgetting

Forget catastrophic forgetting: this function-preserving expansion method lets you fine-tune without sacrificing pre-trained knowledge, matching full fine-tuning performance at a fraction of the cost.

Dyah Adila, Hanna Mazzawi, Benoit Dherin +1

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization
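To illustrate what "function-preserving expansion" means in general, here is a minimal sketch; it is not the authors' method. The two-layer ReLU MLP, the helper names `mlp_forward` and `expand_hidden`, and the choice of zero-initialized outgoing weights are all assumptions made for this example. New hidden units receive random incoming weights but zero outgoing weights, so the grown network computes exactly the same outputs as the original until training moves the new parameters.

```python
import random


def mlp_forward(x, w_in, w_out):
    """Two-layer ReLU MLP: y = W_out . relu(W_in . x); weights are lists of rows."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def expand_hidden(w_in, w_out, extra, scale=0.02, seed=0):
    """Hypothetical helper: add `extra` hidden units without changing the function.

    Incoming weights of the new units are small random values (so they can
    learn), but their outgoing weights start at zero, so they contribute
    nothing to the output until they are trained. Originals are copied, not mutated.
    """
    rng = random.Random(seed)
    n_in = len(w_in[0])
    new_rows = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(extra)]
    w_in_grown = [row[:] for row in w_in] + new_rows
    w_out_grown = [row[:] + [0.0] * extra for row in w_out]
    return w_in_grown, w_out_grown


if __name__ == "__main__":
    w_in = [[0.5, -1.0], [1.5, 0.25]]
    w_out = [[1.0, -2.0]]
    x = [1.0, 2.0]
    g_in, g_out = expand_hidden(w_in, w_out, extra=3)
    # Both calls produce the same output because the new units are silent.
    print(mlp_forward(x, w_in, w_out), mlp_forward(x, g_in, g_out))
```

The summary does not say whether the original weights are frozen or which layers are grown. Fine-tuning only the newly added rows and columns is one plausible choice for keeping pre-trained knowledge intact.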

Feb 27, 2026 · also Google Research

Memory Caching: RNNs with Growing Memory

Recurrent models can now achieve Transformer-competitive performance on recall-intensive tasks, thanks to a simple memory caching mechanism that grows memory capacity with sequence length.

Ali Behrouz, Zeman Li, Yuan Deng +3

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization
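One way to picture "memory capacity that grows with sequence length" is a toy recurrent model that snapshots its state at fixed intervals and reads from every snapshot. This is a hypothetical illustration of the general idea only: the scalar state, the function name `run_cached_rnn`, the `segment_len` interval, and the softmax readout are assumptions, not the paper's mechanism.

```python
import math


def run_cached_rnn(inputs, decay=0.9, segment_len=4):
    """Hypothetical linear RNN whose readable memory grows by caching past states.

    State update: h_t = decay * h_{t-1} + x_t (a scalar state, for clarity).
    Every `segment_len` steps the current state is appended to a cache, so the
    cache holds floor(T / segment_len) entries after T steps. Each output is a
    softmax-weighted mix of the cached states and the live state, scored by
    the product of the current input (acting as a query) and each memory.
    """
    h = 0.0
    cache = []
    outputs = []
    for t, x in enumerate(inputs, start=1):
        h = decay * h + x
        memories = cache + [h]
        scores = [x * m for m in memories]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(sum(w * m for w, m in zip(weights, memories)) / total)
        if t % segment_len == 0:
            cache.append(h)
    return outputs, cache


if __name__ == "__main__":
    outs, cache = run_cached_rnn([1.0] * 10, segment_len=4)
    print(len(outs), len(cache))
```

A plain recurrent model keeps only `h`, a fixed-size summary; here the list `cache` lets later steps read states from earlier segments directly, at the cost of memory that grows linearly with the sequence.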

Google Research · Feb 17, 2026 · also Northwestern

On Surprising Effectiveness of Masking Updates in Adaptive Optimizers

Randomly masking parameter updates in RMSProp delivers state-of-the-art LLM training performance, revealing a surprisingly effective form of geometric regularization.

Taejong Joo, Wenhan Xia, Cheolmin Kim

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization
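The masked-update idea can be sketched per coordinate as follows. The sketch rests on three assumptions that the summary does not specify: each parameter's update is kept independently with probability `keep_prob`, kept updates are not rescaled, and the second-moment estimate is still updated for masked coordinates. The function name `masked_rmsprop_step` is hypothetical.

```python
import math
import random


def masked_rmsprop_step(params, grads, sq_avg, lr=1e-3, beta=0.99, eps=1e-8,
                        keep_prob=0.5, rng=random):
    """One RMSProp step in which each coordinate's update is randomly dropped.

    Assumptions (not from the source): an independent Bernoulli(keep_prob)
    mask per coordinate, no 1/keep_prob rescaling, and the running average of
    squared gradients is updated for every coordinate, masked or not.
    """
    new_params, new_sq = [], []
    for p, g, v in zip(params, grads, sq_avg):
        v = beta * v + (1.0 - beta) * g * g        # standard RMSProp second moment
        update = lr * g / (math.sqrt(v) + eps)     # standard RMSProp step
        keep = rng.random() < keep_prob            # random mask for this coordinate
        new_params.append(p - update if keep else p)
        new_sq.append(v)
    return new_params, new_sq


if __name__ == "__main__":
    rng = random.Random(0)
    params, sq = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
    params, sq = masked_rmsprop_step(params, [0.5, -0.5, 0.1], sq, rng=rng)
    print(params)
```

With `keep_prob=1.0` this reduces to plain RMSProp; with `keep_prob=0.0` no parameter moves, though the squared-gradient averages still accumulate.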
