Jun 22, 2026 · arXiv:2606.23637

Muown Implicitly Performs Angular Step-size Decay

Florian Hübler, Kai Lion, Antonio Orvieto, Niao He

AI Summary

This paper analyzes the Muown optimizer and shows that its directional update is equivalent to a Riemannian step on the normalized directions, while the magnitude of the un-normalized direction variable only modulates the angular step size. The authors introduce AngularMuown, which optimizes directly over the normalized directions with a schedulable angular multiplier and improves over Muown. At the time of writing, a preliminary version leads the per-optimizer category of the modded nanoGPT speedrunning competition, and experiments on Qwen2-0.5B and on 1.1B-parameter mixture-of-experts models indicate that the method scales beyond small models.
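The claim that the magnitude only modulates the angular step size can be illustrated with a minimal NumPy sketch. It rests on a simplifying assumption that is ours, not the paper's: every update d added to an un-normalized row v is orthogonal to v and has a fixed norm c. Then v/||v|| rotates by exactly atan(c/||v||) per step, and because ||v||² grows by c² each step, the angle decays with no explicit schedule. The helper names (`orthogonal_step`, `simulate`) are hypothetical, and actual Muon updates need not be exactly orthogonal to each row.

```python
import numpy as np


def angle_between(a: np.ndarray, b: np.ndarray) -> float:
    """Angle in radians between two nonzero vectors."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def orthogonal_step(v: np.ndarray, rng: np.random.Generator, c: float) -> np.ndarray:
    """Random update of norm c orthogonal to v (the simplifying assumption)."""
    d = rng.standard_normal(v.shape)
    d -= (d @ v) / (v @ v) * v  # remove the component along v
    return c * d / np.linalg.norm(d)


def simulate(v0: np.ndarray, c: float, steps: int, seed: int = 0):
    """Apply fixed-norm orthogonal steps to an un-normalized row v and record
    how far its normalized direction v / ||v|| rotates at each step."""
    rng = np.random.default_rng(seed)
    v = np.array(v0, dtype=float)
    angles, norms = [], []
    for _ in range(steps):
        v_new = v + orthogonal_step(v, rng, c)
        angles.append(angle_between(v, v_new))
        norms.append(np.linalg.norm(v_new))
        v = v_new
    return np.array(angles), np.array(norms)


c, steps = 0.5, 50
v0 = np.ones(8)  # ||v0||^2 = 8
angles, norms = simulate(v0, c, steps)

# Pythagoras: ||v_t||^2 = ||v0||^2 + t * c^2, and each step rotates the direction
# by atan(c / ||v_{t-1}||), so the per-step angle shrinks roughly like 1 / sqrt(t).
t = np.arange(1, steps + 1)
predicted_norms = np.sqrt(8 + t * c**2)
predicted_angles = np.arctan(c / np.sqrt(8 + (t - 1) * c**2))
print(f"first step: {angles[0]:.4f} rad, last step: {angles[-1]:.4f} rad")
```

Scaling the update to norm c·||v|| instead would keep the per-step angle fixed at atan(c), which isolates the growing norm as the source of the decay in this toy setting.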

Key Contribution

AngularMuown turns the angular step size that Muown sets implicitly into an explicit, schedulable quantity decoupled from the radial magnitude update, and in the reported experiments it improves over Muown.

Abstract

Matrix-aware optimizers such as Muon and Muown have recently shown strong empirical performance for pre-training Transformers. In particular, Muown separates each weight matrix into row magnitudes and an un-normalized direction variable, updating the former with Adam and the latter with Muon. We show that the directional update of Muown is equivalent to a Riemannian step on the normalized directions, while the magnitude of the un-normalized parameterization only modulates the angular step size. This explains the step-size stability of Muown and suggests making the angular step size explicit. The resulting method, AngularMuown, optimizes directly over the normalized directions and uses a schedulable angular multiplier decoupled from the radial magnitude update. AngularMuown improves over Muown and, at the time of writing, a preliminary version is leading the per-optimizer category of the modded nanoGPT speedrunning competition. Further experiments on Qwen2-0.5B and 1.1B-parameter mixture-of-experts models confirm that the algorithm scales beyond small models. An implementation of the algorithm is available at https://github.com/fhueb/angular-muown.
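The idea of optimizing directly over normalized directions with an explicit angular multiplier can be sketched in NumPy as below. This is an illustrative assumption, not the authors' implementation (the linked repository is the reference): the orthogonalization uses an exact SVD polar factor in place of Muon's Newton-Schulz iteration, the order "orthogonalize, then project onto each row's tangent space" is our choice, momentum and the Adam update of the row magnitudes are omitted, and `angular_step`, `cosine_schedule`, and the toy loss are hypothetical.

```python
import numpy as np


def orthogonalize(m: np.ndarray) -> np.ndarray:
    """Polar factor U @ Vt of m; an exact stand-in for Muon's Newton-Schulz."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def angular_step(dirs: np.ndarray, grad: np.ndarray, eta: float) -> np.ndarray:
    """Hypothetical geodesic step: rotate each unit-norm row of `dirs` by `eta`.

    The update is the orthogonalized negative gradient, projected onto the
    tangent space of each row's unit sphere and normalized per row.
    """
    step = -orthogonalize(grad)
    # Drop each row's radial component so the update is tangent to the sphere.
    step = step - np.sum(step * dirs, axis=1, keepdims=True) * dirs
    norms = np.linalg.norm(step, axis=1, keepdims=True)
    tangent = step / np.maximum(norms, 1e-12)
    # Exponential map on the unit sphere: rotates a row by exactly `eta`
    # whenever its tangent is nonzero.
    new = np.cos(eta) * dirs + np.sin(eta) * tangent
    # Renormalize to remove floating-point drift.
    return new / np.linalg.norm(new, axis=1, keepdims=True)


def cosine_schedule(step: int, total: int, eta_max: float, eta_min: float = 0.0) -> float:
    """Hypothetical cosine decay of the angular multiplier from eta_max to eta_min."""
    frac = min(step, total) / total
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + np.cos(np.pi * frac))


def row_angles(u: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Angle between corresponding unit-norm rows of u and t."""
    return np.arccos(np.clip(np.sum(u * t, axis=1), -1.0, 1.0))


# Toy problem: loss(dirs) = -sum(dirs * target), whose Euclidean gradient is -target.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
target = q[:4]  # four orthonormal target rows
dirs = rng.standard_normal((4, 8))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

start = row_angles(dirs, target).mean()
total = 100
for k in range(total):
    eta_k = cosine_schedule(k, total, eta_max=0.2)
    dirs = angular_step(dirs, -target, eta_k)
end = row_angles(dirs, target).mean()
print(f"mean angle to target: {start:.3f} -> {end:.3f} rad")
```

Normalizing each row's tangent is a design choice of this sketch: it makes `eta` the literal per-row rotation angle, so the schedule directly controls angular progress. In the abstract's account of Muown, the corresponding angle is instead set implicitly by the magnitude of the un-normalized direction variable.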

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization


Year: 2026

Venue: N/A
