Apr 2, 2026 · arXiv:2604.01762

FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models

Juyong Jiang, Fan Wang, Hong Qi, Sunghun Kim, Jing Tang

AI Summary

The paper introduces FourierMoE, a parameter-efficient fine-tuning method for LLMs that operates in the spectral domain by integrating a mixture-of-experts architecture with the inverse discrete Fourier transform (IDFT). It leverages the observation that different tasks and LLM layers exhibit distinct frequency sensitivities, using a frequency-adaptive router to dispatch tokens to experts specialized in different frequency bands. Experiments across 28 benchmarks show that FourierMoE outperforms existing PEFT methods in both single-task and multi-task settings, while using fewer trainable parameters.

Key Contribution

Adapting mixture-of-experts modules in the frequency domain, rather than the spatial domain, lets FourierMoE fine-tune LLMs with fewer trainable parameters while outperforming existing PEFT baselines.

Abstract

Parameter-efficient fine-tuning (PEFT) has emerged as a crucial paradigm for adapting large language models (LLMs) under constrained computational budgets. However, standard PEFT methods often struggle in multi-task fine-tuning settings, where diverse optimization objectives induce task interference and limited parameter budgets lead to representational deficiency. While recent approaches incorporate mixture-of-experts (MoE) to alleviate these issues, they predominantly operate in the spatial domain, which may introduce structural redundancy and parameter overhead. To overcome these limitations, we reformulate adaptation in the spectral domain. Our spectral analysis reveals that different tasks exhibit distinct frequency energy distributions, and that LLM layers display heterogeneous frequency sensitivities. Motivated by these insights, we propose FourierMoE, which integrates the MoE architecture with the inverse discrete Fourier transform (IDFT) for frequency-aware adaptation. Specifically, FourierMoE employs a frequency-adaptive router to dispatch tokens to experts specialized in distinct frequency bands. Each expert learns a set of conjugate-symmetric complex coefficients, preserving complete phase and amplitude information while theoretically guaranteeing lossless IDFT reconstruction into real-valued spatial weights. Extensive evaluations across 28 benchmarks, multiple model architectures, and scales demonstrate that FourierMoE consistently outperforms competitive baselines in both single-task and multi-task settings while using significantly fewer trainable parameters. These results highlight the promise of spectral-domain expert adaptation as an effective and parameter-efficient paradigm for LLM fine-tuning.
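The claim that conjugate-symmetric coefficients reconstruct to real-valued weights can be checked on a toy grid. The sketch below is not the authors' implementation. The helper names (`symmetrize`, `band_mask`, `idft2`, `top_k_gate`), the band boundaries, the grid size, and the softmax top-k gate are all assumptions made for illustration. The abstract only states that a frequency-adaptive router dispatches tokens to band-specialized experts. The code uses a naive IDFT from the standard library so it runs without extra dependencies.

```python
import cmath
import math
import random

def symmetrize(F):
    """Project an m x n complex grid onto the conjugate-symmetric set,
    i.e. G[u][v] == conj(G[-u mod m][-v mod n])."""
    m, n = len(F), len(F[0])
    return [[(F[u][v] + F[-u % m][-v % n].conjugate()) / 2
             for v in range(n)] for u in range(m)]

def band_mask(F, lo, hi):
    """Keep coefficients whose wrapped frequency index lies in [lo, hi).
    The index is symmetric under negation, so symmetry is preserved."""
    m, n = len(F), len(F[0])
    return [[F[u][v] if lo <= max(min(u, m - u), min(v, n - v)) < hi else 0j
             for v in range(n)] for u in range(m)]

def idft2(F):
    """Naive 2D inverse DFT; quadratic cost per entry, fine for a toy grid."""
    m, n = len(F), len(F[0])
    out = []
    for x in range(m):
        row = []
        for y in range(n):
            s = 0j
            for u in range(m):
                for v in range(n):
                    angle = 2 * math.pi * (u * x / m + v * y / n)
                    s += F[u][v] * cmath.exp(1j * angle)
            row.append(s / (m * n))
        out.append(row)
    return out

def top_k_gate(logits, k):
    """Hypothetical router: softmax restricted to the k largest logits."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

random.seed(0)
m, n = 4, 6  # toy weight-update shape (assumed)
raw = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
       for _ in range(m)]
F = symmetrize(raw)

# Three assumed experts, each owning a disjoint frequency band.
bands = [(0, 1), (1, 2), (2, 4)]
experts = [idft2(band_mask(F, lo, hi)) for lo, hi in bands]
full = idft2(F)

# Imaginary parts should vanish up to floating-point error.
max_imag = max(abs(z.imag) for W in experts for row in W for z in row)

# Made-up router logits for one token; real logits would be learned.
gate = top_k_gate([0.2, 1.5, -0.3], k=2)
delta_w = [[sum(g * experts[i][x][y].real for i, g in gate.items())
            for y in range(n)] for x in range(m)]

print("max imaginary residue:", max_imag)
print("gate weights:", gate)
```

Because the band mask depends only on the wrapped frequency index, each masked expert stays conjugate-symmetric, and the three band reconstructions add up to the full-spectrum reconstruction by linearity of the IDFT.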

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 73

Year: 2026

Venue: N/A
