This survey paper reviews recent advances in Transformer optimization techniques, addressing challenges related to computational efficiency, parameter-efficient fine-tuning, external knowledge integration, and multimodal fusion. It analyzes representative methods within each area, highlighting underlying principles and experimental results. The paper identifies key challenges and proposes future research directions, including unified efficient attention theories and trustworthy knowledge injection mechanisms.
Transformer optimization is about more than just speed: it's also about injecting knowledge, adapting to new tasks, and fusing modalities, revealing a rich landscape of techniques beyond simple efficiency gains.
Since its introduction in 2017, the Transformer has achieved revolutionary breakthroughs in natural language processing and, increasingly, in computer vision. However, its enormous parameter count and high computational complexity pose substantial difficulties for training and inference efficiency, model knowledge updating, and multimodal information fusion. This paper reviews recent research progress on Transformer optimization techniques along four directions: (1) structural optimization and computational efficiency: model architecture improvements, pruning and compression, and efficient attention mechanisms that reduce computational cost; (2) parameter-efficient fine-tuning and task adaptation: new fine-tuning methods with high parameter efficiency, together with few-shot and zero-shot learning paradigms, to improve adaptability in low-resource and multi-task scenarios; (3) external knowledge integration: incorporating knowledge graphs, retrieval-based external memory, and similar sources into Transformers to fill knowledge gaps and enhance commonsense reasoning; (4) multimodal fusion: designing cross-modal Transformer architectures and alignment mechanisms to effectively fuse information from modalities such as vision and language. For each direction, we analyze representative methods, underlying principles, and experimental results, discuss the main challenges, and outline forthcoming research directions, such as unified efficient attention theories, trustworthy knowledge injection mechanisms, green AI training strategies, and next-generation interpretable Transformer architectures.
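As one concrete illustration of the parameter-efficient fine-tuning direction, the sketch below shows a LoRA-style low-rank adapter in plain NumPy: the pretrained weight stays frozen while only two small low-rank factors are trained. The dimensions, rank, and scaling factor here are illustrative assumptions, not values taken from any paper surveyed.

```python
import numpy as np

# Minimal sketch of a LoRA-style low-rank adapter (one example of
# parameter-efficient fine-tuning). All sizes below are assumptions.
rng = np.random.default_rng(0)

d, r, alpha = 64, 4, 8           # hidden size, adapter rank, scaling
W = rng.standard_normal((d, d))  # frozen pretrained weight (not updated)

# Trainable low-rank factors: B starts at zero, so at initialization
# the adapted layer computes exactly the same output as the frozen one.
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))

def adapted_forward(x):
    """y = x W^T + (alpha / r) * x (B A)^T -- only A and B are trained."""
    return x @ W.T + (alpha / r) * (x @ (B @ A).T)

x = rng.standard_normal((2, d))
# With B == 0 the adapter contributes nothing yet:
assert np.allclose(adapted_forward(x), x @ W.T)

# Trainable parameters: 2*d*r instead of d*d (512 vs 4096 here).
print(2 * d * r, "trainable vs", d * d, "frozen parameters")
```

The design point is that the fine-tuning update is constrained to a rank-r subspace, so the number of trainable parameters scales as O(d*r) rather than O(d^2).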