The paper introduces Bidirectional Awareness Induction (BAI), a training method that enhances the information retained in a designated "pivot" subset of network results by incorporating bidirectional loss terms. BAI improves performance across Transformer, ExpansionNet v2, Flan-T5-Small, GPT-2, and mBART architectures on tasks such as Neural Machine Translation, Image Captioning, and Text Summarization, achieving gains of 4.96 BLEU, 2.4 CIDEr-D, and 1.16 ROUGE. The method requires no architectural modifications and can be applied to pre-trained models.
Autoregressive sequence-to-sequence models get a boost: Bidirectional Awareness Induction (BAI) training improves performance across diverse architectures and NLP tasks without architectural changes.
Autoregressive Sequence-to-Sequence (Seq2Seq) models are the foundation of many Deep Learning achievements in major research fields such as Vision and Natural Language Processing. However, their limitations have motivated researchers to explore bidirectional alternatives in both architectures and training methodologies. In this work, we introduce Bidirectional Awareness Induction (BAI), a flexible training method that enhances the information retained in a subset of the network results, which we call the pivot, through bidirectional loss terms. Our method led to improvements across multiple architectures (Transformer, ExpansionNet v2, Flan-T5-Small, GPT-2, and mBART) and NLP tasks including Neural Machine Translation, Image Captioning, and Text Summarization, with observed improvements of 4.96 BLEU, 2.4 CIDEr-D, and 1.16 ROUGE, respectively. Compared to existing methods, BAI requires no architectural modifications; it is flexible, efficient, and can be applied to pre-trained models.
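To make the idea of a bidirectional loss on a pivot more concrete, here is a minimal, hypothetical PyTorch sketch. The abstract does not specify which activations serve as the pivot or how the bidirectional terms are formed, so everything below is an assumption for illustration: the decoder hidden states are taken as the pivot, and an auxiliary right-to-left prediction head (backward_head) plus the weighting lambda_bai are invented names, not the paper's actual formulation.

# Illustrative sketch only: the pivot choice, the right-to-left auxiliary head,
# and the loss weighting are assumptions; the paper's exact BAI loss may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySeq2Seq(nn.Module):
    def __init__(self, vocab_size=100, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        # Standard left-to-right output head used at inference time.
        self.lm_head = nn.Linear(d_model, vocab_size)
        # Auxiliary head used only during training to inject a
        # right-to-left (bidirectional) signal into the pivot states.
        self.backward_head = nn.Linear(d_model, vocab_size)

    def forward(self, src, tgt_in):
        _, h_n = self.encoder(self.embed(src))
        # In this sketch, the decoder hidden states play the role of the pivot.
        pivot, _ = self.decoder(self.embed(tgt_in), h_n)
        return pivot

def bai_style_step(model, src, tgt_in, tgt_out, tgt_rev, lambda_bai=0.5):
    """One training step: standard autoregressive loss plus a bidirectional
    auxiliary loss computed on the same pivot states (illustrative choice)."""
    pivot = model(src, tgt_in)
    # Usual left-to-right cross-entropy on the next-token targets.
    fwd_logits = model.lm_head(pivot)
    fwd_loss = F.cross_entropy(fwd_logits.flatten(0, 1), tgt_out.flatten())
    # Bidirectional term: the pivot is also asked to predict the reversed
    # targets, encouraging it to retain information about the whole sequence.
    bwd_logits = model.backward_head(pivot)
    bwd_loss = F.cross_entropy(bwd_logits.flatten(0, 1), tgt_rev.flatten())
    return fwd_loss + lambda_bai * bwd_loss

# Tiny usage example with random data.
model = ToySeq2Seq()
src = torch.randint(0, 100, (2, 7))
tgt_in = torch.randint(0, 100, (2, 5))
tgt_out = torch.randint(0, 100, (2, 5))
tgt_rev = torch.flip(tgt_out, dims=[1])
loss = bai_style_step(model, src, tgt_in, tgt_out, tgt_rev)
loss.backward()

Because the extra head and loss term live outside the base network, a scheme like this leaves the architecture untouched at inference time and can be bolted onto a pre-trained model, which is consistent with the flexibility the abstract claims for BAI.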