The paper introduces Bidirectional Awareness Induction (BAI), a training method that enhances the information retained in a designated "pivot" subset of network results by incorporating bidirectional loss terms. BAI improves performance across Transformer, ExpansionNet v2, Flan-T5-Small, GPT-2, and mBART architectures on tasks such as Neural Machine Translation, Image Captioning, and Text Summarization, achieving gains of 4.96 BLEU, 2.4 CIDEr-D, and 1.16 ROUGE. The method requires no architectural modifications and can be applied to pre-trained models.
Autoregressive sequence-to-sequence models get a boost: Bidirectional Awareness Induction (BAI) training improves performance across diverse architectures and NLP tasks without architectural changes.
Autoregressive Sequence-to-Sequence (Seq2Seq) models are the foundation of many Deep Learning achievements in major research fields such as Vision and Natural Language Processing. However, their limitations have motivated researchers to explore bidirectional alternatives in both architectures and training methodologies. In this work, we introduce Bidirectional Awareness Induction (BAI), a flexible training method that enhances the information retained in a subset of the network results, which we call the pivot, through bidirectional loss terms. Our method led to improvements across multiple architectures (Transformer, ExpansionNet v2, Flan-T5-Small, GPT-2, and mBART) and NLP tasks including Neural Machine Translation, Image Captioning, and Text Summarization, with observed improvements of 4.96 BLEU, 2.4 CIDEr-D, and 1.16 ROUGE, respectively. Compared to existing methods, BAI requires no architectural modifications; it is flexible, efficient, and can be applied to pre-trained models.
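To make the idea of a bidirectional loss on a pivot more concrete, here is a minimal, hypothetical PyTorch sketch. The abstract does not specify which activations serve as the pivot or how the bidirectional terms are formed, so everything below is an assumption for illustration: the decoder hidden states are taken as the pivot, and an auxiliary right-to-left prediction head (backward_head) plus the weighting lambda_bai are invented names, not the paper's actual formulation.

# Illustrative sketch only: the pivot choice, the right-to-left auxiliary head,
# and the loss weighting are assumptions; the paper's exact BAI loss may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySeq2Seq(nn.Module):
    def __init__(self, vocab_size=100, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        # Standard left-to-right output head used at inference time.
        self.lm_head = nn.Linear(d_model, vocab_size)
        # Auxiliary head used only during training to inject a
        # right-to-left (bidirectional) signal into the pivot states.
        self.backward_head = nn.Linear(d_model, vocab_size)

    def forward(self, src, tgt_in):
        _, h_n = self.encoder(self.embed(src))
        # In this sketch, the decoder hidden states play the role of the pivot.
        pivot, _ = self.decoder(self.embed(tgt_in), h_n)
        return pivot

def bai_style_step(model, src, tgt_in, tgt_out, tgt_rev, lambda_bai=0.5):
    """One training step: standard autoregressive loss plus a bidirectional
    auxiliary loss computed on the same pivot states (illustrative choice)."""
    pivot = model(src, tgt_in)
    # Usual left-to-right cross-entropy on the next-token targets.
    fwd_logits = model.lm_head(pivot)
    fwd_loss = F.cross_entropy(fwd_logits.flatten(0, 1), tgt_out.flatten())
    # Bidirectional term: the pivot is also asked to predict the reversed
    # targets, encouraging it to retain information about the whole sequence.
    bwd_logits = model.backward_head(pivot)
    bwd_loss = F.cross_entropy(bwd_logits.flatten(0, 1), tgt_rev.flatten())
    return fwd_loss + lambda_bai * bwd_loss

# Tiny usage example with random data.
model = ToySeq2Seq()
src = torch.randint(0, 100, (2, 7))
tgt_in = torch.randint(0, 100, (2, 5))
tgt_out = torch.randint(0, 100, (2, 5))
tgt_rev = torch.flip(tgt_out, dims=[1])
loss = bai_style_step(model, src, tgt_in, tgt_out, tgt_rev)
loss.backward()

Because the extra head and loss term live outside the base network, a scheme like this leaves the architecture untouched at inference time and can be bolted onto a pre-trained model, which is consistent with the flexibility the abstract claims for BAI.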