This paper introduces SBARThez, a modified BART-based French model for abstractive summarization that takes multimodal and multilingual sentence embeddings from LaBSE, SONAR, and BGE-M3 as input. To mitigate hallucinations, the authors add a Named Entity Injection mechanism that appends tokenized named entities to the decoder input. Experiments show performance competitive with token-level baselines, particularly for low-resource languages, and the model generates more concise and more abstractive summaries than those baselines.
Hallucination in abstractive summarization? Injecting named entities into the decoder input, along with multimodal embeddings, can keep your French BART model grounded.
Abstractive summarization aims to generate concise summaries by creating new sentences, allowing for flexible rephrasing. However, this approach can be vulnerable to inaccuracies, particularly "hallucinations," where the model introduces non-existent information. In this paper, we leverage multimodal and multilingual sentence embeddings derived from pretrained models such as LaBSE, SONAR, and BGE-M3, and feed them into a modified BART-based French model. To improve the factual consistency of the generated summary, we introduce a Named Entity Injection mechanism that appends tokenized named entities to the decoder input. Our novel framework, SBARThez, is applicable to both text and speech inputs and supports cross-lingual summarization; it shows competitive performance relative to token-level baselines, especially for low-resource languages, while generating more concise and abstract summaries.
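The Named Entity Injection step described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the whitespace `tokenize` function stands in for the model's subword tokenizer, the `<ne>` separator token is hypothetical, and the case-insensitive deduplication is an added convenience that the abstract does not mention.

```python
from typing import List

# Hypothetical separator marking the start of each injected entity.
# The paper's abstract does not specify one; this is an assumption.
SEP = "<ne>"


def tokenize(text: str) -> List[str]:
    """Toy whitespace tokenizer standing in for a real subword tokenizer."""
    return text.split()


def inject_named_entities(
    decoder_tokens: List[str], entities: List[str], sep: str = SEP
) -> List[str]:
    """Return decoder_tokens with each tokenized entity appended after a separator.

    Entities that repeat (ignoring case) are injected only once.
    """
    out = list(decoder_tokens)  # copy so the caller's list is not mutated
    seen = set()
    for entity in entities:
        key = entity.lower()
        if key in seen:
            continue
        seen.add(key)
        out.append(sep)
        out.extend(tokenize(entity))
    return out


if __name__ == "__main__":
    decoder_input = ["<s>"]
    entities = ["Emmanuel Macron", "Paris", "paris"]
    print(inject_named_entities(decoder_input, entities))
    # prints ['<s>', '<ne>', 'Emmanuel', 'Macron', '<ne>', 'Paris']
```

In a real pipeline the entities would come from a named entity recognizer run on the source document, and the resulting token sequence would be converted to ids before being passed to the decoder.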