U Porto | UBI | Feb 18, 2026 | arXiv:2602.16607

CitiLink-Summ: Summarization of Discussion Subjects in European Portuguese Municipal Meeting Minutes

Miguel Marques, Ana Luísa Fernandes, Ana Filipa Pacheco, Rute Rebouças, Inês Cantante, José Isidro, Luís Filipe Cunha, Alípio Jorge, Nuno Guimarães, Sérgio Nunes, António Leal, Purificação Silvano, Ricardo Campos

AI Summary

The paper introduces CitiLink-Summ, a new corpus of 100 European Portuguese municipal meeting minutes with 2,322 manually created summaries for distinct discussion subjects. This dataset addresses the scarcity of resources for summarization research in low-resource languages and complex administrative texts. The authors establish baseline summarization results using state-of-the-art generative models like BART and PRIMERA, as well as large language models, evaluated using ROUGE, BLEU, METEOR, and BERTScore.

Key Contribution

A new dataset of 100 European Portuguese municipal meeting minutes with 2,322 manually written summaries, one per discussion subject, enables research into summarizing complex administrative texts in a low-resource language.

Abstract

Municipal meeting minutes are formal records documenting the discussions and decisions of local government, yet their content is often lengthy, dense, and difficult for citizens to navigate. Automatic summarization can help address this challenge by producing concise summaries for each discussion subject. Despite its potential, the summarization of discussion subjects in municipal meeting minutes remains largely unexplored, especially in low-resource languages, where the inherent complexity of these documents adds further challenges. A major bottleneck is the scarcity of datasets containing high-quality, manually crafted summaries, which limits the development and evaluation of effective summarization models for this domain. In this paper, we present CitiLink-Summ, a new corpus of European Portuguese municipal meeting minutes, comprising 100 documents and 2,322 manually written summaries, each corresponding to a distinct discussion subject. Leveraging this dataset, we establish baseline results for automatic summarization in this domain, employing state-of-the-art generative models (e.g., BART, PRIMERA) as well as large language models (LLMs), evaluated with both lexical and semantic metrics such as ROUGE, BLEU, METEOR, and BERTScore. CitiLink-Summ provides the first benchmark for municipal-domain summarization in European Portuguese, offering a valuable resource for advancing NLP research on complex administrative texts.
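To make the lexical side of the evaluation concrete, the sketch below computes a simplified ROUGE-1 score (unigram overlap) between a reference summary and a generated one. It is illustrative only: the function name `rouge1`, the whitespace tokenization, and the example sentences are assumptions made here, not the authors' evaluation setup, and standard ROUGE implementations typically add their own tokenization and optional stemming.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict:
    """Simplified ROUGE-1: unigram overlap with lowercase whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical summary pair, invented for illustration (not from the corpus).
reference = "a câmara aprovou o orçamento municipal para 2026"
candidate = "a câmara aprovou o orçamento"
scores = rouge1(reference, candidate)
print(scores)
```

Here every candidate token appears in the reference, so precision is 1.0, while recall is 5/8 because three reference tokens are missing from the candidate. Semantic metrics such as BERTScore instead compare contextual embeddings and would need a pretrained model, so they are not sketched here.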

Eval Frameworks & Benchmarks | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 22

Year: 2026

Venue: N/A
