The paper introduces CitiLink-Summ, a new corpus of 100 European Portuguese municipal meeting minutes with 2,322 manually created summaries for distinct discussion subjects. This dataset addresses the scarcity of resources for summarization research in low-resource languages and complex administrative texts. The authors establish baseline summarization results using state-of-the-art generative models like BART and PRIMERA, as well as large language models, evaluated using ROUGE, BLEU, METEOR, and BERTScore.
A new dataset of European Portuguese municipal meeting minutes with high-quality, manually crafted summaries finally enables research into summarizing complex administrative texts in a low-resource language.
Municipal meeting minutes are formal records documenting the discussions and decisions of local government, yet their content is often lengthy, dense, and difficult for citizens to navigate. Automatic summarization can help address this challenge by producing concise summaries for each discussion subject. Despite its potential, the summarization of discussion subjects in municipal meeting minutes remains largely unexplored, especially in low-resource languages, where the inherent complexity of these documents adds further challenges. A major bottleneck is the scarcity of datasets containing high-quality, manually crafted summaries, which limits the development and evaluation of effective summarization models for this domain. In this paper, we present CitiLink-Summ, a new corpus of European Portuguese municipal meeting minutes, comprising 100 documents and 2,322 manually written summaries, each corresponding to a distinct discussion subject. Leveraging this dataset, we establish baseline results for automatic summarization in this domain, employing state-of-the-art generative models (e.g., BART, PRIMERA) as well as large language models (LLMs), evaluated with both lexical and semantic metrics such as ROUGE, BLEU, METEOR, and BERTScore. CitiLink-Summ provides the first benchmark for municipal-domain summarization in European Portuguese, offering a valuable resource for advancing NLP research on complex administrative texts.
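To make the "lexical" side of the evaluation concrete, the following is a minimal sketch of a ROUGE-1 style unigram F1 score between a reference summary and a generated one. It is an illustration under simplifying assumptions (lowercasing and whitespace tokenization, no stemming), not the authors' evaluation code; the function name `rouge1_f1` and the example strings are hypothetical and do not come from CitiLink-Summ.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary.

    Simplified sketch: lowercases and splits on whitespace, with no
    stemming or language-specific tokenization.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a token counts at most as often as it occurs in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example strings, for illustration only.
reference = "the council approved the budget for road repairs"
candidate = "the council approved the road budget"
score = rouge1_f1(reference, candidate)
print(f"ROUGE-1 F1: {score:.3f}")
```

Because the score depends only on token overlap, it ignores word order and paraphrase, which is one reason lexical metrics are typically reported alongside a semantic metric such as BERTScore.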