The paper introduces DEBISS, a novel corpus of individual, spoken debates in Portuguese, designed to address the scarcity of debate datasets, especially in languages other than English. The corpus features semi-structured debates, making it suitable for a variety of NLP tasks. Initial experiments suggest its utility for speech-to-text, speaker diarization, argument mining, and debate quality assessment.
A new Portuguese-language debate corpus fills a critical gap, enabling NLP research on spoken argumentation beyond English.
Debating is essential in daily life, whether in academic or professional settings, casual conversations, political forums, or online discussions. Because debate has such a broad range of applications, its structures and formats vary significantly, which makes it challenging to develop corpora that account for these variations. Debate corpora are notably scarce in the current state of the art, particularly for languages other than English. For this reason, this research proposes the DEBISS corpus, a collection of individual, spoken debates in Portuguese with semi-structured features. The corpus has broad applicability across Natural Language Processing tasks, including speech-to-text, speaker diarization, argument mining, and debate quality evaluation.