The authors introduce NCTB-QA, a new large-scale Bangla question answering dataset with a balanced distribution of answerable and unanswerable questions extracted from 50 textbooks. The dataset includes adversarially designed instances with plausible distractors to challenge reading comprehension systems. Benchmarking experiments with BERT, RoBERTa, and ELECTRA show that fine-tuning on NCTB-QA leads to substantial improvements in F1 score and BERTScore, highlighting the importance of domain-specific fine-tuning in low-resource settings.
A new Bangla QA dataset with a high proportion of unanswerable questions exposes the fragility of current models in low-resource settings.
Reading comprehension systems for low-resource languages struggle with unanswerable questions: they tend to produce unreliable responses when the correct answer is absent from the context. To address this problem, we introduce NCTB-QA, a large-scale Bangla question answering dataset comprising 87,805 question-answer pairs extracted from 50 textbooks published by Bangladesh's National Curriculum and Textbook Board. Unlike existing Bangla datasets, NCTB-QA maintains a balanced distribution of answerable (57.25%) and unanswerable (42.75%) questions. It also includes adversarially designed instances containing plausible distractors. We benchmark three transformer-based models (BERT, RoBERTa, ELECTRA) and demonstrate substantial improvements through fine-tuning. BERT achieves a 313% relative improvement in F1 score (from 0.150 to 0.620). Semantic answer quality, measured by BERTScore, also increases significantly across all models. Our results establish NCTB-QA as a challenging benchmark for Bangla educational question answering and demonstrate that domain-specific fine-tuning is critical for robust performance in low-resource settings.
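The 313% figure is a relative gain over the pre-fine-tuning baseline, not a difference in percentage points. As a minimal sketch of that arithmetic, using only the two F1 values quoted in the abstract (the function name `relative_improvement` is a hypothetical helper, not something from the paper):

```python
def relative_improvement(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (after - before) / before * 100.0


# F1 values reported in the abstract for BERT (before and after fine-tuning).
f1_before, f1_after = 0.150, 0.620

gain = relative_improvement(f1_before, f1_after)
print(f"absolute gain: {f1_after - f1_before:.3f} F1")
print(f"relative gain: {gain:.0f}%")  # prints "relative gain: 313%"
```

The absolute gain is 0.470 F1; dividing by the low 0.150 baseline is what yields the large relative percentage.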