Mar 16, 2026 | arXiv:2603.15130

Indirect Question Answering in English, German and Bavarian: A Challenging Task for High- and Low-Resource Languages Alike

Miriam Winkler, Verena Blaschke, Barbara Plank

AI Summary

This paper introduces two multilingual corpora for Indirect Question Answering (IQA) in English, German, and Bavarian: InQA+, a high-quality evaluation set, and GenIQA, a training set generated by GPT-4o-mini. Experiments with multilingual transformers (mBERT, XLM-R, mDeBERTa) show that IQA is challenging even for high-resource languages: the models perform poorly and overfit severely. An analysis of factors such as label ambiguity, label set and dataset size suggests that large training datasets are beneficial, but that GPT-4o-mini struggles to generate high-quality IQA data.

Key Contribution

Even state-of-the-art multilingual transformers struggle with the pragmatic challenge of Indirect Question Answering, achieving low performance across English, German, and Bavarian.

Abstract

Indirectness is a common feature of daily communication, yet is underexplored in NLP research for both low-resource and high-resource languages. Indirect Question Answering (IQA) aims at classifying the polarity of indirect answers. In this paper, we present two multilingual corpora for IQA of varying quality that both cover English, Standard German and Bavarian, a German dialect without standard orthography: InQA+, a small high-quality evaluation dataset with hand-annotated labels, and GenIQA, a larger training dataset that contains artificial data generated by GPT-4o-mini. Based on several experiment variations with multilingual transformer models (mBERT, XLM-R and mDeBERTa), we find that IQA is a pragmatically hard task that comes with various challenges. We suggest and employ recommendations to tackle these challenges. Our results reveal low performance, even for English, and severe overfitting. We analyse various factors that influence these results, including label ambiguity, label set and dataset size. We find that IQA performance is poor in both high-resource (English, German) and low-resource (Bavarian) languages and that it is beneficial to have a large amount of training data. Further, GPT-4o-mini does not possess enough pragmatic understanding to generate high-quality IQA data in any of our tested languages.
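To make the task framing concrete, the following is a minimal sketch of IQA as a classification problem over question-answer pairs. It is not the authors' pipeline: the label set, the example sentences, the `[SEP]` separator, and the helper names (`IQAExample`, `to_pair_input`, `majority_baseline`, `accuracy`) are all assumptions made for illustration, and a majority-class baseline stands in for the transformer models named in the abstract.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label set for illustration only; the corpora's actual
# labels are not listed on this page.
LABELS = ("yes", "no", "other")


@dataclass(frozen=True)
class IQAExample:
    question: str
    answer: str  # an indirect answer, without an explicit "yes" or "no"
    label: str   # polarity of the answer, one of LABELS


def to_pair_input(ex, sep=" [SEP] "):
    """Join question and answer into one sentence-pair string."""
    return ex.question + sep + ex.answer


def majority_baseline(train):
    """Return the most frequent label among the training examples."""
    return Counter(ex.label for ex in train).most_common(1)[0][0]


def accuracy(gold, predicted):
    """Fraction of positions where the predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Invented examples, not taken from InQA+ or GenIQA.
train = [
    IQAExample("Are you coming to the party?", "I have to work late.", "no"),
    IQAExample("Did you like the film?", "I watched it twice.", "yes"),
    IQAExample("Is the shop open?", "The lights are off.", "no"),
]
eval_set = [IQAExample("Do you want coffee?", "I just had one.", "no")]

guess = majority_baseline(train)
preds = [guess for _ in eval_set]
score = accuracy([ex.label for ex in eval_set], preds)
```

In each pair the answer never states its polarity outright, so a classifier has to infer it from the implied meaning. A trained model would replace `majority_baseline` and consume the strings produced by `to_pair_input`.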

Data Curation & Synthetic Data | Eval Frameworks & Benchmarks | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
