Indirect Question Answering in English, German and Bavarian: A Challenging Task for High- and Low-Resource Languages Alike
- We present InQA+ and GenIQA.
- InQA+ is a multi-lingual extension of IndirectQA in English, Standard German and Bavarian. It consists of indirect question-answer pairs from parallel movie scripts in English and German from opensubtitles v2018 (Lison and Tiedemann, 2016) and hand-translated Bavarian sentences.
- GenIQA is a multi-lingual artificial dataset consisting of LLM-generated indirect question-answer pairs.
- We train and evaluate multi-lingual transformer-based models (mBERT (Devlin et al., 2019), mDeBERTa (He et al., 2020) and XLM-R (Conneau et al., 2020)).
- We find that IQA performance is poor in both high-resource (English, German) and low-resource (Bavarian) languages, and that a large amount of training data is important. Further, GPT-4o-mini does not possess enough pragmatic understanding to solve the task well in any of the three tested languages.
All subfolders of data and predictions that contain data are stored as zip archives protected with the password MaiNLP, to prevent potential inclusion in web-scraped datasets (cf. Jacovi et al., 2023).
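The archives can be unpacked with any zip tool that accepts a password. As a minimal sketch, assuming the archives use the legacy ZipCrypto encryption (the only scheme Python's built-in `zipfile` module can decrypt), the following extracts every archive in place. The function name `extract_archives` and the choice to unpack each archive into a sibling folder of the same name are illustrative assumptions, not part of this repository.

```python
import zipfile
from pathlib import Path

# Password as stated in this README.
ARCHIVE_PASSWORD = b"MaiNLP"


def extract_archives(root: Path, password: bytes = ARCHIVE_PASSWORD) -> list[Path]:
    """Extract every .zip file below `root` into a folder next to the archive.

    Hypothetical helper: the target layout (foo.zip -> foo/) is an assumption.
    """
    extracted = []
    # sorted() materialises the list first, so freshly extracted files
    # are not picked up while we are still iterating.
    for archive in sorted(root.rglob("*.zip")):
        target = archive.with_suffix("")  # foo.zip -> foo/
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(path=target, pwd=password)
        extracted.append(target)
    return extracted


if __name__ == "__main__":
    # Folder names taken from the repository structure described in this README.
    for folder in ("data", "predictions"):
        for path in extract_archives(Path(folder)):
            print(f"extracted {path}")
```

If the archives turn out to be AES-encrypted, `zipfile` will fail to decrypt them, and an external tool such as 7-Zip would be needed instead.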
- code:
  - data contains dataset-related code, including the processing and filtering of the raw opensubtitles v2018 (Lison and Tiedemann, 2016) data and data perturbation.
  - llms contains LLM-related code, including the LLM testing and GenIQA generation code.
  - train_predict contains training- and classification-related code.
- data: contains zipped data files and a README with shortened data statements.
- predictions: contains the predictions, evaluation report and confusion matrix for each experiment, sorted by model (mBERT, mDeBERTa and XLM-R).
If you use the data and/or code in this repository, please cite the following paper (to be published at LREC 2026):
@inproceedings{winkler-etal-2026-indirect,
title = "Indirect Question Answering in English, German and Bavarian: A Challenging Task for High- and Low-Resource Languages Alike",
author = "Winkler, Miriam and Blaschke, Verena and Plank, Barbara",
year = "2026",
booktitle = TODO,
publisher = TODO,
}
We thank the anonymous reviewers as well as the members of the MaiNLP research lab for their feedback, especially Rob van der Goot and Felicia Körner.
This work is supported by ERC Consolidator Grant DIALECT no. 101043235.
