Joint EURECOM-Lettria-Orange submission workspace for the EvalLLM 2026 RAG Challenge.
challenge/ Challenge output schemas, sample queries, and validation.
data/ Local datasets (git-ignored).
experiments/ Individual experiments.
runs/ Generated outputs, logs, and notes (git-ignored).
submissions/ Final JSON files for submission.
tools/ Shared helper utilities.
Expected local data layout:
ELO-GRAG/
  data/
    Corpus_raw/
    Corpus_parsed/
Download the dataset archives manually:
Corpus_raw.tar.gz: https://nextcloud.eurecom.fr/s/4t4bKTjqo6FiS7o
Corpus_parsed.tar.gz: https://nextcloud.eurecom.fr/s/aKDHteYdLwHBSqb
Place both archives in data/, then extract them:
mkdir -p data
tar -xzf data/Corpus_raw.tar.gz -C data
tar -xzf data/Corpus_parsed.tar.gz -C data
Corpus_raw/ contains the original PDFs.
Corpus_parsed/ contains one subfolder per document, holding the docparsing output (JSON, MD, and XML formats) generated from the original PDF, along with a source.txt file that records the original PDF filename for reference.
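After extraction, it can be handy to see which parsed subfolder corresponds to which original PDF. The sketch below is only an illustration and not part of the repository: the function name list_sources and the DATA_DIR variable are hypothetical, and it assumes that each source.txt holds the PDF filename on its first line.

```shell
# Sketch: print "<subfolder><TAB><original PDF filename>" for every document
# under a Corpus_parsed-style directory.
# Assumption: source.txt stores the PDF filename on its first line.
# list_sources and DATA_DIR are hypothetical names used for illustration.

list_sources() {
  parsed_dir="$1"
  for doc_dir in "$parsed_dir"/*/; do
    # Skip the literal pattern when the directory is empty or missing.
    [ -d "$doc_dir" ] || continue
    name="$(basename "$doc_dir")"
    if [ -f "$doc_dir/source.txt" ]; then
      printf '%s\t%s\n' "$name" "$(head -n 1 "$doc_dir/source.txt")"
    else
      # Report subfolders without a source.txt on stderr.
      printf '%s\t%s\n' "$name" "(no source.txt)" >&2
    fi
  done
}

# Run against the local data directory only if it has been extracted.
DATA_DIR="${DATA_DIR:-data}"
if [ -d "$DATA_DIR/Corpus_parsed" ]; then
  list_sources "$DATA_DIR/Corpus_parsed"
fi
```

Run it from the repository root once both archives are extracted. Subfolders that lack a source.txt are reported on stderr, so the mapping on stdout can be redirected to a file.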