MPBoot_LLM generates R2RML mappings from relational database dumps and target ontologies with an LLM-driven pipeline. The repo also includes a local evaluation stack for RODI-style benchmarks and shared result webpages.
- Python 3.9+
- Java 11+
- git
- curl
- unzip
- Docker
- either mvn or Docker access to the maven image
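A quick way to confirm the prerequisites before bootstrapping is a small presence check. The sketch below is not part of the repository; the function name `missing_tools` is hypothetical, and it only tests that a command is on `PATH`, not that the Python or Java version is recent enough.

```shell
# Print each required command that is not on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Example call. The executable names python3 and java are assumptions
# about how Python and Java are installed on your machine:
# missing_tools python3 java git curl unzip docker
```

An empty output means every listed command was found. Check `mvn` separately, since Docker access to the maven image is an accepted alternative.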
All commands below assume you are at the repository root:
cd mpboot_llm
Bootstrap the local toolchain:
bash scripts/bootstrap.sh
This installs or prepares:
- .venv
- .tools/robot
- .tools/rodi
- .tools/ontop/jdbc
- .tools/bin/psql_docker.sh
- the local PostgreSQL Docker container
Then create .env:
cp .env.example .env
At minimum, set the API key for the provider you want to use and set LLM_PROVIDER to select it.
Useful environment variables:
- LLM_PROVIDER: provider to use (claude, gpt4o, gpt4o-mini, groq, gemini, ollama; default: claude)
- OLLAMA_BASE_URL: Ollama server URL (default: http://localhost:11434)
- OLLAMA_MODEL: model name to use with Ollama (default: deepseek-r1:32b)
- ANTHROPIC_API_KEY
- ANTHROPIC_PROXY_URL
- ANTHROPIC_MOCK_LOG_LEVEL
- MPBOOT_DB_PORT
- MPBOOT_DB_NAME
- MPBOOT_R2RML_FORCE_DOUBLE_FOR_DECIMALS
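To catch a mistyped provider name early, the documented values and defaults can be mirrored in a small helper. This is a sketch only: the function names `resolve_provider` and `ollama_endpoint` are hypothetical, and how the pipeline itself validates these variables is not described here.

```shell
# Echo the provider to use: the given value, or the documented default
# (claude) when the value is empty. Fails on anything not in the list above.
resolve_provider() {
  provider="${1:-claude}"
  case "$provider" in
    claude|gpt4o|gpt4o-mini|groq|gemini|ollama)
      echo "$provider"
      ;;
    *)
      echo "unknown LLM_PROVIDER: $provider" >&2
      return 1
      ;;
  esac
}

# Echo the Ollama URL and model, falling back to the documented defaults.
ollama_endpoint() {
  echo "${OLLAMA_BASE_URL:-http://localhost:11434} ${OLLAMA_MODEL:-deepseek-r1:32b}"
}
```

A typical call would be `resolve_provider "${LLM_PROVIDER:-}"` after loading `.env` into the shell.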
A worked example using the Conference NoFKs dataset is in tutorial/. It runs entirely locally — Ollama for the LLM and Docker for PostgreSQL — with no cloud API key required. It covers all steps from running the pipeline to the optional RODI evaluation.
If you already have a relational dump and an ontology, you do not need the RODI dataset download path. The mapping pipeline expects a dataset directory containing:
- dump.sql or dump_pg_compatible.sql
- ontology.ttl or ontology.owl
- optionally queries/*.qpair if you also want RODI-style evaluation later
Example:
my_input/
  my_dataset/
    dump.sql
    ontology.ttl
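The layout requirement above can be checked before any script runs. The following is a minimal sketch with a hypothetical function name, `check_dataset_dir`; it only tests for the accepted file names and does not inspect their contents.

```shell
# Return 0 if the directory holds a dump and an ontology under one of the
# accepted names; otherwise report what is missing on stderr and return 1.
check_dataset_dir() {
  dir="$1"
  rc=0
  if [ ! -f "$dir/dump.sql" ] && [ ! -f "$dir/dump_pg_compatible.sql" ]; then
    echo "missing dump.sql or dump_pg_compatible.sql in $dir" >&2
    rc=1
  fi
  if [ ! -f "$dir/ontology.ttl" ] && [ ! -f "$dir/ontology.owl" ]; then
    echo "missing ontology.ttl or ontology.owl in $dir" >&2
    rc=1
  fi
  return "$rc"
}

# Example call for the layout shown above:
# check_dataset_dir my_input/my_dataset
```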
bash scripts/create_pg_compatible_dataset.sh my_input pg_compatible/outputs/data_pg_compatible
That will create:
pg_compatible/outputs/data_pg_compatible/my_dataset/
with:
- dump_pg_compatible.sql
- ontology.ttl or copied ontology.owl
- copied extra files such as queries/
If you only want to process a single dataset directory directly:
bash scripts/create_pg_compatible_dataset.sh my_input/my_dataset pg_compatible/outputs/data_pg_compatible/my_dataset
Generate OWL/XML where needed:
bash scripts/generate_owlxml_ontologies.sh pg_compatible/outputs/data_pg_compatible --dataset my_dataset
If ontology.owl already exists, this step can be skipped unless you want to overwrite it:
bash scripts/generate_owlxml_ontologies.sh pg_compatible/outputs/data_pg_compatible --dataset my_dataset --overwrite
Run mapping generation:
bash scripts/create_mapping_single_dataset.sh --dataset-dir pg_compatible/outputs/data_pg_compatible/my_dataset
The runner will stage the dataset into the live workspace and then execute the mapping phases.
Useful variants:
bash scripts/create_mapping_single_dataset.sh --dataset-dir pg_compatible/outputs/data_pg_compatible/my_dataset --dry-run
bash scripts/create_mapping_single_dataset.sh --dataset-dir pg_compatible/outputs/data_pg_compatible/my_dataset --from phase1
bash scripts/create_mapping_single_dataset.sh --dataset-dir pg_compatible/outputs/data_pg_compatible/my_dataset --only phase7
Evaluation is optional. If you also provide queries/*.qpair, you can evaluate an archived run later with:
bash scripts/evaluation.sh outputs/<model>/<timestamp> --dataset my_dataset --method all
If you have no qpair queries, the mapping-generation workflow still works; only the RODI query evaluation path is unavailable.
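Whether evaluation is available for a dataset therefore comes down to whether any `.qpair` file exists under its `queries/` directory. A sketch of that test follows; the function name `has_qpair` is hypothetical and the check is applied to the input dataset directory, not to an archived run.

```shell
# Return 0 if the dataset directory contains at least one queries/*.qpair file.
has_qpair() {
  for f in "$1"/queries/*.qpair; do
    # When the glob matches nothing it stays literal, so -e fails.
    [ -e "$f" ] && return 0
  done
  return 1
}

# Example call:
# has_qpair my_input/my_dataset || echo "no qpair queries: skip evaluation"
```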
This is the workflow for the bundled RODI benchmark datasets under datasets/rodi/.
Run everything for one dataset:
bash scripts/run_end_to_end_dataset.sh mondial_rel --method all
Variants:
bash scripts/run_end_to_end_dataset.sh mondial_rel --method rodi
bash scripts/run_end_to_end_dataset.sh mondial_rel --skip-evaluation --skip-summary
This wrapper will:
- bootstrap missing tools
- download the requested RODI dataset if missing
- normalize mondial_rel if needed
- build the PostgreSQL-compatible dataset copy
- generate ontology.owl if needed
- start the local Anthropic cache server
- run mapping generation
- stop the cache server
- run evaluation
- regenerate the shared summary webpages
Run the full batch:
bash scripts/run_end_to_end_all.sh
Variants:
bash scripts/run_end_to_end_all.sh --method rodi
bash scripts/run_end_to_end_all.sh --skip-evaluation --skip-summary
If you want the individual steps instead of the wrapper:
- Download the selected benchmark datasets:
bash scripts/bootstrap.sh --download-rodi
Download only one dataset:
bash scripts/bootstrap.sh --download-rodi --dataset mondial_rel
- Build PostgreSQL-compatible dataset copies:
bash scripts/create_pg_compatible_dataset.sh datasets/rodi
- Generate OWL/XML where needed:
bash scripts/generate_owlxml_ontologies.sh pg_compatible/outputs/data_pg_compatible
- Run mapping generation:
bash scripts/create_all_mapping.sh pg_compatible/outputs/data_pg_compatible --keep-going
Or one dataset only:
bash scripts/create_mapping_single_dataset.sh --dataset-dir pg_compatible/outputs/data_pg_compatible/mondial_rel
- Evaluate an archived batch:
bash scripts/evaluation.sh outputs/<model>/<timestamp> --method all --keep-going
Only one dataset:
bash scripts/evaluation.sh outputs/<model>/<timestamp> --dataset mondial_rel --method all
- Regenerate the shared webpages:
bash scripts/generate_summary_portal.sh
You can still pass an archived batch path if you want to anchor discovery to one run:
bash scripts/generate_summary_portal.sh outputs/<model>/<timestamp>
mondial_rel needs one repo-specific normalization step. During bootstrap preparation, the repo:
- keeps only the relevant schema from the original dump
- renames the schema to mondial_rel
- strips obsolete schema prefixes from the Mondial .qpair SQL
That normalization is handled by scripts/bootstrap_prepare_rodi_dumps.sh.
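To illustrate what stripping a schema prefix from query SQL means, here is a minimal sketch using `sed`. It is not the logic of scripts/bootstrap_prepare_rodi_dumps.sh, which may work differently; the function name `strip_schema_prefix` and the schema name `old_schema` are hypothetical.

```shell
# Read SQL on stdin and remove every occurrence of "<schema>." from it.
# $1 is the schema name. This is a plain text substitution, so it would
# also touch string literals or longer identifiers that end in the same name.
strip_schema_prefix() {
  sed "s/$1\.//g"
}

# Example call:
# echo 'SELECT name FROM old_schema.country' | strip_schema_prefix old_schema
```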
The active workspace used by the mapping agents is:
The live generated mapping ends up at:
Each completed run is archived under:
outputs/<model>/<timestamp>/<dataset>/
Typical contents:
- mappings_r2rml.ttl
- run_metadata.json
- run.log
- inputs/
- workspace/
- evaluation/ after evaluation
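Given the outputs/<model>/<timestamp>/ layout, the most recent archived run for a model can be located with a short helper. This sketch assumes the timestamp directory names sort chronologically when sorted as text, which this README does not state; the function name `latest_run` is hypothetical.

```shell
# Echo the last run directory under <outputs_root>/<model>/ in glob order.
# $1 = outputs root, $2 = model directory name. Returns 1 if there is none.
latest_run() {
  latest=""
  for d in "$1/$2"/*/; do
    [ -d "$d" ] || continue
    latest="${d%/}"   # glob results are sorted, so the last one wins
  done
  [ -n "$latest" ] && echo "$latest"
}

# Example call (the model name is a placeholder):
# bash scripts/evaluation.sh "$(latest_run outputs my_model)" --dataset my_dataset --method all
```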
Generated result pages live under:
- outputs/summary/index.html
- outputs/summary/rodi_f1_site_refactored/index.html
- outputs/summary/summary_table_site/index.html
Regenerate them with:
bash scripts/generate_summary_portal.sh
Main entrypoints:
- scripts/bootstrap.sh
- scripts/run_end_to_end_dataset.sh
- scripts/run_end_to_end_all.sh
- scripts/create_pg_compatible_dataset.sh
- scripts/generate_owlxml_ontologies.sh
- scripts/create_mapping_single_dataset.sh
- scripts/create_all_mapping.sh
- scripts/evaluation.sh
- scripts/generate_summary_portal.sh
- scripts/start_anthropic_mock_server.sh
Experimental results are archived on Zenodo at 10.5281/zenodo.20073873.
This repository is archived on Zenodo at 10.5281/zenodo.20073239.