OntoEKG is an LLM-driven pipeline for accelerating domain-specific ontology construction for Enterprise Knowledge Graphs from unstructured enterprise text.
This repository accompanies the paper “LLM-Driven Ontology Construction for Enterprise Knowledge Graphs”. The approach decomposes ontology modelling into two phases:
- Extraction module: identifies core ontology classes and properties from text.
- Entailment module: infers a logical class hierarchy (rdfs:subClassOf) and serializes the ontology into standard RDF.
Experimental results in the paper use an evaluation dataset derived from documents across the Data, Finance, and Logistics sectors, highlighting both the potential and the challenges of LLM-based ontology construction.
The implementation is provided as a Jupyter notebook:
LLM_Driven_Ontology_Construction_for_Enterprise~Knowledge~Graphs.ipynb
This project uses Pipenv.
- Install dependencies
pipenv install
- Start Jupyter
pipenv run python -m ipykernel install --user --name OntoEKG
pipenv run jupyter notebook
The notebook loads environment variables via dotenv.load_dotenv(...). Create a .env file in the project root.
- Google GenAI / Gemini
GOOGLE_API_KEY=...
- OpenRouter (for entailment via the OpenAI-compatible client)
OPENROUTER_API_KEY=...
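A quick check before running the notebook can catch a missing key early. The sketch below is not taken from the notebook: `REQUIRED_KEYS`, `missing_keys`, and `load_and_check` are hypothetical names, and it assumes the python-dotenv package (the one that provides `dotenv.load_dotenv`) is installed in the Pipenv environment.

```python
import os

# The two variables named in this README; adjust if the notebook needs more.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "OPENROUTER_API_KEY")


def missing_keys(env=os.environ):
    """Return the required keys that are absent or empty in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def load_and_check(dotenv_path=".env"):
    """Load a .env file, then fail loudly if a required key is missing."""
    # Imported here so the helper above works without python-dotenv installed.
    from dotenv import load_dotenv

    load_dotenv(dotenv_path)
    missing = missing_keys()
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `load_and_check()` in the first notebook cell would raise a `RuntimeError` naming any key that is not set, instead of failing later inside an API call.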
Open the notebook and run the run_pipeline(...) cell. The pipeline:
- Extracts ontology classes/properties from the input text (Gemini)
- Infers superclass relationships across extracted classes (OpenRouter)
- Serializes the ontology to Turtle (RDF)
- Generates an interactive HTML visualization
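To make the serialization step concrete, here is a minimal sketch of how extracted classes and inferred superclass pairs could be written out as Turtle. It is an illustration, not the notebook's code. The function `to_turtle`, the base IRI `http://example.org/ontoekg#`, and the input shapes (a set of class names and a list of `(child, parent)` pairs) are all assumptions. It uses plain string formatting from the standard library so that it is self-contained; an RDF library such as rdflib would be the more robust choice in practice.

```python
import re

# Conservative pattern for class names usable as Turtle local names.
_LOCAL_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def to_turtle(classes, subclass_of, base_iri="http://example.org/ontoekg#"):
    """Serialize class names and (child, parent) pairs to a Turtle string.

    `base_iri` is a placeholder namespace chosen for this example.
    """
    names = set(classes)
    for child, parent in subclass_of:
        names.update((child, parent))
    for name in names:
        if not _LOCAL_NAME.match(name):
            raise ValueError(f"Not a simple local name: {name!r}")

    lines = [
        f"@prefix : <{base_iri}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for name in sorted(names):
        # Declare the class, then attach one rdfs:subClassOf per parent.
        statement = f":{name} a owl:Class"
        parents = sorted({p for c, p in subclass_of if c == name})
        for parent in parents:
            statement += f" ;\n    rdfs:subClassOf :{parent}"
        lines.append(statement + " .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_turtle({"Invoice", "Document"}, [("Invoice", "Document")]))
```

Sorting the names keeps the output deterministic, which makes the generated files easier to diff between runs.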
Running the pipeline writes the following artifacts to the project directory:
- ontology_<run_id>.ttl (RDF/Turtle serialization)
- ontology_<run_id>*.html (interactive visualization)