This repository contains sample code for building a Knowledge Graph + RAG (Retrieval-Augmented Generation) question-answering system on top of a Neo4j database.
The system provides two main functionalities:
- Knowledge Graph QA: Answer natural language questions using graph data and an LLM
- Episode Retriever: Search for relevant video episodes based on user queries
Use this as a reference or starting point for your own KG + RAG experiments.
- 2026-01-22: The Knowledge Graph dataset has been released on Hugging Face.
The fastest way to get started is using Docker Compose.
git clone https://github.com/FujitsuResearch/FieldWork_Knowledge.git
cd FieldWork_Knowledge
Clone the dataset from Hugging Face into the neo4j_import/ directory:
cd neo4j_import
git lfs install
git clone https://huggingface.co/datasets/Fujitsu/FieldWork_Knowledge_Dataset
cd FieldWork_Knowledge_Dataset
unzip "*.zip"
cd ../..
Note
You need to accept the terms of use on the Hugging Face dataset page before cloning.
Also submit an application on the FieldWorkArena page at the same time.
Some .graphml files are compressed as .zip archives. The unzip command above extracts them.
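If the unzip command is not available on your system, the same extraction can be done from Python. The following is a minimal sketch using only the standard library. The helper name extract_archives is hypothetical and not part of this repository, and it assumes the archives sit at the top level of the dataset directory, as the unzip command above implies.

```python
import zipfile
from pathlib import Path


def extract_archives(dataset_dir: str) -> list[Path]:
    """Extract every .zip archive found directly in dataset_dir, in place.

    Hypothetical helper: mirrors running `unzip "*.zip"` inside dataset_dir.
    Returns the paths of all extracted members.
    """
    root = Path(dataset_dir)
    extracted: list[Path] = []
    for archive in sorted(root.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # Extract next to the archive, like unzip does by default.
            zf.extractall(root)
            extracted.extend(root / name for name in zf.namelist())
    return extracted
```

For example, calling extract_archives("neo4j_import/FieldWork_Knowledge_Dataset") from the repository root would mirror the unzip step.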
cp .env.sample .env
Edit .env with your settings:
OPENAI_API_KEY=your-openai-api-key
NEO4J_URI=bolt://localhost:7488
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
docker compose up -d
Wait a few minutes for Neo4j to start up. You can check if Neo4j is ready by running:
docker compose logs neo4j | grep "Started."
Or access the Neo4j Browser at http://localhost:7588 to confirm it's running.
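As a scripted alternative to the checks above, one option is to poll the Bolt port until it accepts TCP connections. This is a minimal sketch: wait_for_port is a hypothetical helper, port 7488 is taken from the sample NEO4J_URI above, and an open port only shows that the container is listening, so the log check remains the more reliable signal.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper; it does not verify that Neo4j itself is fully started.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A call such as wait_for_port("localhost", 7488) would block for up to two minutes and report whether the port became reachable.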
Tip
To add new datasets to Neo4j, place them in neo4j_import/ and add a volume mount in docker-compose.yml:
volumes:
  - ./neo4j_import/YourNewDataset:/var/lib/neo4j/import/YourNewDataset
Then restart Neo4j with docker compose down && docker compose up -d.
Note
The Neo4j Docker image changes ownership of mounted directories to UID 7474 (neo4j user).
If you need to modify files in neo4j_import/ after starting Neo4j, you may need to use sudo or restore ownership:
sudo chown -R $(id -u):$(id -g) neo4j_import/
cd script
bash run_clear.sh # Clear existing data
bash run_import.sh # Import GraphML file
By default, this imports kg_factory_incident_count.graphml. To import a different file, edit script/run_import.sh:
# Example (kg_factory):
FILE_PATH="FieldWork_Knowledge_Dataset/kg_factory/kg_factory_incident_count.graphml"
Note
The FieldWork_Knowledge_Dataset contains multiple domains (factory, retail, warehouse, etc.). See the dataset repository for the full list of available .graphml files.
Tip
You can visualize the imported graph in the Neo4j Browser at http://localhost:7588. Try running the following Cypher query to see the graph structure:
MATCH (n)-[r]->(m) RETURN n, r, m LIMIT 100
Demo 1: Knowledge Graph QA - Ask questions about the knowledge graph:
bash run_kg_rag.sh
Demo 2: Episode Retriever - Search for relevant video episodes (uses imported data):
bash run_episode_retriever.sh
This will search for episodes matching the query and output time ranges like:
Episode 1: 0.0s - 30.0s (relevance: 0.85)
Episode 2: 30.0s - 60.0s (relevance: 0.72)
...
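If you want to pass these time ranges to another tool, the printed lines can be parsed back into numbers. The sketch below assumes the output format is exactly as in the sample above. Episode and parse_episodes are hypothetical names, not part of this repository.

```python
import re
from typing import NamedTuple


class Episode(NamedTuple):
    number: int       # episode number as printed
    start: float      # start time in seconds
    end: float        # end time in seconds
    relevance: float  # relevance score as printed


# Matches lines such as "Episode 1: 0.0s - 30.0s (relevance: 0.85)".
EPISODE_LINE = re.compile(
    r"Episode\s+(\d+):\s*([\d.]+)s\s*-\s*([\d.]+)s\s*\(relevance:\s*([\d.]+)\)"
)


def parse_episodes(output: str) -> list[Episode]:
    """Collect every line of output that matches the assumed episode format."""
    episodes = []
    for line in output.splitlines():
        match = EPISODE_LINE.search(line)
        if match:
            number, start, end, relevance = match.groups()
            episodes.append(Episode(int(number), float(start), float(end), float(relevance)))
    return episodes
```

Lines that do not match the pattern, such as the trailing "...", are skipped.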
Tip
Edit the QUERY variable in each script to try different queries:
# In run_kg_rag.sh
QUERY="What is the person in the video doing?"
# In run_episode_retriever.sh
QUERY="What safety issues occurred in the factory?"
FieldWork_Knowledge/
├── docker-compose.yml                 # Neo4j container configuration
├── neo4j_import/                      # Mounted to Neo4j's import directory
│   └── FieldWork_Knowledge_Dataset/   # Clone dataset here
│       └── ...                        # e.g., kg_factory/kg_factory_incident_count.graphml
├── script/
│   ├── run_clear.sh                   # Clear Neo4j database
│   ├── run_import.sh                  # Import GraphML file
│   ├── run_kg_rag.sh                  # Run Knowledge Graph QA demo
│   └── run_episode_retriever.sh       # Run Episode Retriever demo
└── .env                               # Environment variables
Answers natural language questions based on the knowledge graph data using an LLM.
python3 kg_rag.py --query "What is the person in the video doing?"
Searches for relevant video episodes from the knowledge graph. The retrieved time ranges can be used to extract specific video segments for further analysis (e.g., as input to a Video-LLM).
python3 episode_retriever.py \
--graph_file "FieldWork_Knowledge_Dataset/kg_factory/kg_factory_incident_count.graphml" \
--clear_db \
--query "What safety issues occurred in the factory?" \
--top_k 10 \
--threshold 0.5 \
--episode_duration 30 \
--verbose
| Option | Description | Default |
|---|---|---|
| --query | Search query (required) | - |
| --graph_file | Path to GraphML file to import before searching | None |
| --clear_db | Clear the database before importing | False |
| --top_k | Maximum number of episodes to retrieve | 10 |
| --threshold | Relevance threshold (0.0-1.0) | 0.0 |
| --episode_duration | Duration of each episode in seconds (use 30 for kg_factory) | 10.0 |
| --verbose | Display detailed output | False |
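To illustrate how --top_k, --threshold, and --episode_duration might interact, here is a minimal sketch. It is an assumption about the selection logic, not the code in episode_retriever.py: it supposes that episodes are fixed-length windows indexed from 0, that scores below the threshold are dropped, and that the remaining episodes are ranked by relevance. The function select_episodes and the scores mapping are hypothetical.

```python
def select_episodes(
    scores: dict[int, float],
    top_k: int = 10,
    threshold: float = 0.0,
    episode_duration: float = 10.0,
) -> list[tuple[float, float, float]]:
    """Return (start_s, end_s, relevance) for the best-scoring episodes.

    scores maps a 0-based episode index to a relevance score (assumed layout).
    """
    # Keep only episodes at or above the relevance threshold.
    kept = [(index, score) for index, score in scores.items() if score >= threshold]
    # Highest relevance first.
    kept.sort(key=lambda item: item[1], reverse=True)
    # Convert each episode index into its time window and cap the count at top_k.
    return [
        (index * episode_duration, (index + 1) * episode_duration, score)
        for index, score in kept[:top_k]
    ]
```

Under these assumptions, a larger --episode_duration widens each returned time range, while --threshold and --top_k only affect how many ranges are returned.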
If you prefer to install Neo4j manually instead of using Docker, follow these steps:
- Install Neo4j into your environment. See the Neo4j Installation Manual for details.
- Install the Neo4j APOC plugin by following the APOC Installation Manual.
Note
This code has been tested and verified to work with the following version:
apoc-2025.02.0-core.jar
- Configure /etc/neo4j/neo4j.conf to allow APOC procedures:
dbms.security.procedures.unrestricted=apoc.*
dbms.security.procedures.allowlist=apoc.*
After editing the configuration, restart Neo4j for the changes to take effect.
Important
GraphML files must be placed in Neo4j's import directory.
Due to Neo4j's security settings, the APOC import function can only access files within the designated import directory.
Default import directory locations:
- Linux: /var/lib/neo4j/import/
- macOS (Homebrew): /usr/local/var/neo4j/import/
Example:
sudo cp neo4j_import/FieldWork_Knowledge_Dataset/kg_factory/kg_factory_incident_count.graphml /var/lib/neo4j/import/
Tip
Alternative: Allow arbitrary file paths via APOC configuration
If you prefer to import files from any location:
- Create or edit /etc/neo4j/apoc.conf:
apoc.import.file.enabled=true
apoc.import.file.use_neo4j_config=false
- Add to /etc/neo4j/neo4j.conf:
dbms.security.allow_csv_import_from_file_urls=true
- Restart Neo4j:
sudo systemctl restart neo4j
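Once the file is visible to Neo4j, a GraphML import through APOC could look like the sketch below. It assumes the official neo4j Python driver is installed and that the import goes through the apoc.import.graphml procedure. The helper names are hypothetical, and run_import.sh may do this differently. With the default configuration, APOC resolves file names relative to the import directory, which is what import_relative_path computes.

```python
from pathlib import PurePosixPath

# Cypher call to APOC's GraphML importer; readLabels keeps node labels from the file.
IMPORT_QUERY = "CALL apoc.import.graphml($file, {readLabels: true})"


def import_relative_path(graphml_path: str, import_dir: str = "/var/lib/neo4j/import") -> str:
    """Return graphml_path relative to Neo4j's import directory.

    Raises ValueError if the file is not inside import_dir.
    """
    return str(PurePosixPath(graphml_path).relative_to(import_dir))


def import_graphml(uri: str, user: str, password: str, file_name: str) -> None:
    """Run the APOC import against a live Neo4j instance (hypothetical helper)."""
    # Imported lazily so the path helper above works without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            # consume() waits for the procedure to finish and discards the summary row.
            session.run(IMPORT_QUERY, file=file_name).consume()
```

In practice you would pass the NEO4J_URI, NEO4J_USER, and NEO4J_PASSWORD values from your .env file, together with the file name returned by import_relative_path.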
To submit an inquiry, please follow these steps:
- Visit our page
- Click the "Inquiry" button at the bottom of the page.
- Fill out the form completely and accurately.
It may take a few business days to reply.
See LICENSE for details.