For the full pipeline (stages, normalization, merge policy, rebuild commands), see `KG_PIPELINE.md`; that file is the authoritative description after the redesign.
- The LLM runs per chunk and emits local `id` strings. Those are not Neo4j merge keys.
- Python builds canonical ids from entity type + normalized name (events also use chapter).
- Neo4j uses `MERGE (n:Label {id: canonical_id})` so reloads stay idempotent. `Bill` vs `Bill the Lizard` stay two nodes unless you add an explicit alias/typo rule (documented in `KG_PIPELINE.md`; this avoids silent false merges).
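
As a rough illustration of the three points above, here is a minimal sketch of how a canonical id could be derived and turned into a parameterized `MERGE`. Everything concrete in it is an assumption made for the example, not the project's actual code: the id format (`type:name`, `event:chN:name`), the normalization rules, the `ALIASES` table, the `ALLOWED_LABELS` set, and the function names. The real rules are the ones described in `KG_PIPELINE.md`.

```python
import re
import unicodedata
from typing import Dict, Optional, Tuple

# Hypothetical alias table: explicit, reviewed rules only. It is empty by
# default, so "Bill" and "Bill the Lizard" stay two distinct ids.
ALIASES: Dict[Tuple[str, str], str] = {}

# Assumed label set. Cypher cannot parameterize labels, so the label is
# checked against an allowlist before being interpolated into the query.
ALLOWED_LABELS = {"Character", "Location", "Event"}


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and collapse other characters to hyphens."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def canonical_id(
    entity_type: str,
    name: str,
    chapter: Optional[int] = None,
    aliases: Optional[Dict[Tuple[str, str], str]] = None,
) -> str:
    """Build an id from entity type + normalized name (events add chapter)."""
    table = ALIASES if aliases is None else aliases
    etype = entity_type.strip().lower()
    norm = normalize_name(name)
    norm = table.get((etype, norm), norm)  # apply an explicit alias, if any
    if etype == "event":
        if chapter is None:
            raise ValueError("events need a chapter")
        return f"{etype}:ch{chapter}:{norm}"
    return f"{etype}:{norm}"


def merge_statement(label: str, cid: str) -> Tuple[str, Dict[str, str]]:
    """Return a MERGE query and its parameters for one node."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label}")
    return f"MERGE (n:{label} {{id: $id}})", {"id": cid}
```

Because the id depends only on the inputs and not on the chunk-local `id` the LLM emitted, running the same `MERGE` twice matches the existing node instead of creating a second one. Keeping the alias table explicit means a merge such as `Bill` into `Bill the Lizard` only happens when someone has written that rule down.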
uv run python scripts/clear_aura_db.py --yes
# Re-run cypher/schema.cypher in Browser if needed
uv run python main.py --steps chunk,extract,load

uv run python scripts/kg_dedup_stats.py --aliases --pairs

With live DB:
uv run python scripts/kg_dedup_stats.py --aura