Duplicate-looking nodes (quick FAQ)

For the full pipeline (stages, normalization, merge policy, rebuild commands), see KG_PIPELINE.md, which is the authoritative description after the redesign.

Short version

  1. The LLM runs per chunk and emits local id strings. Those are not Neo4j merge keys.
  2. Python builds canonical ids from entity type + normalized name (events also use chapter).
  3. Neo4j uses MERGE (n:Label {id: canonical_id}) so reloads stay idempotent.
  4. Bill and Bill the Lizard stay as two separate nodes unless you add an explicit alias/typo rule (documented in KG_PIPELINE.md). Requiring an explicit rule avoids silent false merges.
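
The steps above can be sketched roughly as follows. This is an illustration only, not the project's code: the function names (`normalize_name`, `canonical_id`, `merge_query`), the normalization rules, and the `type:name:chN` id layout are all assumptions made for the example. KG_PIPELINE.md remains the reference for the real behavior.

```python
import re
import unicodedata
from typing import Optional


def normalize_name(name: str) -> str:
    """Assumed normalization: strip accents, lowercase, collapse non-alphanumerics to '_'."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", "_", text.lower())
    return text.strip("_")


def canonical_id(entity_type: str, name: str, chapter: Optional[int] = None) -> str:
    """Build a merge key from entity type + normalized name; events also carry the chapter."""
    parts = [entity_type.lower(), normalize_name(name)]
    if entity_type.lower() == "event" and chapter is not None:
        parts.append(f"ch{chapter}")
    return ":".join(parts)


def merge_query(label: str) -> str:
    """Hypothetical Cypher for an idempotent load.

    Labels cannot be passed as Cypher parameters, so the label is validated
    and interpolated; the id and name are left as $parameters.
    """
    if not label.isidentifier():
        raise ValueError(f"unsafe label: {label!r}")
    return f"MERGE (n:{label} {{id: $id}}) SET n.name = $name"


# Different spellings of the same name collapse to one key ...
same_a = canonical_id("Character", "Bill")
same_b = canonical_id("Character", "  bill ")

# ... but a longer name is a different key, so it stays a separate node.
other = canonical_id("Character", "Bill the Lizard")

# Events include the chapter in the key.
event = canonical_id("Event", "Mad Tea-Party", chapter=7)

query = merge_query("Character")
```

Because the key depends only on the type and the normalized name, running the load twice produces the same ids, and `MERGE` matches the existing nodes instead of creating new ones.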

Wipe Aura and rebuild

uv run python scripts/clear_aura_db.py --yes
# Re-run cypher/schema.cypher in Browser if needed
uv run python main.py --steps chunk,extract,load

Inspect dedup without Neo4j

uv run python scripts/kg_dedup_stats.py --aliases --pairs

With live DB:

uv run python scripts/kg_dedup_stats.py --aura
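
For intuition about what an offline "pairs" check might surface, here is a minimal sketch that flags names that look like duplicates. It is not the implementation of `kg_dedup_stats.py`: the function `near_duplicate_pairs`, the 0.8 threshold, and the containment rule are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Iterable, List, Tuple


def near_duplicate_pairs(
    names: Iterable[str], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return name pairs that are similar enough to deserve a manual look.

    A pair is flagged when the similarity ratio reaches the threshold
    (likely typo) or when one name contains the other (likely alias).
    """
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        la, lb = a.lower(), b.lower()
        ratio = SequenceMatcher(None, la, lb).ratio()
        contained = la in lb or lb in la
        if ratio >= threshold or contained:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


names = ["Bill", "Bill the Lizard", "Alice", "Cheshire Cat", "Chesire Cat"]
candidates = near_duplicate_pairs(names)
for a, b, ratio in candidates:
    print(f"{a!r} ~ {b!r} (ratio={ratio})")
```

Pairs flagged this way are only candidates for review. Consistent with the merge policy above, nothing is merged until an explicit alias/typo rule is added.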