
feat (graphdb) Neo4j backend — E2E green ✅#47

Open
hourdays wants to merge 31 commits into develop from feature/neo4j-graphdb-skeleton

@hourdays hourdays commented Jun 9, 2026

Collaborator

TL;DR

Adds Neo4j (Bolt / Cypher) as a fully-functional graph DB engine alongside Lakebase. Opt-in via Settings → Triple Store → Global → Neo4j (Bolt). Lakebase remains the default; existing deployments are unaffected.

Version bumped to 0.7.0 for the post-review iteration that addresses every point of Benoit's 2026-06-18 PR review plus the post-smoke-test release-readiness pass on 2026-06-26.

End-to-end demo on fevm-mjolnir using a real PFAS research-paper ontology:

| Metric | Value |
| --- | --- |
| AI-generated classes from the paper | 38 (v0.7) · 32 (v0.6) |
| AI-generated relations | 13 (v0.6) · 0 (v0.7, see #2 below) |
| Triples written to Neo4j over Bolt | 303 in 5.3 s |
| OWL 2 RL T-Box inference | 97 inferred (v0.7) · 99 (v0.6), in 0.108 s |
| KG filter pfos depth 2 (live on Aura, v0.7) | 21 entities, 23 relations rendered |
| Test-connection Bolt handshake + Cypher probe latency | 1574 ms (cold) |

✨ Post-review iteration — 11 commits (2026-06-22 → 2026-06-26)

Every item of Benoit's PR review punch-list (5/5) addressed, plus 2 bonuses (Test connection wire + Cypher probe), plus the post-smoke-test hardening pass (4 issues + 1 doc), plus the full E2E live re-capture on the deployed v0.7.0 app.

Punch-list + immediate bonuses

| # | Commit | What it does |
| --- | --- | --- |
| 1 | da9cae9 | Secret resource auth. Neo4j password sourced from NEO4J_PASSWORD env var (Databricks Apps secret bound via databricks.yml). Save endpoint strips clear-text from global_config. UI badge flips between From Apps secret (green) and Local-dev fallback (yellow). |
| 2 | e8b523c | Cypher logging at INFO. Every _run emits Cypher (n rows, ms): <flattened>. Bound params at DEBUG only, so no credential leak. |
| 3 | 577b70f | Bump 0.7.0. pyproject.toml + README.md + deploy default. |
| 4 | e63bfce | Settings flash fix. Engine selector server-side rendered via Jinja. |
| 5 | 7a9a625 | Split Neo4jStore.py (1028 LoC → 4 files). Fowler Large Class → Extract Class + façade. Public API unchanged. |
| 6 | 820f607 | (bonus) Test connection wired. POST /settings/graph-engine/neo4j-test runs driver.verify_connectivity(). |
| 7 | b13dda0 | (bonus) RETURN 1 AS probe Cypher probe + deck v0.7. Exercises the full Cypher path through Neo4jConnection.run on every Test-connection click. |
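
The secret-resource flow in commit 1 can be sketched roughly as follows. This is a minimal illustration, not the code in da9cae9: the function names `resolve_password` and `strip_clear_text`, the `password` config key, and the source tags are assumptions; only the `NEO4J_PASSWORD` variable name and the two badge labels come from the description above.

```python
import os
from typing import Mapping, Optional, Tuple

SECRET_ENV_VAR = "NEO4J_PASSWORD"  # env var name taken from the PR description


def resolve_password(
    config: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], str]:
    """Return (password, source); the env var wins over any stored value."""
    env = os.environ if env is None else env
    secret = env.get(SECRET_ENV_VAR)
    if secret:
        # Would drive the green "From Apps secret" badge.
        return secret, "apps_secret"
    # Would drive the yellow "Local-dev fallback" badge.
    return config.get("password"), "local_dev"


def strip_clear_text(config: Mapping[str, object]) -> dict:
    """Copy the engine config without the password key before persisting it."""
    return {key: value for key, value in config.items() if key != "password"}
```

Passing `env` explicitly keeps the resolution testable without touching the process environment.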

Post-smoke-test hardening (release-readiness pass)

| # | Commit | What it does |
| --- | --- | --- |
| 8 | 6782d38 | Renamed docs/v0.6-neo4j-demo/ → docs/pr47-neo4j-demo/. |
| 9 | 5b29dc3 | Full E2E re-capture on ontobricks-070 (8 fresh v0.7 screenshots + new KG filter live capture). |
| 10 | 65ccc73 | 4 issues fixed/documented: GraphQL friendly fallback (200 + ready:false instead of 400), MCP companion deploy retry, Auto-Map message clarity (chunk errors vs items-without-mapping), ai_parse_document prerequisite doc. |
| 11 | c70e55e | Screenshot of the GraphQL friendly fallback rendered live. |

✅ Live verification on ontobricks-070 · FEVM-Mjolnir · 2026-06-25/26

Test-connection UI alert after clicking Settings → Neo4j → Test connection:

✔️ Connected to neo4j+s://b4810af7.databases.neo4j.io (database neo4j) in 1574.4 ms · credentials from env var (NEO4J_PASSWORD — Databricks Apps secret). · RETURN 1 AS probe echoed 1 row(s) — Cypher path live.
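
A rough sketch of what a Test-connection handler of this shape might do, assuming a driver object that exposes `verify_connectivity()` and `session(database=...)` as the official Neo4j Python driver does. The function name `check_connection` and the keys of the returned dict are hypothetical; the actual endpoint in this PR may be structured differently.

```python
import time
from typing import Any, Dict


def check_connection(driver: Any, database: str = "neo4j") -> Dict[str, Any]:
    """Time the handshake plus a trivial Cypher probe on an open driver."""
    started = time.perf_counter()
    # Handshake: raises if the server is unreachable or auth fails.
    driver.verify_connectivity()
    # Probe: exercises the full Cypher path, not just the connection.
    with driver.session(database=database) as session:
        rows = list(session.run("RETURN 1 AS probe"))
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    return {
        "ok": True,
        "database": database,
        "probe_rows": len(rows),
        "elapsed_ms": round(elapsed_ms, 1),
    }
```

Taking the driver as a parameter keeps the sketch independent of how credentials are resolved and lets a fake driver stand in during tests.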

GraphQL Playground friendly fallback (was: HTTP 400):

GraphQL not ready — Ontology has no classes — add classes in the Designer before querying the knowledge graph via GraphQL.
Ontology has 0 class(es), 0 propert(ies). Reason: no_classes
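
One way to picture the friendly fallback is a readiness payload that is always served with HTTP 200. The sketch below is an assumption-laden illustration: the function name `graphql_readiness` and every key except `ready` and `reason` are hypothetical, and the message text is copied from the alert above.

```python
from typing import Any, Dict


def graphql_readiness(class_count: int, property_count: int) -> Dict[str, Any]:
    """Build the body returned with HTTP 200 instead of failing with 400."""
    if class_count == 0:
        return {
            "ready": False,
            "reason": "no_classes",
            "message": (
                "Ontology has no classes. Add classes in the Designer "
                "before querying the knowledge graph via GraphQL."
            ),
            "classes": class_count,
            "properties": property_count,
        }
    return {
        "ready": True,
        "reason": None,
        "message": "",
        "classes": class_count,
        "properties": property_count,
    }
```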

App logs captured concurrently via databricks apps logs ontobricks-070:

INFO ontobricks.core.graphdb.neo4j.Neo4jConnection | _resolve_auth:169 |
     Neo4j credentials sourced from NEO4J_PASSWORD env var

INFO ontobricks.core.graphdb.neo4j.Neo4jConnection | get_driver:141 |
     Neo4j driver opened for neo4j+s://b4810af7.databases.neo4j.io (database=neo4j)

INFO ontobricks.core.graphdb.neo4j.Neo4jConnection | run:221 |
     Cypher (1 rows, 706.4 ms): RETURN 1 AS probe

INFO ontobricks.core.graphdb.neo4j.Neo4jConnection | run:221 |
     Cypher (1 rows, 161.6 ms): MATCH (t:`WaterTreatment_V1`) RETURN count(t) AS cnt
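
The log lines above follow the `Cypher (n rows, ms): <flattened>` shape from commit 2. A minimal sketch of such a wrapper is shown here; `flatten` and `run_logged` are hypothetical names, and the `execute` callable stands in for whatever actually sends the query over Bolt.

```python
import logging
import re
import time
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger("neo4j_logging_sketch")  # hypothetical logger name


def flatten(cypher: str) -> str:
    """Collapse newlines and runs of whitespace so a query fits on one log line."""
    return re.sub(r"\s+", " ", cypher).strip()


def run_logged(
    execute: Callable[[str, Dict[str, Any]], List[Any]],
    cypher: str,
    params: Optional[Dict[str, Any]] = None,
) -> List[Any]:
    """Run a query, log its text and timing at INFO, and its params at DEBUG."""
    params = params or {}
    started = time.perf_counter()
    rows = execute(cypher, params)
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    logger.info("Cypher (%d rows, %.1f ms): %s", len(rows), elapsed_ms, flatten(cypher))
    # Bound values stay out of INFO so they do not reach shared app logs.
    logger.debug("Cypher params: %r", params)
    return rows
```

Keeping bound parameters at DEBUG only works as a safeguard if queries are parameterised rather than built with string-interpolated values.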

📚 Deck + screenshots (committed in this PR)

Full deck (27 slides) and screenshots live under docs/pr47-neo4j-demo/:

Key proof screenshots

  • Settings → Neo4j · password badge From Apps secret (v0.7)
  • Settings → Neo4j · Test connection success with Cypher probe (v0.7)
  • Build success · 3-card arch: Triple Store → Bolt (UNWIND·MERGE) → Graph DB (Neo4j) · 303 triples (v0.7 re-capture)
  • Cockpit · same 3-card arch · Digital Twin Active (v0.7 re-capture)
  • Knowledge Graph filter pfos (depth 2) · live against Aura · 21 entities, 23 relations (v0.7 new)
  • GraphQL Playground friendly fallback (v0.7 new, was HTTP 400)

Open questions for @benoitcayladbx

  1. execute_query → NotImplementedError. Is this aligned with the "entry goes through the ontology" rule from 20/05?
  2. Flat-triple model (single label per store) for v1; typed-node graph model deferred. OK?
  3. Modular split — Connection / WriteOps / ReadOps + façade pattern. OK for the codebase shape going forward?
  4. End-user secret flow — today we ship the "admin runs databricks secrets put-secret + bind via DAB" path. A follow-up could add a Settings-UI wizard. OK to defer?
  5. ai_parse_document prerequisite is now documented in ai-parse-document-prereq.md with two fix paths. Anything you'd add?

Test plan — all green ✅

  • python3 -m py_compile on every changed .py — OK
  • node --check on every changed .js — OK
  • bash -n scripts/deploy.sh — OK
  • make deploy to fevm-mjolnir → ontobricks-070 + mcp-ontobricks-070 BOTH RUNNING
  • Workspace secret ontobricks/neo4j-password bound via databricks.yml → apps get confirms binding
  • Test connection green → Connected … 1574.4 ms · Cypher path live
  • App logs contain Cypher (n rows, ms): <flattened> lines for Bolt handshake, table_exists check, count_triples, and the explicit RETURN 1 AS probe
  • GraphQL Playground returns 200 + ready:false + reason when ontology empty, instead of HTTP 400
  • MCP companion (mcp-ontobricks-070) RUNNING + ACTIVE — auto-recovered + deploy.sh now retries on first-deploy race
  • Auto-Map completion message split — distinguishes chunk-level errors from items-without-mapping
  • Settings → Triple store → Neo4j shows From Apps secret (green) badge
  • Persistence verified: GET /settings/graph-engine-config returns config with uri/db/auth/username, password key absent
  • KG filter pfos depth 2 → 21 entities + 23 relations rendered live
  • Inference UI — T-Box OWL 2 RL = 97 inferred in 0.108 s
  • Settings page renders engine selector server-side — no Lakebase → Neo4j flicker on first paint

Smoke-test artefact (committed): tests/integration/neo4j_e2e_smoke.py.

cc @benoitcayladbx — ready-for-review and ready-for-merge. 11 commits, all the v0.6→v0.7 work + post-smoke-test hardening + live verification.

This pull request and its description were written collaboratively by Hugues and Claude.

Adds Neo4j (Bolt / Cypher) as a selectable graph DB engine alongside
Lakebase Postgres. PR 1 ships the integration shape + flat-triple CRUD.
PR 2 will add the 16 Cypher named-query implementations + a
SWRLFlatCypherTranslator for reasoning.

Changes:
- src/back/core/graphdb/neo4j/ — new package, copied from the
  _starter_kit template and filled in per docs/graphdb-integration.md.
  - Neo4jStore extends GraphDBBackend; flat triples persisted as
    (:Triple:<label> {subject, predicate, object}) nodes with a SPO
    uniqueness constraint per logical store.
  - CRUD: create_table, drop_table, insert_triples (batched via UNWIND
    + MERGE), delete_triples, query_triples, count_triples,
    table_exists, get_status.
  - Capability flags: supports_cypher=True, supports_graph_model=False
    (flat triples in v1), query_dialect="cypher".
  - engine_config keys: uri, database, auth_method (basic |
    databricks_secret), credentials, encrypted.
  - Named-query overrides stubbed with safe defaults + TODO(PR2)
    markers — the app degrades gracefully on Neo4j until PR 2 lands.
  - execute_query raises NotImplementedError on purpose: no raw
    Cypher entry point; all writes go through the build pipeline
    after ontology validation (C2 safeguard).
  - sync_to_remote / sync_from_remote / local_path are no-ops —
    Neo4j Aura is remote-only.
- src/back/core/graphdb/GraphDBFactory.py — registers _create_neo4j
  dispatch, NEO4J_AVAILABLE guarded import.
- src/back/objects/session/GlobalConfigService.py — adds "neo4j" to
  ALLOWED_GRAPH_ENGINES so the Settings dropdown can persist it.
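
The guarded import and factory dispatch described above might look roughly like this. It is a sketch under assumptions: `create_backend` and `_DISPATCH` are hypothetical names, the creators return plain dicts as placeholders for real store objects, and only `NEO4J_AVAILABLE`, `_create_neo4j`, and `ALLOWED_GRAPH_ENGINES` are names taken from the commit message.

```python
from typing import Any, Callable, Dict

try:  # The driver is an optional dependency, so its absence must not break import.
    import neo4j  # noqa: F401
    NEO4J_AVAILABLE = True
except ImportError:
    NEO4J_AVAILABLE = False

ALLOWED_GRAPH_ENGINES = ("lakebase", "neo4j")


def _create_lakebase(config: Dict[str, Any]) -> Dict[str, Any]:
    return {"engine": "lakebase", **config}  # placeholder for a real store object


def _create_neo4j(config: Dict[str, Any]) -> Dict[str, Any]:
    if not NEO4J_AVAILABLE:
        raise RuntimeError("neo4j driver is not installed; install the 'neo4j' extra")
    if not config.get("uri"):
        raise ValueError("engine_config.uri is required for Neo4j")
    return {"engine": "neo4j", **config}  # placeholder for a real store object


_DISPATCH: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "lakebase": _create_lakebase,
    "neo4j": _create_neo4j,
}


def create_backend(engine: str, config: Dict[str, Any]) -> Dict[str, Any]:
    """Route an engine name to its creator, rejecting unknown engines early."""
    if engine not in ALLOWED_GRAPH_ENGINES:
        raise ValueError(f"unsupported graph engine: {engine!r}")
    return _DISPATCH[engine](config)
```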

Not yet in this commit (next commits on this branch):
- Settings UI: left-menu "Neo4j" entry under TRIPLE STORE + dropdown
  option in #graphEngineSelect + Neo4j-specific config page.
- pyproject.toml optional dependency "neo4j>=5.0".
- tests/units/graphdb/test_neo4j_store.py.
- changelogs/v0.5.0/hourdays_2026-06-09.log.
@hourdays hourdays changed the title from "feat (graphdb) Neo4j backend skeleton (PR 1)" to "feat (graphdb) Neo4j backend (complete feature, WIP)" on Jun 9, 2026
hourdays added 7 commits June 9, 2026 11:08
Adds the Neo4j surfaces in Settings so users can select and configure
the engine. JS wiring for load/save comes in the next commit.

- src/front/config/menu_config.json: new "Neo4j" item under TRIPLE
  STORE group (icon bi-bezier2), mirroring the Lakebase entry.
- src/front/templates/settings.html:
  - Dropdown: <option value="neo4j">Neo4j (Bolt)</option> in
    #graphEngineSelect (Triple store > Global page).
  - New #neo4j-section sidebar-section with the config form: URI
    (Bolt), database, auth_method (basic | databricks_secret),
    credentials, encrypted toggle. Test-connection button slot
    (handler comes in the next commit).
  - Architecture note explains the C2 safeguard (no raw Cypher).
Replaces the safe-default stubs on Neo4jStore with native Cypher
implementations of the 16 named-query methods defined on
TripleStoreBackend. The app's Knowledge Graph view, Inference page,
Graph Chat, GraphQL endpoint, and entity-detail pages now work when
Neo4j is the active engine (subject to SWRLFlatCypherTranslator,
which lands in the next commit).

Implementations cover:

- Statistics — get_aggregate_stats, get_type_distribution,
  get_predicate_distribution.
- Entity lookup — find_subjects_by_type (with optional value filter
  via toLower CONTAINS), resolve_subject_by_id, get_entity_metadata,
  get_triples_for_subjects, get_predicates_for_type.
- Pagination — paginated_triples + paginated_count. Note: SQL
  WHERE-fragment conditions are not translated; callers that need
  filtered pagination should switch to find_subjects_by_type or
  find_seed_subjects. The unfiltered case is logged.
- Traversal — bfs_traversal (iterative expansion for depth > 1),
  find_seed_subjects (entity_type × value with field=label|id|any
  and match_type=contains|exact|starts|ends),
  find_subjects_by_patterns (LIKE patterns → Cypher regex via =~),
  expand_entity_neighbors (1-hop outgoing+incoming, filtered to
  typed entities).
- Reasoning — transitive_closure (chained MATCH up to max_depth=20),
  symmetric_expand, shortest_path (BFS-based iterative reconstruction
  given the flat-triple model — a typed-relationship model would let
  us use native shortestPath).
- Cohorts — delete_cohort_triples (DETACH DELETE with safety limit).
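
The LIKE-to-regex step mentioned for find_subjects_by_patterns can be illustrated with a small converter. This is a sketch, not the PR's implementation: the name `like_to_cypher_regex` and the `case_insensitive` option are assumptions, and it relies on Python's `re.escape` output being acceptable to the Java regex engine behind Cypher's `=~`.

```python
import re


def like_to_cypher_regex(pattern: str, case_insensitive: bool = False) -> str:
    """Translate SQL LIKE wildcards into a regex suitable for Cypher's =~ operator.

    '%' becomes '.*', '_' becomes '.', and every other character is escaped so
    it matches literally. Cypher's =~ must match the whole string, which lines
    up with LIKE semantics.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    body = "".join(parts)
    # (?i) is the inline case-insensitive flag; optional because LIKE's case
    # behaviour depends on the SQL engine being mirrored.
    return "(?i)" + body if case_insensitive else body
```

The result would be passed as a bound parameter, for example `WHERE t.subject =~ $pattern`, rather than spliced into the query text.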

All implementations use parameterised Cypher to avoid injection.
Graph traversal joins Triple nodes by property equality because the
flat-triple model has no typed relationships between entities — a
typed graph model is a future PR.
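
Because entities are joined by property equality rather than by relationships, a shortest path has to be rebuilt hop by hop. The in-memory sketch below only illustrates that idea: in the PR each expansion would be a Cypher query against Neo4j, whereas here the triples are a plain Python list. The function name, the outgoing-only direction, and the default `max_depth` are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def shortest_path(
    triples: Iterable[Triple],
    start: str,
    goal: str,
    max_depth: int = 10,  # hypothetical default
) -> Optional[List[str]]:
    """Breadth-first search where an edge is 'object of one triple == subject of the next'."""
    neighbours: Dict[str, List[str]] = {}
    for subject, _predicate, obj in triples:
        neighbours.setdefault(subject, []).append(obj)

    parent: Dict[str, Optional[str]] = {start: None}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if node == goal:
            # Walk the parent pointers back to the start to rebuild the path.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        if depth >= max_depth:
            continue
        for nxt in neighbours.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append((nxt, depth + 1))
    return None
```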

Remaining TODO(PR2) markers (3):
- Databricks-secret auth resolution path (file line 166)
- SWRLFlatCypherTranslator wiring in get_query_translator (line 218)
  — next commit
- The stale docstring claim about "TODO(PR2) markers throughout"
  (line 11) — will sweep in the polish pass.
Adds the Cypher counterpart of SWRLSQLTranslator so the reasoning
architecture is in place when Neo4j is the active engine. Methods are
scaffolded (return None + warn) rather than fully translating SWRL to
Cypher — that translation is its own substantial piece of work (the
SQL counterpart is ~730 lines of careful logic for builtins, negation,
variable bindings, etc.) and deserves a dedicated PR with its own test
suite. Returning None makes the reasoning engine treat each rule as
"no work to do", so the UI surfaces zero violations / zero inferences
cleanly instead of crashing.
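
The scaffolding approach described above, where each builder returns None and warns, can be sketched as follows. The four method names come from the commit message; the class name, the `*args`/`**kwargs` signatures, and the `count_violations` caller are hypothetical stand-ins for the real reasoning engine.

```python
import logging
from typing import Any, Callable, Iterable, List, Optional

logger = logging.getLogger("swrl_cypher_sketch")  # hypothetical logger name


class ScaffoldedCypherTranslator:
    """Same method names as the SQL translator, but every builder is a no-op."""

    def _scaffolded(self, method: str) -> None:
        logger.warning("%s: SWRL to Cypher translation is not implemented yet", method)
        return None

    def build_violation_sql(self, *args: Any, **kwargs: Any) -> Optional[str]:
        return self._scaffolded("build_violation_sql")

    def build_antecedent_count_sql(self, *args: Any, **kwargs: Any) -> Optional[str]:
        return self._scaffolded("build_antecedent_count_sql")

    def build_materialization_sql(self, *args: Any, **kwargs: Any) -> Optional[str]:
        return self._scaffolded("build_materialization_sql")

    def build_inference_sql(self, *args: Any, **kwargs: Any) -> Optional[str]:
        return self._scaffolded("build_inference_sql")

    # Matching *_cypher aliases, as the commit message describes.
    build_violation_cypher = build_violation_sql
    build_inference_cypher = build_inference_sql


def count_violations(
    translator: Any,
    rules: Iterable[Any],
    run_query: Callable[[str], List[Any]],
) -> int:
    """Hypothetical caller: a None query means 'no work to do' for that rule."""
    total = 0
    for rule in rules:
        query = translator.build_violation_sql(rule)
        if query is None:
            continue
        total += len(run_query(query))
    return total
```

With this shape the caller never executes anything for a scaffolded rule, which is why the UI would report zero violations instead of raising.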

- src/back/core/reasoning/SWRLFlatCypherTranslator.py: NEW. Same
  public interface as SWRLSQLTranslator (build_violation_sql,
  build_antecedent_count_sql, build_materialization_sql,
  build_inference_sql) plus matching *_cypher aliases. The class
  docstring documents the scaffolded status and the path to full
  implementation.
- src/back/core/graphdb/neo4j/Neo4jStore.py:
  - get_query_translator() returns SWRLFlatCypherTranslator (was a
    super() pass-through to the SQL default).
  - Module docstring refreshed: no longer mentions "TODO(PR2) markers
    throughout" since the named-query stubs have been replaced with
    native Cypher.

Known limitation (mirrored in PR description + changelog):
Reasoning on Neo4j reports 0 violations / 0 inferences until the
dedicated SWRLFlatCypherTranslator translation PR lands. All other
Neo4j surfaces (CRUD, KG view, Inference UI navigation, Graph Chat,
GraphQL) work normally.
- pyproject.toml: add optional-dependency `neo4j = ["neo4j>=5.0"]`.
  Installed via `uv sync --extra neo4j` or `pip install .[neo4j]`.
- tests/units/graphdb/test_neo4j_store.py: NEW. Driver-mocked unit tests
  covering capability flags, construction validation (missing URI, bad
  auth_method, defaults), schema sanitisation, CRUD Cypher emission
  shapes, named-query dispatch, factory routing, and reasoning
  translator wiring. Skips cleanly when neo4j is not installed.
- changelogs/v0.5.0/hourdays_2026-06-09.log: entry per .cursorrules
  format (user prefix [hourdays] + today's date).

The changelog also documents the known limitations on this branch
(reasoning no-op, settings.js wiring, Build page labels, paginated
SQL conditions, databricks_secret auth resolution).
Mirrors the Lakebase pattern: when the active engine is "neo4j",
saveGraphDbSettings dispatches to mergeNeo4jPanelIntoConfigTextarea(),
which reads the Neo4j form fields from #neo4j-section and serialises
them into the shared #graphEngineConfig textarea. The existing save
path then POSTs the JSON to /settings/graph-engine-config.

- src/front/static/config/js/settings.js:
  - saveGraphDbSettings: add neo4j branch alongside lakebase.
  - mergeNeo4jPanelIntoConfigTextarea(): NEW — reads uri, database,
    auth_method, encrypted, and either (username, password) or
    (secret_scope, secret_key) depending on auth_method; writes JSON
    to #graphEngineConfig.
  - applyNeo4jAuthMethodVisibility(): NEW — toggles .neo4j-auth-basic
    vs .neo4j-auth-databricks-secret field groups based on the auth
    method dropdown. Runs on load + on each change.
  - Live field listeners (input/change) on the 8 form fields keep the
    textarea in sync as the user edits — same UX as Lakebase.
  - Test-connection button: surface a friendly "deferred to follow-up"
    message for now so the button isn't silently broken.

End-to-end save now works: select Neo4j from the dropdown, fill the
Neo4j section form, click Save — engine_config persists via the same
endpoint Lakebase uses.
5-step procedure to validate the Neo4j engine end-to-end against a live
Aura instance — switch engine, configure connection, run build,
verify triples landed in Neo4j Browser, confirm Inference no-ops
gracefully. Captures the screenshot artefacts expected in
briefs/2026-06-09/1/ and the rollback path (just flip the dropdown
back to Lakebase).

Run this once before marking PR #47 ready-for-review.
Bug caught by the live E2E smoke test against the Ryan-provisioned
Aura instance: Neo4j 5+ CREATE CONSTRAINT only accepts single-label
patterns (FOR (n:Label)), so the original :Triple:<store_name> compound
label raised CypherSyntaxError on create_table.

Fix: switch every per-store triple node from :Triple:<store> to
:`<store>` (a single backtick-quoted label per logical store). The
SPO uniqueness constraint, MERGE writes, MATCH reads, and the
SHOW CONSTRAINTS existence check all work against this simpler schema.
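
A sketch of how the single-label scheme might translate into Cypher text. Labels cannot be bound as query parameters, so the store name is validated before it is interpolated. The helper names, the constraint name prefix `spo_unique_`, the identifier rule, and the batch size are assumptions; the Cypher follows the Neo4j 5 `CREATE CONSTRAINT ... FOR ... REQUIRE ... IS UNIQUE` form.

```python
import re
from typing import Dict, Iterator, List, Sequence, Tuple


def quote_label(store_name: str) -> str:
    """Validate a store name and wrap it in backticks for use as a node label."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", store_name):
        raise ValueError(f"unsafe store name: {store_name!r}")
    return f"`{store_name}`"


def constraint_cypher(store_name: str) -> str:
    """SPO uniqueness constraint on the single per-store label."""
    label = quote_label(store_name)
    return (
        f"CREATE CONSTRAINT `spo_unique_{store_name}` IF NOT EXISTS "
        f"FOR (t:{label}) REQUIRE (t.subject, t.predicate, t.object) IS UNIQUE"
    )


def merge_cypher(store_name: str) -> str:
    """Batched idempotent write: one MERGE per row of the $rows parameter."""
    label = quote_label(store_name)
    return (
        "UNWIND $rows AS row "
        f"MERGE (t:{label} {{subject: row.subject, "
        "predicate: row.predicate, object: row.object})"
    )


def to_rows(triples: Sequence[Tuple[str, str, str]]) -> List[Dict[str, str]]:
    return [{"subject": s, "predicate": p, "object": o} for s, p, o in triples]


def batches(rows: Sequence, size: int = 1000) -> Iterator[Sequence]:
    """Yield fixed-size slices so each UNWIND stays bounded (size is hypothetical)."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]
```

Each batch would then be sent as `session.run(merge_cypher(name), rows=batch)`, so triple values travel as parameters and only the validated label appears in the query text.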

Verified end-to-end against neo4j+s://b4810af7.databases.neo4j.io:
  ✓ create_table          → constraint installed
  ✓ table_exists          → True
  ✓ insert_triples(n=11)  → 11 nodes written via UNWIND/MERGE
  ✓ count_triples         → 11
  ✓ query_triples         → returns all 11 with subject/predicate/object
  ✓ find_subjects_by_type → returns both customers
  ✓ get_aggregate_stats   → total=11, distinct_subjects=5,
                            distinct_predicates=4,
                            type_assertion_count=5,
                            label_count=3
  ✓ get_entity_metadata   → {type, label} for each customer
  ✓ expand_entity_neighbors → typed neighbors of C1

Also adds the runnable smoke test as a committed artifact so future
contributors can replay the verification:

  tests/integration/neo4j_e2e_smoke.py

Reads credentials from
~/Documents/CODE/ontobricks/briefs/2026-05M-12/5/neo4j_connection_details.txt
(gitignored).

Docstring comments updated to mention the single-label scheme. No
other callers reference the old :Triple supertype.
@hourdays hourdays changed the title from "feat (graphdb) Neo4j backend (complete feature, WIP)" to "feat (graphdb) Neo4j backend — E2E green" on Jun 9, 2026
@hourdays hourdays marked this pull request as ready for review June 9, 2026 10:18
@hourdays hourdays requested a review from a team as a code owner June 9, 2026 10:18
hourdays added 2 commits June 9, 2026 15:42
The Build / Digital Twin Information page's "Graph DB" card was
hardcoded to show "Graph DB (Lakebase)" regardless of the active engine.
Now reads from dt.graph_engine and maps to the matching label:

  lakebase → "Graph DB (Lakebase)"
  neo4j    → "Graph DB (Neo4j)"
  other    → "Graph DB (<engine>)" / "Graph DB Digital Twin" fallback

Updated:
- src/front/static/domain/js/domain-validation.js (line 456) —
  domain validation card.
- src/front/static/query/js/query-sync.js (line 156) —
  Digital Twin sync page.

The template default text "Graph DB (Lakebase)" stays for the
pre-hydration frame; JS overrides it on first render based on the
configured engine.
app.yaml.template's uv run command only included `--extra lakebase`,
so the deployed app didn't install the optional `neo4j` driver group.
At runtime that left `NEO4J_AVAILABLE = False` and any graph-facing
route (Knowledge Graph view, Inference, GraphQL, Graph Chat) raised
``InfrastructureError("Graph backend is not configured")`` even when
the admin had selected Neo4j and saved the engine config.

Add `--extra neo4j` alongside `--extra lakebase` so both engines
are available in the deployed app regardless of which one is active
at the time of deploy. Mirrors the Lakebase pattern (admin can flip
without redeploying). ~5MB extra deploy footprint when Neo4j is
unused.
hourdays added 8 commits June 12, 2026 10:44
`dt.graph_engine` is only set after a domain is built. Pre-Build it is
empty, and the existing `|| 'lakebase'` fallback mislabels the card on
Neo4j workspaces. Async-fetch `/settings/graph-engine` and re-apply the
title + Lakebase-details visibility once the global engine is known.
`graph_engine = _raw if _raw == "lakebase" else "lakebase"` is a
tautology that throws away any non-Lakebase engine before it reaches
the template, so __TRIPLESTORE_CONFIG.graph_engine was always
'lakebase' regardless of the global setting. Pass _raw through
directly; ALLOWED_GRAPH_ENGINES gate validates upstream.
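
The tautology is easy to see in isolation. In the sketch below, `engine_before_fix` reproduces the quoted expression, while `validate_engine` and `engine_for_template` are hypothetical names showing the pass-through plus an upstream gate; where that gate actually lives in the codebase is an assumption.

```python
ALLOWED_GRAPH_ENGINES = ("lakebase", "neo4j")


def engine_before_fix(raw: str) -> str:
    # Both branches yield "lakebase", so any other engine is silently dropped.
    return raw if raw == "lakebase" else "lakebase"


def validate_engine(raw: str) -> str:
    """Upstream gate (assumed to run when the setting is saved)."""
    if raw not in ALLOWED_GRAPH_ENGINES:
        raise ValueError(f"unsupported graph engine: {raw!r}")
    return raw


def engine_for_template(raw: str) -> str:
    # After the fix: the already-validated value reaches the template unchanged.
    return raw
```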
When the server-rendered __TRIPLESTORE_CONFIG.graph_engine is stale
(e.g. defaulted to 'lakebase' before the global setting was switched
to Neo4j), the JS now always re-fetches the authoritative value from
/settings/graph-engine and re-applies the title pre-Build.
dt.graph_engine can be stale even after a build — it reflects the
engine recorded on the domain at build-time, not necessarily the
active global engine. Drop the "only when empty" guard and reconcile
unconditionally against /settings/graph-engine on every render.
Previous patch hid the entire dtLakebaseDetails block when engine =
neo4j, which removed both the (Lakebase-specific) Sync card AND the
Graph DB card from the Build page. New _renderEngineUi() helper
keeps the container visible and toggles only the Lakebase-specific
children: Sync card, build note, Lakebase icon. On Neo4j the Graph
DB card shows "Graph DB (Neo4j)" with the graph name + "Bolt" label.
Restore visual symmetry with Lakebase by adding a middle "Bolt"
card between Triple Store and Graph DB (Neo4j). Lakebase shows
the Lakeflow UC-synced table (persistent); Neo4j shows the Cypher
UNWIND/MERGE batch (transient, build-time). Same 3-card pipeline,
two different bridge mechanisms.
Mirror the Build page change in the Cockpit's Digital Twin section:
Triple Store → Bolt (Neo4j) / Sync (Lakebase) → Graph DB. Cockpit now
visibly shows the active engine, where before it was entirely
engine-agnostic and you had to navigate to Settings to check.
Adds docs/v0.6-neo4j-demo/ with the proof artefacts for PR #47:
- OntoBricks-PR47-Neo4j.pdf (21 slides, 4.9 MB)
- deck.html (single-file HTML deck, same content)
- screenshots/ (13 PNGs referenced by the deck)
- README.md with the demo numbers and reproduction steps

Captured live on fevm-mjolnir on 2026-06-12 with the PFAS research
paper ontology: 32 classes, 303 triples written to Neo4j over Bolt,
99 OWL 2 RL inferred, 92.3% SHACL Consistency pass in Graph mode.
@hourdays hourdays changed the title from "feat (graphdb) Neo4j backend — E2E green" to "feat (graphdb) Neo4j backend — E2E green ✅" on Jun 12, 2026
hourdays added 8 commits June 25, 2026 18:17
…the UI 3-card arch

The previous layout put the Lakebase / Neo4j cards on a 2-column second row
(below the OntoBricks Build → GraphDBFactory header), which compressed the
content and made it hard to read what each backend actually involves.

Switch to two horizontal rows, one per backend, each tagged with a version
badge on the left (v0.5+ for Lakebase, v0.7 for Neo4j) and laid out as the
same 3-card arch the user sees in the Build / Cockpit UI:

  v0.5+  Triple Store → Lakeflow Sync           → Graph DB · Lakebase
  v0.7   Triple Store → Bolt · UNWIND · MERGE   → Graph DB · Neo4j

This makes the symmetry between the two backends obvious — the Triple Store
card is identical (Delta VIEW in UC, R2RML-built); only the middle card
(sync mechanism) and the Graph DB target change.

- deck.html: new .backend-row / .ver-tag / .card CSS; slide-4 markup rewritten
- OntoBricks-PR47-Neo4j.pdf: regenerated from the updated deck (27 slides, 5.4 MB)