diff --git a/.gitignore b/.gitignore index a6e56d75..618f45d3 100644 --- a/.gitignore +++ b/.gitignore @@ -87,3 +87,6 @@ tests/e2e/_e2e_server.log .understand-anything/ + +# Local working drafts — should never be committed (per topic-folder convention) +syncs/ diff --git a/README.md b/README.md index ab6c11c7..62d2d7aa 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ OntoBricks Logo

-OntoBricks 0.5.0
+OntoBricks 0.7.0

Digital Twin Builder for Databricks diff --git a/app.yaml.template b/app.yaml.template index 9087d599..023c7ec7 100644 --- a/app.yaml.template +++ b/app.yaml.template @@ -19,12 +19,16 @@ # (defined in `pyproject.toml`'s optional `[project.optional-dependencies] # lakebase = [...]`) so the Lakebase Postgres backend works in the # deployed app even when the `database` resource is not yet bound. -# This is what powers the runtime backend toggle in -# **Settings → Registry Location**: the admin can switch -# Volume ↔ Lakebase without redeploying. On a Volume-only deployment -# the extra costs ~10MB of unused wheels but the app continues to -# run normally — the Lakebase code paths are guarded by -# `LakebaseAuth.is_available` and never reached. +# +# `--extra neo4j` installs the official `neo4j` Python driver +# (Bolt / Cypher) so the Neo4j graph DB engine works when an admin +# selects it in **Settings → Triple store → Global**. Same rationale +# as Lakebase: the extra costs ~5MB of unused wheels on +# Volume/Lakebase-only deployments but Neo4j code paths are guarded +# by `NEO4J_AVAILABLE` and never reached. +# +# Both extras together power the runtime backend toggle: +# the admin can switch Volume / Lakebase / Neo4j without redeploying. # # Databricks Apps sets DATABRICKS_APP_PORT automatically. # MCP endpoint is exposed at /mcp (Streamable HTTP transport). @@ -33,6 +37,8 @@ command: - "run" - "--extra" - "lakebase" + - "--extra" + - "neo4j" - "python" - "run.py" @@ -100,6 +106,17 @@ env: # MLflow — ensure traces are persisted to the workspace tracking server. - name: MLFLOW_TRACKING_URI value: "${APP_MLFLOW_TRACKING_URI}" + # ── Neo4j (optional) ────────────────────────────────────────── + # Password sourced from a Databricks Apps secret resource (see + # `resources:` below). The `neo4j-password` resource is unbound by + # default; the admin binds it to a workspace secret (scope/key) in + # the Apps UI before selecting the Neo4j engine. 
When the resource + # is unbound, NEO4J_PASSWORD stays unset and the deployed app refuses + # to instantiate Neo4jStore with a clear instruction in the logs. + # Local dev keeps the legacy fallback to engine_config['password']. + # See docs/v0.6-neo4j-demo/secret-configuration.md. + - name: NEO4J_PASSWORD + valueFrom: neo4j-password # User authorization scopes — declares which Databricks APIs the app is # allowed to access on behalf of the logged-in user via the @@ -126,3 +143,7 @@ resources: description: "Unity Catalog Volume for the OntoBricks domain registry" volume: permission: CAN_READ_WRITE + - name: neo4j-password + description: "Databricks Apps secret holding the Neo4j Bolt password. Unbound by default; bind it to a workspace secret (scope/key) in the Apps UI before activating the Neo4j engine. See docs/v0.6-neo4j-demo/secret-configuration.md." + secret: + permission: READ diff --git a/changelogs/v0.5.0/hourdays_2026-06-09.log b/changelogs/v0.5.0/hourdays_2026-06-09.log new file mode 100644 index 00000000..114fe2b1 --- /dev/null +++ b/changelogs/v0.5.0/hourdays_2026-06-09.log @@ -0,0 +1,59 @@ +# 2026-06-09 — Neo4j graph DB engine + +**Author:** Hugues Journeau (@hourdays) +**Branch:** `feature/neo4j-graphdb-skeleton` → PR #47 (target: `develop`) +**Version:** v0.5.0 + +## Context + +Adds **Neo4j (Bolt / Cypher)** as a selectable graph DB engine alongside Lakebase Postgres, following `docs/graphdb-integration.md`. Cleanly opt-in: existing Lakebase deployments are unaffected; users activate Neo4j by selecting it from **Settings → Triple store → Global**. + +This implementation realises the v0.6 roadmap slot ahead of the August 2026 target, building on Benoit's v0.5 `GraphDBBackend` abstraction and the `_starter_kit/` template. + +## Changes + +1. **`src/back/core/graphdb/neo4j/__init__.py`** — new package init with `NEO4J_AVAILABLE` guarded import. +2. 
**`src/back/core/graphdb/neo4j/Neo4jStore.py`** — full `GraphDBBackend` implementation: + - Capability flags: `supports_cypher=True`, `supports_graph_model=False` (flat triple model in v1), `query_dialect="cypher"`. + - Connection management: lazy `neo4j.GraphDatabase.driver(...)`, session-per-query via `_run`. + - Auth: `basic` (username/password) implemented; `databricks_secret` validated but resolution deferred to a follow-up PR. + - CRUD: `create_table` (SPO unique constraint), `drop_table` (constraint + nodes), `insert_triples` (`UNWIND` + `MERGE`, batched), `delete_triples`, `query_triples`, `count_triples`, `table_exists` (via `SHOW CONSTRAINTS`), `get_status`. + - `execute_query` raises `NotImplementedError` by design — no raw Cypher entry point. Preserves the C2 safeguard ("the entry point is the ontology", Benoit 20/05). + - All 18 named-query methods (`get_aggregate_stats`, `get_type_distribution`, `get_predicate_distribution`, `find_subjects_by_type`, `resolve_subject_by_id`, `get_entity_metadata`, `get_triples_for_subjects`, `get_predicates_for_type`, `paginated_triples`, `paginated_count`, `bfs_traversal`, `find_seed_subjects`, `find_subjects_by_patterns`, `expand_entity_neighbors`, `transitive_closure`, `symmetric_expand`, `shortest_path`, `delete_cohort_triples`) implemented in native Cypher with parameterised queries. +3. **`src/back/core/graphdb/GraphDBFactory.py`** — `_create_neo4j` dispatch + `NEO4J_AVAILABLE` guarded import + class-level availability flag. +4. **`src/back/objects/session/GlobalConfigService.py`** — `ALLOWED_GRAPH_ENGINES = ("lakebase", "neo4j")`. +5. **`src/back/core/reasoning/SWRLFlatCypherTranslator.py`** — new translator class scaffolded with the same public interface as `SWRLSQLTranslator`. **Methods return `None` and log a warning** — full SWRL → Cypher translation is its own follow-up PR. Reasoning on Neo4j therefore reports zero violations / inferences (graceful no-op) rather than crashing. +6. 
**`src/front/config/menu_config.json`** — new "Neo4j" item under TRIPLE STORE group. +7. **`src/front/templates/settings.html`**: + - New Neo4j option in `#graphEngineSelect`. + - New `#neo4j-section` with config form: URI, database, auth method, credentials, encrypted toggle. +8. **`pyproject.toml`** — `[project.optional-dependencies] neo4j = ["neo4j>=5.0"]`. +9. **`tests/units/graphdb/test_neo4j_store.py`** — new test module (driver-mocked) covering capability flags, construction validation, schema sanitisation, CRUD Cypher emission, factory dispatch, and reasoning translator wiring. + +## Files modified / added + +- `src/back/core/graphdb/neo4j/__init__.py` (new) +- `src/back/core/graphdb/neo4j/Neo4jStore.py` (new, ~580 lines) +- `src/back/core/graphdb/GraphDBFactory.py` +- `src/back/objects/session/GlobalConfigService.py` +- `src/back/core/reasoning/SWRLFlatCypherTranslator.py` (new) +- `src/front/config/menu_config.json` +- `src/front/templates/settings.html` +- `pyproject.toml` +- `tests/units/graphdb/__init__.py` (new) +- `tests/units/graphdb/test_neo4j_store.py` (new) + +## Known limitations (deliberate) + +- **Reasoning on Neo4j is a no-op** until the dedicated SWRLFlatCypherTranslator translation PR lands. UI surfaces zero violations / zero inferences cleanly. +- **`auth_method=databricks_secret`** is validated but unresolved — basic auth is the only live-tested path. +- **`paginated_triples` SQL conditions** are not translated to Cypher — the unfiltered page is returned and the call is logged. Filtered access should switch to `find_subjects_by_type` / `find_seed_subjects`. +- **`settings.js` save handlers** for the Neo4j section are not in this commit — `engine_config` can currently be persisted via API; UI wiring follows in the next commit on this branch. +- **Build page** (`_query_sync.html` / `_domain_validation.html`) — engine-aware "Graph DB (…)" labels for Neo4j follow in the next commit on this branch. 
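The batched `UNWIND` + `MERGE` insert mentioned for `insert_triples` can be sketched roughly as follows. This is a minimal illustration, not the code in `Neo4jStore.py`: the `Triple` node label, the property names, the function name `batched_insert_statements`, and the default batch size are all assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical label and property names; the real Neo4jStore schema may differ.
INSERT_CYPHER = (
    "UNWIND $rows AS row "
    "MERGE (t:Triple {subject: row.s, predicate: row.p, object: row.o})"
)

Row = Dict[str, str]
Statement = Tuple[str, Dict[str, List[Row]]]


def batched_insert_statements(
    triples: Iterable[Tuple[str, str, str]],
    batch_size: int = 1000,  # assumed default, not taken from the PR
) -> Iterator[Statement]:
    """Yield (cypher, params) pairs, one per batch of at most batch_size triples."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Row] = []
    for s, p, o in triples:
        # Values travel as bound parameters, never interpolated into the Cypher text.
        batch.append({"s": s, "p": p, "o": o})
        if len(batch) == batch_size:
            yield INSERT_CYPHER, {"rows": batch}
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield INSERT_CYPHER, {"rows": batch}
```

Each yielded pair could then be handed to a driver session, for example `session.run(cypher, params)` with the official `neo4j` driver. Because `MERGE` matches before it creates, re-running a batch should not duplicate triples under this assumed schema.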
+ +## Test result + +- Static syntax check on all new files: OK. +- Unit tests pass when `neo4j>=5.0` is installed; skip cleanly when not. +- Live smoke test against the Ryan-provisioned Aura (`neo4j+s://b4810af7.databases.neo4j.io`) — pending after the next commit (settings.js wiring + Build page labels). +- `make test` — to be re-run before marking the PR ready-for-review. diff --git a/changelogs/v0.5.0/hourdays_2026-06-12.log b/changelogs/v0.5.0/hourdays_2026-06-12.log new file mode 100644 index 00000000..e5033a64 --- /dev/null +++ b/changelogs/v0.5.0/hourdays_2026-06-12.log @@ -0,0 +1,69 @@ +# 2026-06-12 — Build page engine label: API fallback when dt.graph_engine is empty + +**Author:** Hugues Journeau (@hourdays) +**Branch:** `feature/neo4j-graphdb-skeleton` → PR #47 (target: `develop`) +**Version:** v0.5.0 + +## Context + +Follow-up to commit `5205010` (engine-aware Build page label). The previous +patch read `dt.graph_engine` from the `/dtwin/exist` payload, but that field +is only populated **after** a domain has been built. Pre-Build (status +"Never built"), `dt.graph_engine` is empty and the JS fallback +`dt.graph_engine || 'lakebase'` mislabels the card as "Graph DB (Lakebase)" +even when the global engine setting is Neo4j. + +This patch fetches `/settings/graph-engine` asynchronously when +`dt.graph_engine` is empty and re-applies the title + Lakebase-details +visibility once the global engine is known. + +## Changes + +1. **`src/front/static/domain/js/domain-validation.js`** — in + `populateDtwinCard()` (Build page validation card), when + `dt.graph_engine` is falsy, kick off a `fetch('/settings/graph-engine')` + and re-apply `psDtGraphBackendTitle` / `psDtLakebaseDetails` from the + resolved global engine. Initial render keeps the existing + `'lakebase'` fallback so there is no visual flicker for users on + Lakebase. +2. 
**`src/front/static/query/js/query-sync.js`** — same pattern in + `_applyBuildGraphEngineUi()`, gated on both `dt.graph_engine` AND + `cfg.graph_engine` being absent (avoids redundant fetches when the + value is already cached on `window.__TRIPLESTORE_CONFIG`). On success, + caches the resolved engine onto `cfg.graph_engine` for subsequent + calls. + +## Stronger JS reconciliation (2026-06-12 PM) + +Server fix landed in `dependencies.py` but `dt.graph_engine` can still arrive +stale: it reflects the engine recorded on the domain at build-time, which +isn't necessarily the active global engine. Updated both JS files to +reconcile unconditionally against `/settings/graph-engine` on every render +(was: only when `dt.graph_engine` was empty). Global engine is now the +single source of truth for the Build/Validation Graph DB card title. + +## Server-side companion fix + +3. **`src/front/fastapi/dependencies.py`** — root-cause fix for + `triplestore_page_context`. The previous body was a tautology + (`graph_engine = _raw if _raw == "lakebase" else "lakebase"`) that + silently coerced any non-Lakebase engine to `"lakebase"` before it + reached the template, so the `__TRIPLESTORE_CONFIG.graph_engine` + variable in `domain.html` / `dtwin.html` was hard-stuck on Lakebase + even when Neo4j was the active global engine. Replaced with a direct + pass-through of `TripleStoreFactory._resolve_graph_engine(...)`. The + JS fetch fallback above remains in place as defence in depth. + +## Files modified / added + +- `src/front/static/domain/js/domain-validation.js` +- `src/front/static/query/js/query-sync.js` +- `src/front/fastapi/dependencies.py` + +## Test result + +- Static syntax check on both files: OK. +- Live behaviour to be verified in the Chrome MCP screenshot capture + pass (task #54 in the PR plan) — the Build page should now display + "Graph DB (Neo4j)" pre-Build when the global engine is Neo4j on the + fevm-mjolnir deployment. 
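The server-side bug described above can be reproduced in a few lines of Python. This is a sketch under stated assumptions: only the conditional expression is quoted from the changelog, while the function names `resolve_engine_buggy` and `resolve_engine_fixed` are hypothetical, and the allow-list fallback stands in for the real pass-through of `TripleStoreFactory._resolve_graph_engine(...)`, whose behaviour is not shown in this diff.

```python
from typing import Optional

# Tuple taken from the 2026-06-09 changelog entry (GlobalConfigService).
ALLOWED_GRAPH_ENGINES = ("lakebase", "neo4j")


def resolve_engine_buggy(raw: str) -> str:
    """Expression quoted in the changelog: both branches yield "lakebase"."""
    return raw if raw == "lakebase" else "lakebase"


def resolve_engine_fixed(raw: Optional[str]) -> str:
    """Hypothetical stand-in for the fix: pass a known engine through unchanged.

    Falling back to "lakebase" for empty or unknown values is an assumption
    of this sketch, not something the changelog states.
    """
    engine = (raw or "").strip().lower()
    return engine if engine in ALLOWED_GRAPH_ENGINES else "lakebase"


if __name__ == "__main__":
    for value in ("lakebase", "neo4j", None):
        buggy = resolve_engine_buggy(value or "")
        fixed = resolve_engine_fixed(value)
        print(f"{value!r}: buggy={buggy!r} fixed={fixed!r}")
```

With the buggy expression, a configured `"neo4j"` engine never reaches the template, which matches the "hard-stuck on Lakebase" symptom reported for `__TRIPLESTORE_CONFIG.graph_engine`.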
diff --git a/changelogs/v0.5.0/hourdays_2026-06-22.log b/changelogs/v0.5.0/hourdays_2026-06-22.log new file mode 100644 index 00000000..35e3bf17 --- /dev/null +++ b/changelogs/v0.5.0/hourdays_2026-06-22.log @@ -0,0 +1,92 @@ +## Move Neo4j Bolt password to a Databricks Apps secret resource + +**Context:** Benoit's PR #47 review on 2026-06-18 flagged that the Neo4j password was being persisted in clear inside `global_config.graph_engine_config` — a blocker for any customer deployment. Switching to a Databricks Apps **secret resource** declared in `app.yaml` and injected as the `NEO4J_PASSWORD` env var at runtime; the persisted JSON `password` is stripped at save-time, so no clear-text credential ever lands in the database. Local development keeps the existing `engine_config.password` fallback (guarded by `DATABRICKS_APP_PORT` to detect the deployed environment). + +**Changes:** +1. `src/back/core/graphdb/neo4j/Neo4jStore.py` — `_resolve_auth()` now reads `NEO4J_PASSWORD` env var first (logged as source); in the deployed app (`DATABRICKS_APP_PORT` set) the env var becomes mandatory and a missing variable raises `InfrastructureError` with a clear remediation pointer. Local-dev path unchanged (`engine_config.password`). New module-level helper `is_neo4j_password_from_secret()` exposes the credential source to other layers. Switched `ValueError` → `ValidationError` on the user-input validation branches (aligned with `back.core.errors` hierarchy from `src/.coding_rules.md §4`). Constant `NEO4J_PASSWORD_ENV` added. +2. `app.yaml.template` — declared the `neo4j-password` resource (`secret: permission: READ`) and the `NEO4J_PASSWORD` env var with `valueFrom: neo4j-password`. The resource is **Unbound** by default; the admin binds it via the Apps UI to a workspace secret (scope/key) before activating the Neo4j engine. +3. 
`src/back/objects/domain/SettingsService.py` — `set_graph_engine_config_result()` strips `password` from the incoming `engine_config` dict whenever `NEO4J_PASSWORD` is set in the environment. Defence-in-depth: even if the UI sends a password, the persisted JSON never contains it. +4. `src/front/routes/home.py` — settings page context now exposes `neo4j_password_from_secret` so the Jinja template can render the credential-source badge. +5. `src/front/templates/settings.html` — Neo4j settings form: password label gains a dynamic badge (`From Apps secret` green / `Local-dev fallback` yellow) and the input is `disabled` with a `••••••••` placeholder when the secret is bound. Help text below points to the new docs page. +6. `src/front/static/config/js/settings.js` — `mergeNeo4jPanelIntoConfigTextarea()` never serialises the password field when the input is disabled (Apps secret in place), preventing stale clear-text values from being re-sent on save. +7. `docs/pr47-neo4j-demo/secret-configuration.md` (new) — step-by-step admin guide: `databricks secrets put-secret` → bind the `neo4j-password` resource → verify in Settings UI → troubleshooting table. +8. `tests/units/graphdb/test_neo4j_store.py` — new `TestPasswordSourcing` class with 7 unit tests covering helper behaviour and every branch of `_resolve_auth` (env-var-wins, local-dev fallback, prod refusal, missing creds, missing username). + +**Modified files:** +- `src/back/core/graphdb/neo4j/Neo4jStore.py` +- `app.yaml.template` +- `src/back/objects/domain/SettingsService.py` +- `src/front/routes/home.py` +- `src/front/templates/settings.html` +- `src/front/static/config/js/settings.js` +- `docs/pr47-neo4j-demo/secret-configuration.md` (new) +- `tests/units/graphdb/test_neo4j_store.py` + +**Test result:** `python3 -m py_compile` on every changed `.py` — OK. `node --check` on `settings.js` — OK. Jinja2 lex of `settings.html` — OK. Inline logic test of the 8 `_resolve_auth` + helper paths via stubs — all assertions pass. 
Full `pytest tests/units/graphdb/test_neo4j_store.py::TestPasswordSourcing` deferred (uv environment offline locally; will run in CI). + +## Log the executed Cypher at INFO (PR #47 review – Benoit 2026-06-18) + +**Context:** Benoit's PR #47 review asked that every Cypher statement run by the Neo4j backend be visible in the Databricks app logs at INFO level — without leaking secrets — so operators can correlate UI actions (filters, builds, GraphQL resolvers) with the backend query. Bolt-bound parameters are kept at DEBUG only (they don't carry credentials — auth lives on the driver, not the session — but they may carry URIs/literals from the build pipeline that don't belong in default INFO logs). Single-line, whitespace-flattened, truncation-marked snippets so logs stay grep-friendly. + +**Changes:** +1. `src/back/core/graphdb/neo4j/Neo4jStore.py` — `_run()` now emits one INFO log line per call: `"Cypher ( rows, ms): "`. Bound `params` go to DEBUG. Added module-level helper `_normalise_cypher_for_log()` that collapses runs of whitespace to single spaces and truncates beyond `_CYPHER_LOG_MAX` (1500 chars) with a `"… (truncated)"` marker — caps log line size without dropping context. New imports: `re`, `time`. +2. `tests/units/graphdb/test_neo4j_store.py` — new `TestCypherLogging` class with 3 unit tests: whitespace flattening, long-cypher truncation, and end-to-end verification that `_run` emits the expected INFO line with row count, duration, flattened cypher, and **no leakage of bound parameter values** into INFO logs. + +**Modified files:** +- `src/back/core/graphdb/neo4j/Neo4jStore.py` +- `tests/units/graphdb/test_neo4j_store.py` + +**Test result:** `python3 -m py_compile` — OK. Inline run of `_run` against a fake driver confirmed: 1 INFO line emitted, row count + duration + flattened multi-line cypher present, bound `$s = 'ex:value'` param **absent** from the INFO log. Truncation + whitespace-flattening helpers verified. 
Full pytest suite deferred to CI (uv offline locally). + +## Fix Settings page engine-selector flash (PR #47 review – Benoit 2026-06-18) + +**Context:** Benoit's review caught a UX flicker on the Settings page: the *Triple store > Global > Graph DB Engine* selector painted `Lakebase` (the first `