feat(semantic-layer): add reference-data commands (Chart of Accounts in the metastore)#3

Closed
ottomansky wants to merge 21 commits into main from feat/semantic-layer-reference-data

@ottomansky
Owner

Summary

Teaches kbagent the new semantic-reference-data metastore type — a per-dimension member store (one record per dimension; members in a members[] array). Driving use case: hold a Chart of Accounts (the account list + all attributes) in the semantic layer instead of a hardcoded Storage table.

Pairs with go-monorepo #533 (which added the semantic-reference-data schema to the metastore). This PR is the kbagent client side.

New self-contained sub-app — kbagent semantic-layer reference-data (alias kbagent sl reference-data):

| Leaf | What | Class |
| --- | --- | --- |
| list | dimension summaries (id, dimension, member_count), optionally one model | read |
| get | one record + all members, by --id or --model + --dimension | read |
| set | create-or-replace by (model, dimension) from a JSON members file (--members-file, - = stdin) | write |
| delete | remove a record by UUID (--yes gated) | destructive |

kbagent sl reference-data set --project P --dimension chart_of_accounts \
  --members-file coa.json --dataset-id in.c-finance.DIM_COA

Why it's small / low-risk

The 6 existing types are hardcoded in ~8 places, but reference-data is not a model child that build generates — its members come from DIM_COA, not the AI. So it is deliberately kept out of PUSH_ORDER / build / export / diff / cascade. The new sub-app composes the generic metastore verbs (list_items / get_item / post_item / put_item / delete_item) — zero blast radius on existing model flows.

Implementation

  • metastore_client.py: register semantic-reference-data in SemanticType / SEMANTIC_TYPES; add a put_item verb so set uses the metastore's real revisioned PUT (preserves history) rather than the DELETE+POST the edit ops use.
  • SemanticLayerService: list/get/set/delete_reference_data via the generic verbs. set is idempotent on (modelUUID, dimensionName): existing → PUT (revision++), else POST.
  • permissions.py: registry entries for all four leaves (list/get read, set write, delete destructive).
  • hints: --hint client / --hint service definitions for all four leaves.
  • SKILL.md: regenerated (make skill-gen).
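
The create-or-replace rule for set can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `FakeMetastore` is a hypothetical in-memory stand-in, and the verb signatures and record shape (`id`, `revision`, `attributes`) are assumptions. Only the verb names and the (modelUUID, dimensionName) key come from the description above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeMetastore:
    """Hypothetical in-memory stand-in for the metastore client."""

    items: dict[str, dict[str, Any]] = field(default_factory=dict)
    calls: list[str] = field(default_factory=list)

    def list_items(self) -> list[dict[str, Any]]:
        return list(self.items.values())

    def post_item(self, attributes: dict[str, Any]) -> dict[str, Any]:
        # Create a new record at revision 1.
        self.calls.append("POST")
        item_id = f"id-{len(self.items) + 1}"
        record = {"id": item_id, "revision": 1, "attributes": attributes}
        self.items[item_id] = record
        return record

    def put_item(self, item_id: str, attributes: dict[str, Any]) -> dict[str, Any]:
        # Replace the attributes and bump the revision (history-preserving PUT).
        self.calls.append("PUT")
        record = self.items[item_id]
        record["attributes"] = attributes
        record["revision"] += 1
        return record


def set_reference_data(
    client: FakeMetastore, model_uuid: str, dimension: str, members: Any
) -> dict[str, Any]:
    """Upsert keyed on (modelUUID, dimensionName): PUT if present, else POST."""
    if not isinstance(members, list):
        raise ValueError("members must be a JSON array")
    attributes = {
        "modelUUID": model_uuid,
        "dimensionName": dimension,
        "members": members,
    }
    for record in client.list_items():
        attrs = record["attributes"]
        if attrs["modelUUID"] == model_uuid and attrs["dimensionName"] == dimension:
            return client.put_item(record["id"], attributes)
    return client.post_item(attributes)
```

Calling `set_reference_data` twice with the same model and dimension issues one POST followed by one PUT and leaves a single record, which is the idempotency property the bullet describes.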

Tests

  • Service: CRUD, create-vs-replace, members-not-list validation, NOT_FOUND, id-vs-dimension resolution, permission-registry asserts.
  • CLI: list/get/set/delete, bad-JSON exit-2, non-TTY --yes gate.
  • Full suite green locally (3585 passed); ruff check / ruff format / ty clean; SKILL.md freshness + plugin.json version-sync pass.

Deferred (intentionally — flagged for a follow-up)

Per agreed scope this is core + mandatory companions only. Not in this PR:

  • REST serve router parity (server/routers/semantic_layer.py 1:1 CLI→HTTP).
  • Hand-written plugin-doc surfaces: commands-reference.md, gotchas.md, keboola-expert.md, context.py AGENT_CONTEXT, CLAUDE.md command list.
  • An E2E hop in tests/test_e2e.py.

Opened as draft pending those + review.

🤖 Generated with Claude Code

ottomansky and others added 21 commits May 26, 2026 22:47
…orage ergonomics (keboola#348)

* fix(sync push): writeback placeholder manifest entries in place + propagate KBC.* metadata on create

Fresh-CREATE pre-population (FIIA / scaffold emit pattern): downstream
callers seed manifest entries with placeholder ids and (optionally)
KBC.configuration.* metadata before the first `sync push`. Pre-fix, every
create unconditionally appended a new ManifestConfiguration / ManifestConfigRow
to the manifest, so N placeholders -> 2N entries after one push, every
placeholder still looked "added" on re-push (spurious duplicates on remote),
and KBC.configuration.folderName from local manifest was silently dropped.

Changes:
- Service: new `_writeback_create_config_in_manifest` finds the placeholder
  by (component_id, path) and updates id + branch_id + pull_hash /
  pull_config_hash in place; preserves all non-bookkeeping metadata
  (KBC.*). Append remains the fallback when no placeholder exists.
- Service: matching `_writeback_create_row_in_manifest` for rows under a
  parent config.
- Service: new `_propagate_kbc_metadata` POSTs any KBC.* keys from the
  manifest entry to `client.set_config_metadata` once, immediately after
  the create call. Bookkeeping keys (pull_hash, pull_config_hash) stay
  out of the metadata API.
- push_changes(): replaces the inline `manifest.configurations.append(...)`
  block (config create path) with the helper call + propagation.
- _push_create_row(): replaces `parent.rows.append(...)` with the row
  helper.

Idempotency on re-push falls out for free: after the first push, the
placeholder entry holds the real ULID, so the diff engine finds it in
remote_configs and reports no change.
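
A minimal sketch of the in-place writeback described above, under stated assumptions: `ManifestEntry` and `writeback_created_config` are hypothetical names, and storing `pull_hash` inside the entry's metadata dict is a guess at the layout. The matching key is the `(component_id, path)` pair named in this commit.

```python
from dataclasses import dataclass, field


@dataclass
class ManifestEntry:
    """Hypothetical, simplified manifest entry."""

    component_id: str
    path: str
    id: str
    metadata: dict[str, str] = field(default_factory=dict)


def writeback_created_config(
    entries: list[ManifestEntry],
    component_id: str,
    path: str,
    new_id: str,
    pull_hash: str,
) -> ManifestEntry:
    """Update the placeholder in place; append only when none exists."""
    for entry in entries:
        if entry.component_id == component_id and entry.path == path:
            # Replace the placeholder id and refresh bookkeeping;
            # every other metadata key (e.g. KBC.*) is left untouched.
            entry.id = new_id
            entry.metadata["pull_hash"] = pull_hash
            return entry
    # Fallback: no placeholder was seeded, so append as before.
    entry = ManifestEntry(component_id, path, new_id, {"pull_hash": pull_hash})
    entries.append(entry)
    return entry
```

Because the placeholder is mutated rather than duplicated, the manifest keeps its length after a push, and a second push sees the real id already recorded.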

Tests (TestFreshCreateWriteback, 7 cases):
- writeback config in place (placeholder + KBC.* metadata preserved)
- writeback config falls back to append when no placeholder
- propagate_kbc_metadata filters bookkeeping keys, calls set_config_metadata
- propagate_kbc_metadata no-op when there are no KBC.* keys
- writeback row in place (no manifest growth)
- writeback row falls back to append for untracked rows
- end-to-end push: placeholder + folderName -> create + set_config_metadata,
  manifest length unchanged, re-push is a no-op (status=no_changes)

Full sync test suite (77 cases in test_sync_service.py) green; full repo
suite (3576 passed, 110 skipped) green; `ty check` clean.

* feat(semantic-layer): add search-context + get-context for project-wide reads

Two project-wide read subcommands that mirror the upstream
`keboola-mcp-server` semantic-context tools (`search_semantic_context`,
`get_semantic_context`). Lets downstream callers (FIIA, scheduled agents)
drop their MCP dependency for the common "is the model populated?" and
"what's at this id?" lookups.

CLI:
- `kbagent semantic-layer search-context --project P [--pattern G ...]
  [--type model|dataset|metric|relationship|constraint|glossary|all]
  [--limit N]` — project-wide glob search over entity names; default
  searches every child type (not the model itself); `*` matches all.
  Patterns are repeatable, taking the union. Case-sensitive fnmatch.
  `--limit` short-circuits both inner and outer loops.
- `kbagent semantic-layer get-context --project P --context-id ID` —
  single fetch by id; probes semantic-model + every CHILD_TYPES entry
  until it hits, raises NOT_FOUND if no type matches. Non-404 errors
  (500, etc.) propagate immediately rather than being swallowed.
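
The search semantics above (union of repeatable glob patterns, case-sensitive matching, a limit that stops both loops) can be illustrated with the standard library. This is a sketch only: `search_names` and its input shape are hypothetical, and `fnmatchcase` is used here as an assumption to keep matching case-sensitive on every platform, since the commit only says "fnmatch".

```python
from fnmatch import fnmatchcase
from typing import Optional


def matches_any_pattern(name: str, patterns: list[str]) -> bool:
    """True when the name matches at least one glob (union semantics)."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def search_names(
    entities_by_type: dict[str, list[str]],
    patterns: list[str],
    limit: Optional[int] = None,
) -> list[tuple[str, str]]:
    """Collect (type, name) hits; returning early exits both loops at once."""
    hits: list[tuple[str, str]] = []
    for type_name, names in entities_by_type.items():
        for name in names:
            if limit is not None and len(hits) >= limit:
                return hits
            if matches_any_pattern(name, patterns):
                hits.append((type_name, name))
    return hits
```

With this shape, `["rev_*"]` does not match a name such as `Rev_net`, and passing several patterns widens the result rather than narrowing it.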

Service (`SemanticLayerService.search_context` /
`SemanticLayerService.get_context`):
- Validation at the service boundary so CLI, REST router, and
  `--hint service` callers all share the same error shape.
- `_strip_semantic_prefix` normalises the response type field from
  `"semantic-dataset"` to `"dataset"` for the CLI surface.
- Lookup order in get_context is model-first so a model hit short-
  circuits the 6-type probe to a single call.
- try/finally guarantees the metastore client is closed on success and
  on every error path.
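
The probe order and error handling of get_context might look roughly like this. Everything here is illustrative: `ApiError`, `NotFound`, and the `fetch` callable are hypothetical, and the type names other than `semantic-model` and `semantic-dataset` are extrapolated from the `--type` choices listed above.

```python
from typing import Any, Callable


class ApiError(Exception):
    """Hypothetical API error carrying an HTTP status."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"HTTP {status}")
        self.status = status


class NotFound(Exception):
    """Raised when no type holds the requested id."""


# Model first, so a model hit costs a single call (names partly assumed).
LOOKUP_ORDER = (
    "semantic-model",
    "semantic-dataset",
    "semantic-metric",
    "semantic-relationship",
    "semantic-constraint",
    "semantic-glossary",
)


def get_context(
    fetch: Callable[[str, str], dict[str, Any]], context_id: str
) -> dict[str, Any]:
    """Probe each type in order; only a 404 moves on to the next type."""
    if not context_id:
        raise ValueError("context id must not be empty")
    for type_name in LOOKUP_ORDER:
        try:
            item = fetch(type_name, context_id)
        except ApiError as exc:
            if exc.status == 404:
                continue
            raise  # 500 and friends propagate immediately
        return {"type": type_name.removeprefix("semantic-"), **item}
    raise NotFound(context_id)
```

Treating only 404 as "try the next type" is what keeps a server-side failure from being misreported as NOT_FOUND.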

Sync surfaces touched:
- `commands/semantic_layer.py` — two new Typer commands with
  `should_hint`/`emit_hint` short-circuits per the hint convention.
- `services/semantic_layer_service.py` — new methods + new ClassVar
  `_ALL_TYPES_FOR_LOOKUP` tuple.
- `server/routers/semantic_layer.py` — `GET /search-context` and
  `GET /get-context` (1:1 CLI->HTTP per CONTRIBUTING.md plugin-sync map);
  `Query` added to fastapi import.
- `hints/definitions/semantic_layer.py` — two new `CommandHint` entries
  with `ClientCall` + `ServiceCall` for `--hint client` /
  `--hint service` code generation.
- `permissions.py` — both registered as `read` operations.

Tests:
- `tests/test_semantic_layer_service.py::TestSearchContext` (12 cases) —
  default pattern, glob narrowing, case-sensitivity, multi-pattern union,
  type filter (singular + `all` + `model`), `--limit` short-circuit,
  invalid type / empty pattern / zero limit validation, client cleanup
  on API error.
- `tests/test_semantic_layer_service.py::TestGetContext` (6 cases) —
  finds dataset by id, finds model (short-circuit on first probe),
  NOT_FOUND after exhausting all 6 types, 500 propagates without
  swallowing, empty-id validation, client cleanup on error.
- `tests/test_semantic_layer_cli.py::TestSearchContext` (4 cases) and
  `::TestGetContext` (3 cases) — JSON envelope, kwarg propagation,
  human-mode table rendering, NOT_FOUND non-zero exit code.

Live validation against project 1143 (99_Playground_Max):
- `search-context --pattern "*"` returns 8 contexts spanning 4 types.
- `search-context --pattern "rev_*" --type metric` narrows to 1 hit.
- `get-context` with a UUID returned by the search resolves to its
  full attribute dict.
- `get-context` with `00000000-0000-0000-0000-000000000000` returns
  NOT_FOUND envelope (exit 1) after probing all 6 types.

Full test suite (3601 passed, 110 skipped) green; `ty check` clean.

* feat(sync,storage): add --branch override, --if-not-exists, --no-name-drift-warnings

Three ergonomic improvements that close downstream-tooling pain points
encountered during the FIIA -> kbagent migration.

`kbagent sync push --branch <id>` (also sync pull / sync diff):
- Per-invocation dev-branch targeting. Beats manifest.branches[0],
  active_branch_id, and branch-mapping.json (priority 0 in the resolver).
- Lets a downstream caller (or operator) target a freshly-created dev
  branch without first running `branch use` or `sync branch-link`.
- Validated mutually exclusive with --all-projects at the CLI layer.
- Symmetric on pull / diff for predictable UX.
- Threaded through `SyncService._resolve_branch_id(..., branch_override=)`.
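
As a rough illustration of "priority 0 in the resolver", a first-non-None chain is enough. The function and parameter names below are hypothetical, and the relative order of the three fallbacks is an assumption; the commit only states that the override beats all of them and that `--branch` cannot be combined with `--all-projects`.

```python
from typing import Optional


def resolve_branch_id(
    branch_override: Optional[int],
    manifest_branch_id: Optional[int],
    active_branch_id: Optional[int],
    mapped_branch_id: Optional[int],
) -> Optional[int]:
    """Return the first configured branch id; the override always wins."""
    candidates = (
        branch_override,  # priority 0: per-invocation --branch
        manifest_branch_id,  # fallback order below is assumed
        active_branch_id,
        mapped_branch_id,
    )
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None


def validate_branch_flags(branch: Optional[int], all_projects: bool) -> None:
    """Reject the combination the CLI layer treats as mutually exclusive."""
    if branch is not None and all_projects:
        raise ValueError("--branch cannot be combined with --all-projects")
```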

`kbagent storage create-table --if-not-exists`:
- Opt-in flag (defaults False so existing callers are unaffected).
- When set, catches the specific `STORAGE_JOB_FAILED` + "already has the
  same display name" error, probes `get_table_detail(target_id)` to
  confirm the table truly exists at the expected id, and returns
  `{action: "skipped", skip_reason: "table already exists"}` instead of
  raising. A different table with the same display name still surfaces
  the original error (a real conflict to resolve).
- Solves the FIIA `scaffold_storage.py` 8-worker spurious-error symptom
  documented in the original proposal.
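
The targeted try/except behind `--if-not-exists` can be sketched as below. `StorageError` and `FakeStorageClient` are hypothetical stand-ins, and the envelope carries only the fields named above plus an assumed `table_id`; the real client and error classes are not shown in this PR text.

```python
class StorageError(Exception):
    """Hypothetical error type carrying a machine-readable code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


DUPLICATE_MARKER = "already has the same display name"


class FakeStorageClient:
    """In-memory stand-in for the Storage client (illustration only)."""

    def __init__(self) -> None:
        self.tables: dict[str, list[str]] = {}

    def create_table(self, table_id: str, columns: list[str]) -> None:
        if table_id in self.tables:
            raise StorageError("STORAGE_JOB_FAILED", f"Table {DUPLICATE_MARKER}")
        self.tables[table_id] = list(columns)

    def get_table_detail(self, table_id: str) -> dict:
        if table_id not in self.tables:
            raise StorageError("NOT_FOUND", table_id)
        return {"id": table_id, "columns": self.tables[table_id]}


def create_table(client, table_id: str, columns: list[str], if_not_exists: bool = False) -> dict:
    """Create a table; with the flag set, turn a confirmed duplicate into a skip."""
    try:
        client.create_table(table_id, columns)
    except StorageError as exc:
        is_duplicate = exc.code == "STORAGE_JOB_FAILED" and DUPLICATE_MARKER in str(exc)
        if not (if_not_exists and is_duplicate):
            raise
        try:
            # Confirm the table really exists at the expected id.
            client.get_table_detail(table_id)
        except StorageError:
            # Same display name but a different table: surface the original error.
            raise exc from None
        return {"action": "skipped", "skip_reason": "table already exists", "table_id": table_id}
    return {"action": "created", "table_id": table_id}
```

The probe is the important part: without it, a display-name collision with an unrelated table would be silently reported as a skip.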

`kbagent sync push --no-name-drift-warnings`:
- Opt-out flag to suppress the cosmetic `name_drift_warnings` array in
  the result envelope. The underlying detection still runs (so a future
  reviewer can re-enable it); only the report is dropped.

Sync surfaces touched:
- `services/sync_service.py` -- `_resolve_branch_id` gains a
  `branch_override` parameter (priority 0); `push` / `pull` / `diff`
  thread it through. `push` adds `no_name_drift_warnings` flag with a
  single-line suppression at the result-envelope step.
- `services/storage_service.py` -- `create_table` gains
  `if_not_exists=False` kwarg; the IF-NOT-EXISTS branch wraps the
  existing client.create_table call with a targeted try/except that
  uses `ErrorCode.STORAGE_JOB_FAILED` (no raw string literal).
  Response envelope now carries `action: "created" | "skipped"` so
  programmatic callers can branch on outcome.
- `commands/sync.py` -- adds `--branch` to push / pull / diff;
  adds `--no-name-drift-warnings` to push; validates
  `--branch` is incompatible with `--all-projects`.
- `commands/storage.py` -- adds `--if-not-exists` to create-table.
- `server/routers/storage.py` -- `CreateTable` request model gains
  `if_not_exists: bool`; the router forwards it. Sync routes
  intentionally absent (sync is filesystem-local; documented exemption
  per CONTRIBUTING.md plugin-sync map).

Tests:
- `tests/test_storage_write.py::TestCreateTableIfNotExists` (5 cases):
  skip on existing when flag set, reraise when unset, reraise when
  target table missing despite flag, reraise on non-duplicate errors
  even with flag, success path unchanged with flag.
- `tests/test_sync_service.py::TestBranchOverrideAndNameDriftFlag`
  (4 cases): resolver priority (override wins), push branch_override
  reaches client, diff branch_override reaches client,
  no_name_drift_warnings suppresses the field from the envelope
  (with a control-arm check that proves the warning surfaces by
  default).
- One existing test in `test_storage_write.py` updated to include the
  new `if_not_exists=False` kwarg in its `assert_called_once_with`.

Live validation against project 1143 (99_Playground_Max), branch 388072:
- `sync diff --branch 388072` reaches the dev branch and reports
  `remote_only: 31`; without `--branch`, same call reports no remote
  diff.
- `storage create-table --if-not-exists` end-to-end: first call returns
  `action: "created"`; second call (same name) returns
  `action: "skipped", skip_reason: "table already exists"`; third call
  WITHOUT the flag returns the original `STORAGE_JOB_FAILED` error
  envelope.

Full test suite (3610 passed, 110 skipped) green; `ty check` clean;
`ruff check` + `ruff format --check` clean.

* release: 0.47.0 — fresh-CREATE writeback, semantic-layer reads, sync/storage ergonomics

Bumps version to 0.47.0 and walks the full silent-drift sync map
mandated by CONTRIBUTING.md §17 + §322-425 for the three feature
commits already on this branch:

  aaf83bc  fix(sync push): writeback placeholder manifest entries in
           place + propagate KBC.* metadata on create
  c6ce7ad  feat(semantic-layer): add search-context + get-context for
           project-wide reads
  40df5fa  feat(sync,storage): add --branch override, --if-not-exists,
           --no-name-drift-warnings

Version + auto-regenerated artefacts (CI-checked):
- pyproject.toml: 0.46.1 -> 0.47.0
- .claude-plugin/marketplace.json + plugins/kbagent/.claude-plugin/plugin.json
  re-synced via `make version-sync`
- plugins/kbagent/skills/kbagent/SKILL.md decision table regenerated
  via `make skill-gen` (now lists search-context and get-context)
- src/keboola_agent_cli/changelog.py: new 0.47.0 entry covering all
  three fixes + the no-sync-router exemption note
- uv.lock: keboola-agent-cli pin advanced to 0.47.0

Hand-maintained surfaces (silent-drift risks; not CI-checked):
- src/keboola_agent_cli/commands/context.py AGENT_CONTEXT --
  sync push/pull/diff signatures updated for --branch / --no-name-
  drift-warnings; storage create-table gains --if-not-exists;
  semantic-layer search-context / get-context added.
- CLAUDE.md `## All CLI Commands` -- storage create-table gains
  --if-not-exists; semantic-layer search-context + get-context added.
  (Sync commands are not currently listed in this file -- pre-existing
  gap, out of scope for this PR.)
- plugins/kbagent/agents/keboola-expert.md Tool Selection Matrix --
  the existing semantic-layer "list models / entities" row points at
  search-context / get-context for project-wide glob/id lookup. The
  60 KB budget for the keboola-expert prompt is tight (closing at
  59944 bytes); the addition was kept terse rather than expanding the
  matrix with a fresh row.
- plugins/kbagent/skills/kbagent/references/commands-reference.md --
  storage create-table gains --if-not-exists note; sync push/pull/diff
  gain --branch and --no-name-drift-warnings notes; new bullets for
  semantic-layer search-context and get-context.
- plugins/kbagent/skills/kbagent/references/gotchas.md -- four new
  `(since v0.47.0)` sections: fresh-CREATE writeback contract change,
  --branch override semantics, storage --if-not-exists envelope,
  --no-name-drift-warnings opt-out, and the search-context / get-
  context MCP-parity note.
- plugins/kbagent/skills/kbagent/references/sync-workflow.md -- new
  "Per-invocation dev-branch override" and "Fresh-CREATE writeback"
  sections with worked examples.

`make check` passes clean (lint + format + skill + version + changelog
+ error-codes + 3610 tests).

Pre-existing PR-body-relevant exemptions documented in changelog:
- `kbagent sync push/pull/diff` (and their new --branch flag) remain
  filesystem-local and intentionally have no REST router in
  src/keboola_agent_cli/server/routers/. Permitted by the CONTRIBUTING
  Plugin Synchronization map ("terminal-only / filesystem-bound
  commands"). All other new surfaces (storage create-table, semantic-
  layer search-context / get-context) are exposed 1:1 over HTTP.

* review: iteration-2 fixes (multi-branch writeback safety, metadata error accumulation, skip render, E2E coverage)

Independent reviewer findings from iteration 2 (no BLOCKING; 4
NON-BLOCKING + 3 NIT in code; 2 NIT in security). All material items
addressed:

1. `_writeback_create_config_in_manifest` now matches placeholders on
   `(branch_id, component_id, path)` instead of `(component_id, path)`
   alone. Without this, a multi-branch manifest with the same logical
   config path under two branches could update the wrong branch's
   entry. The placeholder branch_id is also no longer overwritten by
   the helper -- the match already proves it's correct. New regression
   test: `test_writeback_config_does_not_match_across_branches`.

2. `_propagate_kbc_metadata` now returns the API error message on a
   non-fatal write failure (the config IS already created and the
   manifest writeback is complete; aborting the rest of the push
   mid-loop was the worse failure mode). The push loop accumulates the
   message into the existing `errors[]` list under a new
   `change_type: "metadata_propagation"` entry so callers can see what
   went wrong without losing the rest of the push. Added a docstring
   "not a secret store" note about KBC.* keys. New unit test:
   `test_propagate_kbc_metadata_returns_error_message_on_api_failure`.

3. `kbagent storage create-table --if-not-exists` human-mode renderer
   now prints "Skipped (already exists): <table_id>" + the reason when
   `result["action"] == "skipped"`, instead of the misleading
   "Created table: ..." line. JSON envelope unchanged. New CLI test:
   `test_human_renders_skip_when_action_is_skipped`.

4. E2E coverage in `tests/test_e2e.py::TestE2E_0_47_0_NewSurfaces`:
   - `storage create-table --if-not-exists` round-trip: created ->
     skipped -> raises without flag (binds the action envelope shape
     and the STORAGE_JOB_FAILED reraise path).
   - `semantic-layer search-context` envelope shape + type filter
     narrowing; `get-context` NOT_FOUND on all-zero UUID; roundtrip
     search -> get when the project has at least one searchable
     entity.
   - `sync diff --branch <id>`: creates a throwaway dev branch on the
     fly, asserts diff reaches that branch (status=ok, changes != None,
     remote_only >= 0), cleans up the dev branch in teardown.
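
Item 2's non-fatal metadata propagation could be sketched like this. The helper names and the `config_id` / `message` fields of the error entry are assumptions; the `change_type: "metadata_propagation"` value, the KBC.* filter, and the bookkeeping keys come from the commit messages. The broad `except Exception` is a simplification for the sketch.

```python
from typing import Callable, Optional

BOOKKEEPING_KEYS = frozenset({"pull_hash", "pull_config_hash"})


def propagate_kbc_metadata(
    set_config_metadata: Callable[[str, dict[str, str]], None],
    config_id: str,
    metadata: dict[str, str],
) -> Optional[str]:
    """Send KBC.* keys once; return an error message instead of raising."""
    kbc = {
        key: value
        for key, value in metadata.items()
        if key.startswith("KBC.") and key not in BOOKKEEPING_KEYS
    }
    if not kbc:
        return None  # nothing to propagate, no API call
    try:
        set_config_metadata(config_id, kbc)
    except Exception as exc:  # sketch only; real code would catch the API error type
        return str(exc)
    return None


def push_created(
    created_configs: list[tuple[str, dict[str, str]]],
    set_config_metadata: Callable[[str, dict[str, str]], None],
) -> dict:
    """Keep pushing after a metadata failure and record it in errors[]."""
    errors: list[dict[str, str]] = []
    for config_id, metadata in created_configs:
        message = propagate_kbc_metadata(set_config_metadata, config_id, metadata)
        if message is not None:
            errors.append(
                {
                    "change_type": "metadata_propagation",
                    "config_id": config_id,
                    "message": message,
                }
            )
    return {"created": len(created_configs), "errors": errors}
```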

Items kept but not changed:
- Pyright lambda-parameter noise in tests (`url`/`token` "not accessed"):
  pre-existing across the test suite, idiomatic for the client_factory
  signature; ty is clean.
- FastAPI router `type` parameter name in `semantic_layer.search-context`:
  cosmetic shadow of the builtin; ignoring.

`make check` clean: 3613 passed, 7 skipped, 106 deselected.
`make test-e2e-local CONFIG_DIR=/tmp/kbc-config-e2e ALIAS=e2e-1143
PYTEST_ARGS=...TestE2E_0_47_0_NewSurfaces` -- all 3 new e2e tests pass
against project 1143 (529s; 67 passed, 6 skipped, 0 failed total when
running the broader e2e suite).

* review: iteration-3 convergence cleanup (changelog accuracy, e2e docstring off-by-one)

Iteration 3 (independent convergence reviewer) returned "CONVERGED -- zero
material findings." Two documentation NITs noted and fixed here:

- changelog.py:12 -- the 0.47.0 entry's prose still said
  `_writeback_create_config_in_manifest` matches placeholders by
  `(component_id, path)`. Iteration 2 narrowed the key to
  `(branch_id, component_id, path)` for multi-branch safety; the
  changelog now reflects the final key and notes the why.
- tests/test_e2e.py docstring on TestE2E_0_47_0_NewSurfaces -- said
  "All four touch a real Keboola project" but the class has three test
  methods. Fixed to "All three".

Iteration 3 also flagged one pre-existing scope item that iteration 2
did NOT introduce:

- `storage create-table --if-not-exists` skipped-envelope reports the
  USER-REQUESTED `primary_key` / `columns` (from the create call's
  args), not the EXISTING table's actual schema. A caller that relies
  on the envelope to discover the real schema would get the wrong
  shape. Out of scope for this PR; will be filed as a separate
  follow-up issue against keboola/cli before merge per the deferred-
  scope-orphan rule.

`make check` clean: 3613 passed, 7 skipped, 106 deselected.

Branch is convergence-clean. Next: pause for user authorization before
opening the PR.

* test(e2e): add Area B fresh-CREATE writeback + KBC.* propagation against real API

Closes the most important e2e coverage gap that the user flagged:
the Area B headline fix (writeback in place + KBC.configuration.*
metadata propagation on CREATE) was only live-validated manually in
the earlier session; nothing in the test suite would catch a
regression.

New test `TestE2E_0_47_0_NewSurfaces::test_sync_push_fresh_create_writeback_and_kbc_metadata`:
- Creates a throwaway dev branch on the configured project.
- `sync init` then `sync pull --branch <dev>` so the dev branch lands
  in the manifest.
- Hand-authors a placeholder ManifestConfiguration with
  `KBC.configuration.folderName` declared (FIIA / scaffold pattern),
  writes a matching `_config.yml` locally.
- `sync push --branch <dev>` -- asserts `created=1, errors=[]`.
- Manifest invariants: length unchanged (writeback in place, NO
  duplicate), placeholder id replaced with the API-assigned ULID,
  `KBC.configuration.folderName` preserved on the entry under the
  right `branch_id`.
- `config metadata-list` against the new id verifies the folderName
  landed on the remote via the metadata API.
- `sync push` second invocation -- asserts `created=0` (idempotent).
- Teardown deletes the dev branch (and with it every config created
  inside it) so re-runs don't accumulate residue.

Also: `yaml` import added at the top of `tests/test_e2e.py` -- the
file uses `yaml.dump` in the placeholder fixture builder.

Live validated:
- New test passes against project 1143 (e2e-1143 / 99_Playground_Max)
  in 9.44s (direct pytest invocation).
- Full e2e suite still 67 passed, 6 skipped; the one failing test
  (`TestFullE2E::test_full_cli_e2e::_test_file_operations`) is an
  unrelated pre-existing flake against the Storage Files index lag
  and is not introduced by this change.

* review: /kbagent:review iteration-4 fixes (VERSION GATE drift, follow-up issue, gotchas caveat, fnmatch import)

The /kbagent:review subagent caught one BLOCKING and three lower-
severity findings that the iteration-2 + iteration-3 independent
reviewers had missed. All four addressed here:

[B-1] keboola-expert.md §1 Rule 6 VERSION GATE not updated for 0.47.0.
  An agent on 0.46.x would attempt `semantic-layer search-context` /
  `get-context` / `storage create-table --if-not-exists` /
  `sync push|pull|diff --branch` and get "No such command", silently
  losing the stated MCP-parity benefit. Added a single 0.47.0+ row
  covering all four new surfaces + the fresh-CREATE writeback +
  KBC.* propagation behavior change. Stayed under the 60000-byte
  prompt budget by also tightening the verbose 0.41.0 `semantic-layer`
  build-heuristic note from a five-line wall to a one-line inline.

[NB-1] PR body had `TBD` for the deferred-scope follow-up issue.
  Filed keboola#349 with a complete repro + suggested fix shape
  and updated the PR body to link it. Tracking the design-surface
  scope outside this PR per the deferred-scope-orphan-prevention rule.

[NB-2] gotchas.md `--if-not-exists` entry documented the happy path
  but did not warn that the skipped envelope's `columns` /
  `primary_key` mirror the user's REQUEST, not the EXISTING table's
  actual schema. Added an explicit caveat referencing keboola#349
  and pointing callers at `storage table-detail` if they need the
  real shape.

[NIT] `import fnmatch` inline inside `_matches_any_pattern` static
  method body. Hoisted to the top-level imports in
  `semantic_layer_service.py` for consistency with `permissions.py`
  and the rest of the file.

`make check` clean (3613 passed, 7 skipped). `tests/test_agent_prompt.py`
budget check green (60000-byte ceiling respected). `ty check` clean.

* review: /kbagent:review iteration-5 fixes (file-size ceiling, hint registry)

The second /kbagent:review pass returned APPROVE with two new findings
the prior reviewers hadn't surfaced. Both addressed here:

[NB-1] services/semantic_layer_service.py crossed the CONTRIBUTING.md
  hard ceiling (1500 LOC for services/*.py): 1480 -> 1640 LOC during
  this PR. Extracted search_context + get_context into a new sibling
  helper `services/_semantic_layer_lookup.py` following the existing
  `_semantic_layer_crud.py` / `_semantic_layer_internals.py` / etc.
  pattern. The helpers now own the metastore client lifecycle (open +
  finally close) via an `open_client: Callable[[], MetastoreClient]`
  factory the service injects with a 1-line lambda; the service
  methods are pure 1-line delegators.

  semantic_layer_service.py: 1640 -> 1496 LOC (under the hard ceiling).
  _semantic_layer_lookup.py: new file, 187 LOC.

  Two minor banner-comment trims (`# Helpers (used by every subcommand)`,
  `# Phase 3 — Read commands`) collapsed to single inline comments to
  bring the count just under the 1500 ceiling without changing any
  behavior or structure.
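
  A minimal sketch of the factory-injection shape described in [NB-1], with
  hypothetical names throughout (`FakeClient`, `Service`, the `list_items`
  signature). It only illustrates the ownership split: the helper opens and
  closes the client, the service method is a one-line delegator.

```python
from typing import Callable


class FakeClient:
    """Hypothetical client that records whether it was closed."""

    def __init__(self, items: list[dict]) -> None:
        self._items = items
        self.closed = False

    def list_items(self, type_name: str) -> list[dict]:
        return [item for item in self._items if item["type"] == type_name]

    def close(self) -> None:
        self.closed = True


def run_search_context(open_client: Callable[[], FakeClient], type_name: str) -> list[dict]:
    """Helper owns the lifecycle: open, use, and always close."""
    client = open_client()
    try:
        return client.list_items(type_name)
    finally:
        client.close()


class Service:
    """Orchestrator that injects the factory with a one-line lambda."""

    def __init__(self, new_client: Callable[[str], FakeClient]) -> None:
        self._new_client = new_client

    def search_context(self, alias: str, type_name: str) -> list[dict]:
        return run_search_context(lambda: self._new_client(alias), type_name)
```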

[NIT-1] hints/definitions/storage.py `create-table` `ServiceCall` args
  was missing `if_not_exists`. An AI agent following `--hint service`
  would generate non-idempotent code even when the caller wanted the
  IF-NOT-EXISTS path. Added the arg + a `notes[]` line documenting
  the 0.47.0+ flag.

Verification:
- `make check` clean: 3613 passed, 7 skipped, 107 deselected
- `ty check` clean
- The two existing e2e tests touched by this change
  (test_semantic_layer_search_and_get_context,
  test_sync_push_fresh_create_writeback_and_kbc_metadata) re-run
  against project 1143 and still pass in 13.67s
- The Pyright "Import could not be resolved" diagnostics on the new
  `_semantic_layer_lookup` import are stale-cache artifacts; ty is
  the project's authoritative typechecker and it is green.

Helper-design choice: the extraction inverts the client-lifecycle
ownership (helpers open + close vs. service open + pass-in close).
This matters because the service methods become genuinely 1-line and
the orchestrator class stays well under budget for future growth.
The cost is one extra import (`Callable` via TYPE_CHECKING) in the
helper; gain is ~50 LOC saved in the orchestrator on top of the
~140 LOC moved out.

* refactor(semantic-layer): switch try/finally + close() to `with open_client() as client:`

Addresses the lone NIT from the /kbagent:review iteration-5 pass.
CONTRIBUTING.md prefers the `with` form for resources that implement
`__enter__`/`__exit__`; `MetastoreClient` has had both since v0.41.0
but every method in `services/semantic_layer_service.py` was using
the older `try/finally + client.close()` idiom. The new
`_semantic_layer_lookup.py` (added in iteration 5) inherited that
pattern. The reviewer flagged the new helper but the right fix is to
sweep the whole service for consistency, not patch just the new file.

20 single-client sites converted via a small one-shot Python rewrite
(`/tmp/refactor_with.py`) that matched the canonical shape:

    client = self._new_metastore_client(project)
    try:
        ...
    finally:
        client.close()

    -->

    with self._new_metastore_client(project) as client:
        ...

Body indentation is unchanged (the try-block and the with-block use
the same +4 indent).

1 cross-project (promote) site converted by hand to the new
parenthesized multi-context-manager form:

    with (
        self._new_metastore_client(projects[from_project]) as src_client,
        self._new_metastore_client(projects[to_project]) as tgt_client,
    ):
        ...

`_semantic_layer_lookup.py::run_search_context` and `run_get_context`
also switched to `with open_client() as client:`. The factory pattern
the service injects still works: `lambda: self._new_metastore_client(
self._resolve_one_project(alias))` returns a MetastoreClient
configured per the resolved project.

Test fixture updates: `_make_service` in `test_semantic_layer_service.py`
now sets `mock.__enter__ = MagicMock(return_value=mock)` +
`mock.__exit__ = MagicMock(return_value=False)` on the injected
MagicMock so a `with` block over the mock yields the same body the
test configures side-effects on. The `TestPromoteModel` cross-project
fixture got the same treatment for both source and target mocks.
`mock.close.assert_called_once()` -> `mock.__exit__.assert_called_once()`
across 5 sites (the cleanup is now invoked via __exit__, not close).
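
The fixture change can be reproduced in isolation with `unittest.mock`. In this sketch `make_client_mock` and `list_names` are hypothetical helpers; the two `MagicMock` assignments are the ones quoted above.

```python
from unittest.mock import MagicMock


def make_client_mock() -> MagicMock:
    """Mock whose `with` block yields the mock itself."""
    mock = MagicMock()
    mock.__enter__ = MagicMock(return_value=mock)
    mock.__exit__ = MagicMock(return_value=False)  # False: do not swallow exceptions
    return mock


def list_names(new_client) -> list[str]:
    """Code under test: uses the client as a context manager."""
    with new_client() as client:
        return [item["name"] for item in client.list_items()]


client = make_client_mock()
client.list_items.return_value = [{"name": "a"}, {"name": "b"}]
names = list_names(lambda: client)

# Cleanup now goes through __exit__, so that is what the test asserts on.
client.__exit__.assert_called_once()
client.close.assert_not_called()
```

Returning the mock from `__enter__` matters: side effects configured on `client` are only visible inside the `with` body if the body receives that same object.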

Verification:
- `make check` clean: 3613 passed, 7 skipped, 107 deselected.
- All 4 e2e tests against project 1143 still pass in 32.91s:
  storage create-table --if-not-exists round-trip, semantic-layer
  search-context + get-context, sync diff --branch, fresh-CREATE
  writeback + KBC.* propagation.
- `ty check` clean.

Net diff: -55 LOC in semantic_layer_service.py (now well under the
1500-LOC ceiling at ~1441) thanks to losing the explicit
finally/close at every site.

* fix(ci): add @skip_without_credentials to TestE2E_0_47_0_NewSurfaces

GitHub CI's `make check` runs `pytest -m "not e2e"` and the
`TestE2E_0_47_0_NewSurfaces` class was correctly tagged with
`@pytest.mark.e2e`, but I missed the second decorator the other E2E
classes in this file use: `@skip_without_credentials`. Without it,
when CI somehow does collect the class (e.g. via `pytest tests/`
without the `-m "not e2e"` filter, or via another wrapper), the
fixture tries to read `os.environ[ENV_TOKEN]` and raises `KeyError`
during setup rather than skipping cleanly.

Reproducer:
  unset E2E_API_TOKEN; uv run pytest tests/test_e2e.py::TestE2E_0_47_0_NewSurfaces
  -> 4 ERROR ... KeyError: 'E2E_API_TOKEN'

After the fix:
  unset E2E_API_TOKEN; uv run pytest tests/test_e2e.py::TestE2E_0_47_0_NewSurfaces
  -> 4 SKIPPED in 0.20s

This matches the pattern every other E2E class in the file uses
(see TestFullE2E, TestE2EErrorHandling, TestE2EJsonConsistency,
TestE2ESyncWorkflow -- all stack `@skip_without_credentials` ABOVE
`@pytest.mark.e2e`).

Verified GitHub Actions log for run 26441694904 showed exactly this
failure mode: "ERROR ... KeyError: 'E2E_API_TOKEN'" on all four new
tests, with 3610 non-e2e tests passing alongside.
…ema on skip (keboola#349) (keboola#350)

The action:"skipped" envelope now returns the EXISTING table's columns /
primary_key / name (sourced from the get_table_detail probe that confirms
the table exists) instead of re-echoing the caller's request. The requested
values are preserved under requested_columns / requested_primary_key, and a
new schema_drift flag marks when the existing table diverges from the request.
Human-mode output shows the actual schema on a skip and warns on drift.
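
A sketch of the revised skip envelope, assuming the field names listed above. `build_skipped_envelope` is a hypothetical helper, and the order-sensitive comparison used for `schema_drift` is an assumption about how divergence is detected.

```python
def build_skipped_envelope(
    existing: dict,
    requested_columns: list[str],
    requested_primary_key: list[str],
) -> dict:
    """Report the existing table's schema and keep the request alongside it."""
    drift = (
        list(existing["columns"]) != list(requested_columns)
        or list(existing["primary_key"]) != list(requested_primary_key)
    )
    return {
        "action": "skipped",
        "skip_reason": "table already exists",
        # Actual schema, taken from the table-detail probe.
        "name": existing["name"],
        "columns": list(existing["columns"]),
        "primary_key": list(existing["primary_key"]),
        # What the caller asked for, preserved for comparison.
        "requested_columns": list(requested_columns),
        "requested_primary_key": list(requested_primary_key),
        "schema_drift": drift,
    }
```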

Bumps 0.47.0 -> 0.47.1.
* fix workspace login type for snowflake

* format workspace login type tests

* fix snowflake workspace keypair creation

* Address Snowflake workspace review feedback
…r-group (keboola#355)

* docs(contributing): deprecate --hint requirement, make tool-matrix per-group

Two policy fixes surfaced by the dev-portal review (PR keboola#354), where two
of the repo's own rules collided with each other:

- --hint code generation is already deprecated in favour of the
  `kbagent serve` REST API (CLAUDE.md, gotchas.md), but the per-command
  checklist and the kbagent-pr-reviewer prompt still demanded a
  hints/definitions entry per command and flagged its absence BLOCKING.
  Drop the requirement; existing hint definitions stay for back-compat
  but are no longer extended, and reviewers must not flag a missing one.

- keboola-expert.md §2 Tool Selection Matrix is a static subagent system
  prompt loaded eagerly into every run, with a hard 60 KB budget. The
  rule demanding one matrix row PER COMMAND fought that budget directly.
  Make it one row per command GROUP; exhaustive per-command detail lives
  in AGENT_CONTEXT (loaded dynamically). A missing matrix row is no
  longer BLOCKING. Trim stale content rather than raising the cap.

* docs(contributing): clarify matrix is author-expected but NON-BLOCKING in review
…E; --branch promotes default tree (keboola#360)

Fresh-CREATE variable binding (KFR-03/04/05): a transformation scaffolded with its keboola.variables config + values row is runnable after one push. --branch promotes the default tree when no per-branch subtree exists (KFR-07). Bumps to 0.47.2 with full plugin/doc-sync.
… flags (0.48.0) (keboola#356)

* feat(feature): add `kbagent feature` command group for Manage API feature flags (0.48.0)

Adds a new `feature` command group for listing and managing Keboola
feature flags via the Manage API, following the same 3-layer design and
super-admin-token policy as `org` / `member`.

Commands (7):
  feature list           --project ALIAS                 # stack catalogue (GET /manage/features)
  feature project-show   --project ALIAS                 # features assigned to a project
  feature project-add    --project ALIAS --feature NAME [--dry-run] [--yes]
  feature project-remove --project ALIAS --feature NAME [--dry-run] [--yes]
  feature user-show      --project ALIAS --email EMAIL
  feature user-add       --project ALIAS --email EMAIL --feature NAME [--dry-run] [--yes]
  feature user-remove    --project ALIAS --email EMAIL --feature NAME [--dry-run] [--yes]

Token security: the super-admin Manage API token is resolved via the
existing default-deny `resolve_manage_token()` (interactive hidden prompt;
never persisted; never a CLI argument; `--allow-env-manage-token` opt-in
for CI). `--project ALIAS` resolves the stack URL (and, for project ops,
the numeric project_id) from config.

Layers:
  - manage_client.py: list_features / add|remove_project_feature /
    get_user / add|remove_user_feature (email + feature URL-encoded in path;
    POST body {"feature": NAME} sent as application/json)
  - models.py: Feature model (only `name` stable; extras pass through;
    bare-string features normalised to {"name": ...})
  - services/feature_service.py: FeatureService (alias resolve, dry-run,
    feature normalisation)
  - commands/feature.py: thin Typer layer, dual output, dry-run/confirm
  - permissions.py: list/*-show=read, *-add=admin, *-remove=destructive
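
The bare-string normalisation mentioned for `models.py` could look roughly like this. It is a sketch only: `normalise_feature` is a hypothetical name, and the real model is a class rather than a plain dict.

```python
from typing import Any


def normalise_feature(raw: Any) -> dict[str, Any]:
    """Hypothetical helper: coerce a feature entry to a dict with a `name` key."""
    if isinstance(raw, str):
        # Bare-string features become {"name": ...}.
        return {"name": raw}
    if isinstance(raw, dict) and "name" in raw:
        # Only `name` is treated as stable; extras pass through untouched.
        return dict(raw)
    raise ValueError(f"unrecognised feature entry: {raw!r}")
```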

Notes:
  - There is no dedicated per-project feature-list endpoint; project/user
    features are read from the project/user object's `features` array.
  - Bumped PROMPT_BYTE_BUDGET 60k -> 62k to fit the keboola-expert matrix
    row (split into specialists if it keeps growing).

Tests: test_feature_service.py (19), test_feature_cli.py (21),
test_manage_client.py + test_models.py extensions, read-only E2E
test_feature_flags_read_e2e (opt-in `make test-e2e-feature`).

Docs/sync surfaces: CLAUDE.md, AGENT_CONTEXT, keboola-expert.md,
SKILL.md (+ regenerated table), commands-reference.md, gotchas.md,
changelog.py; version 0.47.1 -> 0.48.0 (version-sync).

* fix(feature): adaptive Rich table -- omit empty Title/Type/Description columns

Project/user feature arrays come back from the Manage API as bare strings
(name-only), so the optional columns were rendered uniformly empty. The
table now shows a column only when at least one feature populates it: the
stack catalogue (GET /manage/features) keeps Title/Type/Description, while
project-show / user-show collapse to just Name. JSON output is unchanged.
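
One way to express the "show a column only when populated" rule, as a sketch. The key names in `OPTIONAL_COLUMNS` and the function `visible_columns` are assumptions; the actual Rich table code is not shown in this message.

```python
# Assumed key names for the optional Title/Type/Description columns.
OPTIONAL_COLUMNS = ("title", "type", "description")


def visible_columns(features: list[dict]) -> list[str]:
    """Return the column keys worth rendering for this set of features."""
    columns = ["name"]  # Name is always shown.
    for key in OPTIONAL_COLUMNS:
        # Keep an optional column only if at least one feature populates it.
        if any(feature.get(key) for feature in features):
            columns.append(key)
    return columns
```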

* fix(feature): address PR review -- serve REST parity, dataclass, arg order, 204 guard

Resolves the kbagent-pr-reviewer findings on keboola#356:

B-1 (blocking): add `server/routers/feature.py` -- a 1:1 `kbagent serve`
REST router for all 7 feature commands, wired into app.py (import, OpenAPI
tag, include_router) and ServiceRegistry.feature. Every endpoint requires
the X-Manage-Token header, mirroring the `members` / `org` routers. Closes
the serve parity gap that would have 404'd all /feature/* paths.

NB-1: replace the bare `tuple[str, int]` returned by
`FeatureService._resolve_alias` with a frozen `_ResolvedAlias` dataclass
(stack_url, project_id), per CONTRIBUTING Code Quality Patterns.

NIT-1: flip `formatter.error(...)` to error_code-first argument order in
commands/feature.py.

Open question: the POST add paths (`add_project_feature` /
`add_user_feature`) now tolerate a 204 No Content body
(`response.json() if response.content else {}`) instead of assuming a
JSON body on every stack.
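
The 204 guard can be exercised in isolation. In this sketch `FakeResponse` is a stand-in for the HTTP library's response object and `parse_body` is a hypothetical wrapper around the expression quoted above.

```python
import json
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Stand-in for an HTTP response (assumption: `.content` is bytes)."""

    status_code: int
    content: bytes = b""

    def json(self) -> dict:
        return json.loads(self.content)


def parse_body(response) -> dict:
    # An empty body (e.g. 204 No Content) yields {} instead of a JSON decode error.
    return response.json() if response.content else {}
```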

Tests: 7 new server router tests in test_server_router_calls.py (token
pass-through for all endpoints + 401 on missing X-Manage-Token). Full
suite 3690 passed; changelog 0.48.0 entry updated.

* fix(feature): close re-review nits -- serve auth prefix + APP_DESCRIPTION

Follow-up to the re-review on keboola#356 (verdict APPROVE):

- Add "/feature" to `_allow_static_through_auth`'s `api_prefixes` in
  app.py so feature GET endpoints are auth-gated in --ui mode like every
  other API group (the manage-token check already fired second, so no
  data leaked, but the prefix omission was a logic gap).
- Mention "feature flags" in the APP_DESCRIPTION Project Management
  layout bullet for completeness (feature already has its own OpenAPI tag).
…te safety (keboola#354)

* docs: brainstorm spec for kbagent dev-portal command group

Adds the design document produced during /superpowers:brainstorming for
wrapping the Keboola Developer Portal API (apps-api.keboola.com) in
kbagent. Spec covers data model (multi-identity, mirrors KB project
storage), client/service/command 3-layer split, the random-code TTY
confirm safety bar with no env-var bypass, v1 op scope
(list/get/create/patch/upload-icon/publish/deprecate plus peers
lookup), permission-registry integration, testing layout, and the
rule keboola#17 documentation-sync checklist. No implementation yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(dev-portal): drop peers from spec + add implementation plan

Removes the `dev-portal peers` helper from v1 scope -- the agent can
compose peer-config research from `list` + `get` directly. Adds the
16-task implementation plan produced by /superpowers:writing-plans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(dev-portal): add ErrorCode entries for Developer Portal

* refactor(safety): extract require_random_code_confirmation() to _helpers

Move the load-bearing safety primitive from commands/permissions.py into
commands/_helpers.py so upcoming Developer Portal write commands can reuse
it without duplicating the guard logic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(dev-portal): add DeveloperPortalIdentity model + AppConfig fields

* feat(dev-portal): ConfigStore methods for identity CRUD

* feat(dev-portal): extend CLAUDE_CONFIG_WARNING to mention DP credentials

* feat(dev-portal): client skeleton + token-path login

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(dev-portal): MFA login path via /dev/tty

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(dev-portal): client reads + create/patch/publish/deprecate

* feat(dev-portal): icon upload (two-hop, presigned S3 PUT)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(dev-portal): service with identity CRUD + verify-on-add

DeveloperPortalService skeleton: add/list/remove/edit/rename/use/verify
identity methods. add_identity runs the login probe BEFORE persisting
so bad credentials never land in config.json. Tests cover happy path,
verify-failure-no-persist, use_identity default, and remove.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(dev-portal): service reads + prepare/apply + diff + publish pre-flight

* feat(dev-portal): permission registry entries + identity resolver

Add 13 dev-portal.* entries to OPERATION_REGISTRY, resolve_identity_alias()
helper, and get_dev_portal_service() factory to _helpers.py; cover with
TestDevPortalPermissions test class.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(dev-portal): identity subcommands + list/get

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(dev-portal): reconcile permission registry keys + wire callbacks

Task 12 added entries with hyphenated identity names (dev-portal.identity-add)
that are unreachable -- check_cli_permission builds keys from the Typer tree
as `{group}.{subcommand}`, so identity-sub-app leaves are `dev-portal.identity.add`
(dotted). Task 13 added the correct dotted form and the original hyphenated
entries were dead. This:

- Drops the 6 unreachable hyphenated identity entries (Task 12 leftovers).
- Adds `dev-portal.identity: read` so the parent-callback descent is allowed.
- Realigns categories to data-app.secrets-* precedent: credential add/edit
  are `write` (admin is reserved for org-level ops).
- Wires callbacks on dev_portal_app and identity_app so the engine actually
  fires on these commands.
- Updates TestDevPortalPermissions to assert on the actual runtime keys.

* feat(dev-portal): write commands gated by random-code confirm

Add create / patch / upload-icon / publish / deprecate commands under
`kbagent dev-portal`. Each write command calls `_assert_tty()` as its
very first action (before any file I/O or API calls), refusing with exit 6
on non-TTY shells. `--dry-run` bypasses both the TTY check and the
random-code prompt, prints a preview, and exits 0.
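
The ordering of the guards described above (dry-run bypass, then TTY check, then random-code prompt) might be sketched as follows. `guard_write` and its parameters are hypothetical; only exit code 6 for non-TTY shells is taken from the text, and the exit code for a wrong confirmation is an assumption.

```python
import secrets
from typing import Callable, Optional


def guard_write(
    dry_run: bool,
    is_tty: bool,
    ask: Callable[[str], str],
    code: Optional[str] = None,
) -> str:
    """Hypothetical gate for a write command."""
    if dry_run:
        # --dry-run bypasses both the TTY check and the prompt.
        return "preview"
    if not is_tty:
        # Refuse on non-TTY shells before any file I/O or API calls.
        raise SystemExit(6)
    code = code or secrets.token_hex(3)
    typed = ask(f"Type {code} to confirm: ")
    if typed.strip() != code:
        raise SystemExit(1)  # Assumed exit code for a failed confirmation.
    return "confirmed"
```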

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(dev-portal): version 0.48.0, E2E test, AGENT_CONTEXT, plugin docs

Bump version to 0.48.0 for the `kbagent dev-portal` command group and
update all silent-drift surfaces per CONTRIBUTING.md rule keboola#17.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(dev-portal): trim keboola-expert.md back to 60K prompt budget

Task 15's additions to keboola-expert.md (version-gate, tool-matrix row,
inline gotcha) pushed it 1,652 bytes over the CI-enforced 60,000-byte
budget. Drops all three additions; dev-portal coverage in the agent
surface lives in AGENT_CONTEXT, SKILL.md decision table,
commands-reference.md, dev-portal-workflow.md, and gotchas.md (which
have no equivalent size cap). File now back to 59,992 bytes.

Matches the pattern of fe246e5 (which previously reverted a similar
edit for the same reason).

* docs(dev-portal): add command help text + regenerate SKILL.md

The auto-generated decision table in SKILL.md is sourced from each
Typer command's help text via scripts/generate_skill.py. Task 13/14
didn't add help= to the @dev_portal_app.command decorators, so the
table generator skipped every dev-portal command and the prior manual
entries got stripped on make skill-check.

Adds concise help text to every dev-portal command (identity sub-app
+ list/get + create/patch/upload-icon/publish/deprecate), then
regenerates SKILL.md. The auto-generated rows now correctly enumerate
the full dev-portal surface.

* fix(dev-portal): authenticate once across prepare/apply (no double MFA)

patch and publish opened a fresh portal client in prepare_* (to read
current state) and again in apply() (to write), each triggering its own
/auth/login. On a personal MFA account that meant TWO MFA prompts for a
single write; on a service account it was a redundant second login.

The service now caches the bearer per alias (in-memory, rebuilt per CLI
invocation) and seeds it into the apply() client via a new
client.seed_bearer()/bearer API, so the human authenticates at most once
per command. create/deprecate/upload-icon were already single-login
(their prepare_* defers auth) and are unaffected.

Also type BaseHttpClient.__enter__ as Self so `with <subclass>() as c`
keeps the concrete type — this clears the pre-existing ty errors where
DeveloperPortalClient methods were unresolved on BaseHttpClient inside
`with` blocks (dev_portal_service + test_dev_portal_client).

* feat(dev-portal): expose read endpoints via kbagent serve

Add server/routers/dev_portal.py with GET /dev-portal/apps (list by
vendor) and GET /dev-portal/apps/{app} (get one), wired into the FastAPI
app + ServiceRegistry. Mirrors `kbagent dev-portal list|get` so external
consumers (Web UI, scheduled agents) can do peer-config research over the
REST surface.

Writes (create/patch/upload-icon/publish/deprecate) and identity
management stay CLI-only by design: writes require a human to type a
random code on a TTY (meaningless over HTTP), and identity commands
handle login credentials that must not travel over this API. The skip is
documented in the router module docstring and the OpenAPI tag.

* docs(dev-portal): add Tool Selection Matrix row for keboola-expert

Give the dev-portal group one per-group row in §2 (reads agent-safe;
writes TTY-confirmed, never raw apps-api). Stays under the 60 KB prompt
budget by trimming now-deprecated `--hint client` fallbacks from five
existing matrix rows (per the policy merged in keboola#355: --hint is
superseded by `kbagent serve`).

* fix(dev-portal): require bearer auth on GET /dev-portal/* in serve --ui

`_is_ui_public` treats any GET not matching `api_prefixes` as an SPA route
and serves it without bearer validation. The new `/dev-portal` router was
not added to that allow-list, so `GET /dev-portal/apps` was reachable
unauthenticated in `kbagent serve --ui` mode (script/curl callers; browser
users with the session cookie were unaffected). Add `/dev-portal` to
`api_prefixes` and cover it with a 401-without-auth test.
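
A reduced model of the routing rule described above, to show why a missing prefix leaves GET endpoints open. The prefix tuple here is abbreviated and the function body is an assumption about how `_is_ui_public` decides, not a copy of it.

```python
# Abbreviated for illustration; the real list covers every API group.
API_PREFIXES = ("/feature", "/dev-portal")


def is_ui_public(method: str, path: str) -> bool:
    """True when a request is treated as an SPA route (no bearer check)."""
    if method.upper() != "GET":
        return False
    # A GET is public only if it matches none of the API prefixes.
    return not any(
        path == prefix or path.startswith(prefix + "/") for prefix in API_PREFIXES
    )
```

With `/dev-portal` absent from the tuple, `GET /dev-portal/apps` would fall through to the public branch, which is the gap this commit closes.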

* fix(dev-portal): evict stale cached bearer on auth error

The per-alias bearer cache is harmless in the CLI (service rebuilt per
invocation) but the `kbagent serve` ServiceRegistry is a long-lived
singleton, so a cached bearer outlives its portal-side TTL. Once expired,
every `_authed_client` call re-seeded the dead token and 401'd forever --
permanent lockout until restart.

`_authed_client` now drops `_bearers[alias]` on INVALID_TOKEN /
DP_LOGIN_FAILED so the next call re-authenticates. Regression test seeds a
stale bearer, asserts the 401 propagates AND the entry is evicted, then
that a follow-up call logs in fresh and succeeds.
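
The eviction behaviour can be sketched with a small cache. `AuthError`, `BearerCache`, and `call` are hypothetical names; only the `_bearers` mapping and the two error codes come from the message above.

```python
from typing import Callable

EVICT_ON = {"INVALID_TOKEN", "DP_LOGIN_FAILED"}


class AuthError(Exception):
    """Stand-in for the project's error type carrying an error code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


class BearerCache:
    def __init__(self, login: Callable[[str], str]) -> None:
        self._login = login
        self._bearers: dict[str, str] = {}

    def call(self, alias: str, request: Callable[[str], str]) -> str:
        bearer = self._bearers.get(alias)
        if bearer is None:
            bearer = self._login(alias)
            self._bearers[alias] = bearer
        try:
            return request(bearer)
        except AuthError as exc:
            if exc.code in EVICT_ON:
                # Drop the dead token so the next call logs in again.
                self._bearers.pop(alias, None)
            raise
```

The error still propagates to the caller; only the follow-up call benefits from the eviction.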

* docs(dev-portal): document dry-run portal GET, type previews, version gate

Review polish (non-blocking):
- patch/publish `--dry-run` help + dev-portal-workflow.md now state that
  the preview still logs in and GETs the app (needs connectivity; a
  personal/MFA identity prompts for MFA) -- use a service account for a
  fully non-interactive preview.
- `_render_pending` / `_pending_as_json` drop their `# type: ignore`
  workarounds and annotate with `OutputFormatter` / `PendingWrite`
  (TYPE_CHECKING import).
- keboola-expert.md §1 VERSION GATE lists `dev-portal = 0.48.0+`; offset
  by trimming two now-deprecated `--hint client` prose mentions (stays
  under the 60 KB prompt budget).

* chore(dev-portal): bump to 0.49.0 (0.48.0 taken by keboola#356 feature flags)

keboola#356 (`kbagent feature` command group) merged to main as 0.48.0 and that
tag is cut, so dev-portal moves to its own 0.49.0 instead of colliding.

- pyproject / plugin.json / marketplace.json / uv.lock -> 0.49.0
- changelog: dev-portal entries live under a new 0.49.0 key, above main's
  0.48.0 (feature flags) block (done during the rebase)
- flip dev-portal "since v0.48.0" doc tags -> v0.49.0: AGENT_CONTEXT,
  commands-reference, gotchas, keboola-expert version-gate + matrix row

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Petr <petr@keboola.com>
) (keboola#363)

* feat(headless): token-only invocation via __env__ project (keboola#359)

Let a daemon / container / CI run kbagent with only a token in the
environment -- no `kbagent project add`, no config.json on disk.

Setting KBAGENT_PROJECT_FROM_ENV=1 together with KBC_TOKEN +
KBC_STORAGE_API_URL makes ConfigStore synthesize an in-memory project
under the reserved alias `__env__`. Because both the CLI and `kbagent
serve` resolve projects through the same ConfigStore.load() chokepoint,
a single env-injection covers both consumption styles:

    kbagent --json storage file-upload --project __env__ --file X
    kbagent serve   # POST endpoints take project=__env__

Security:
- The `__env__` project is marked `ephemeral` and stripped by
  ConfigStore.save(), so the env token is never persisted, even when a
  write op triggers a config.json write.
- Opt-in is the explicit flag, not the mere presence of KBC_TOKEN, to
  avoid a phantom project on a dev machine that exported KBC_TOKEN only
  for `project add`.
- Flag set but credentials missing -> fail fast (exit 5), not a silent
  skip.
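
The opt-in and the never-persisted rule, sketched under assumptions: `synthesize_env_project` and `strip_ephemeral` are hypothetical helpers, the project is modelled as a plain dict, and `ConfigError` here is a local stand-in for the project's own exception.

```python
from collections.abc import Mapping
from typing import Any, Optional

ENV_ALIAS = "__env__"


class ConfigError(Exception):
    """Stand-in for the project's ConfigError (exit 5 in the CLI)."""


def synthesize_env_project(environ: Mapping[str, str]) -> Optional[dict[str, Any]]:
    # Opt-in is the explicit flag, not the mere presence of KBC_TOKEN.
    if environ.get("KBAGENT_PROJECT_FROM_ENV") != "1":
        return None
    token = environ.get("KBC_TOKEN")
    url = environ.get("KBC_STORAGE_API_URL")
    if not token or not url:
        # Flag set but credentials missing: fail fast, never skip silently.
        raise ConfigError("KBC_TOKEN and KBC_STORAGE_API_URL are required")
    return {"alias": ENV_ALIAS, "token": token, "stack_url": url, "ephemeral": True}


def strip_ephemeral(projects: dict[str, dict]) -> dict[str, dict]:
    # What save() must do: drop ephemeral entries before writing to disk.
    return {alias: p for alias, p in projects.items() if not p.get("ephemeral")}
```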

Tests: 7 unit (test_config_store.py) + 3 E2E (test_e2e.py).
Docs: changelog, keboola-expert.md, gotchas.md, commands-reference.md,
context.py AGENT_CONTEXT, CLAUDE.md. Version 0.49.0 -> 0.50.0.

* feat(project): normalize stack URLs (bare host, deep-link, trailing slash)

UX follow-up on the headless mode. `KBC_STORAGE_API_URL` (and
`project add --url` / `project edit --url`) previously rejected anything
that was not already a clean `https://<host>` base -- a bare host like
`connection.keboola.com` raised a pydantic ValidationError traceback.

Add `normalize_stack_url()` as the single source of truth, used by the
ProjectConfig field validator (safety net + clean stored value) and by
ProjectService.add_project / edit_project (so token verification hits
the right host). It accepts:

  - bare host                connection.keboola.com
  - trailing slash           https://connection.keboola.com/
  - surrounding whitespace   (paste artifact)
  - full project deep-link   https://connection.keboola.com/admin/projects/10105/dashboard

and reduces every form to https://<host>. Explicit non-https schemes
(http://, file://, ftp://) are still rejected (SSRF / protocol-abuse
guard). An unusable URL in the headless `__env__` injection now raises a
clean ConfigError (exit 5) instead of a raw ValidationError traceback.
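
A sketch of the normalization rules listed above, using only the standard library. It is not the project's implementation: the error type is simplified to `ValueError`, and dropping any port is an assumption that follows from reducing to `https://<host>`.

```python
from urllib.parse import urlsplit


def normalize_stack_url(raw: str) -> str:
    """Reduce a pasted stack URL to https://<host> (illustrative version)."""
    value = raw.strip()  # Surrounding whitespace is a paste artifact.
    if not value:
        raise ValueError("stack URL is empty")
    if "://" not in value:
        # Bare host: assume https.
        value = f"https://{value}"
    parts = urlsplit(value)
    if parts.scheme.lower() != "https":
        # Explicit non-https schemes stay rejected.
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError(f"no host in URL: {raw!r}")
    # Path, query, and trailing slash are discarded.
    return f"https://{parts.hostname}"
```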

Tests: 6 new model tests + 2 new env-injection tests; updated the old
"reject no scheme" test to assert normalization. Full non-e2e suite:
3771 passed.

* fix(headless): recover __env__ project_id from token, drop fake name

`project list` showed `project_name="env (headless)"` and a null
Project ID for the env-injected project -- the fake name was misleading
and the ID was simply missing.

ConfigStore.load() must stay offline (it runs many times per command and
per serve request), so it cannot call verify_token to fetch the real
project name. But Keboola Storage tokens are `{projectId}-{tokenId}-
{secret}`, so the project_id is recovered offline from the token prefix.
The project_name is left blank (honest) instead of a fake placeholder;
`project status` / `project info` verify against the API and show the
real name when a command actually needs it.

Tests: assert project_id is parsed (901-...) and name is blank; a
non-numeric token prefix leaves project_id unset without crashing.
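
Offline recovery of the project ID from the `{projectId}-{tokenId}-{secret}` layout could be sketched like this. `project_id_from_token` is a hypothetical name.

```python
from typing import Optional


def project_id_from_token(token: str) -> Optional[int]:
    """Parse the numeric project ID prefix; None if it is not there."""
    prefix, sep, _rest = token.partition("-")
    # isascii() guards against Unicode digits that int() would reject.
    if not sep or not (prefix.isascii() and prefix.isdigit()):
        return None
    return int(prefix)
```

Returning `None` rather than raising matches the stated behaviour that a non-numeric prefix leaves `project_id` unset without crashing.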

* review(keboola#363): version gate, guard __env__ mutations, service URL test

Address the kbagent-pr-reviewer findings on keboola#363:

- NB-1: add the 0.50.0 headless / URL-normalization entry to the Rule 6
  VERSION GATE in keboola-expert.md (highest silent-drift surface).
- NB-2: reject remove/edit/rename/set-branch on the env-synthesized
  __env__ project with a clear ConfigError instead of reporting a
  success that silently vanishes on the next load(). A real persisted
  project under the same alias (ephemeral=False) stays mutable.
- NIT-1: add a service-layer test asserting add_project() normalizes a
  bare-host / deep-link URL through normalize_stack_url() before the
  verification client and before persisting.

Tests: +5 (guard x2, service normalization x1, project_id parse x2 from
earlier). Full non-e2e suite: 3775 passed.

* fix(headless): skip org-info backfill for ephemeral __env__ project

Devin review flagged that `project status` in headless mode could write
a config.json to disk via `_backfill_org_info`: the __env__ project
always has empty org_id/org_name, so the backfill kept trying to persist
it. `save()` strips the ephemeral entry (so no token leaked), but the
file was still created -- breaking the "no config.json on disk" promise
-- and the futile backfill re-ran on every `project status`.

Skip ephemeral projects when building the backfill update set. When
__env__ is the only candidate, the update set stays empty and no file is
written at all.

Test: get_status() under env-injection leaves the config dir file-free.
Full non-e2e suite: 3776 passed.
…(0.50.0) (keboola#364)

* feat(stream): `kbagent stream` command group for Data Streams (OTLP) (0.50.0)

Add a `kbagent stream` command group so OpenTelemetry / OTLP Data Streams
sources can be provisioned and introspected from the CLI instead of
copy-pasting endpoints out of the Keboola UI (closes keboola#357).

Commands: `stream list`, `stream create-source`, `stream detail`,
`stream delete`.

Architecture:
- Stream control plane lives on a separate host derived from the project's
  Storage URL (connection.<region> -> stream.<region>, same scheme as
  ai./queue.) and authenticates with the per-project Storage token
  (X-StorageApi-Token) -- no manage token.
- The OTLP ingestion endpoint (stream-in.<region>/otlp/.../<secret>) is
  returned by the API in source.otlp.url (never derived) with the secret in
  the URL path -- masked by default in every surface, --reveal to print it.
- create-source --type otlp auto-provisions the logs/metrics/traces sinks
  (bucket in.c-otlp-<source>) so data actually lands in Storage, matching the
  UI; provisioning is idempotent and --no-sinks opts out.
- create/delete/sink-create are async Tasks polled to completion.
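
Two of the points above lend themselves to a short sketch: deriving the control-plane host from the Storage URL, and masking the secret in the OTLP endpoint. Both function names are hypothetical, the mask format is an assumption, and the masking assumes the secret is the last path segment as in the pattern shown above.

```python
from urllib.parse import urlsplit


def derive_stream_url(storage_url: str) -> str:
    """connection.<region> -> stream.<region> (control plane only)."""
    host = urlsplit(storage_url).hostname or ""
    prefix = "connection."
    if not host.startswith(prefix):
        raise ValueError(f"unexpected Storage host: {host!r}")
    return f"https://stream.{host[len(prefix):]}"


def mask_otlp_url(url: str, reveal: bool = False) -> str:
    """Hide the trailing secret unless the caller asks to reveal it."""
    if reveal:
        return url
    head, sep, secret = url.rstrip("/").rpartition("/")
    if not sep or not secret:
        return url
    return f"{head}/{'*' * 8}"
```

The ingestion URL itself is never derived this way; per the text it is read from `source.otlp.url`.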

New layers: stream_client.py (StreamClient + create_sink + task polling),
services/stream_service.py (alias resolution, secret masking, detail
assembly, sink provisioning), commands/stream.py, server/routers/stream.py
(1:1 serve REST). Wired into cli.py, permissions.py (read/write/destructive),
constants.py, server dependencies/app.

Tests: test_stream_client.py (14), test_stream_service.py (16),
test_stream_cli.py (11); E2E test_stream_otlp_e2e (make test-e2e-stream).
Live-validated against a real project: create source -> 3 sinks -> POST
OTLP/HTTP logs -> 3 rows landed in in.c-otlp-<name>.logs -> read back via
workspace query.

Docs synced (CLAUDE.md, context.py AGENT_CONTEXT, keboola-expert.md,
SKILL.md, commands-reference.md, gotchas.md, new stream-workflow.md);
version 0.50.0 + version-sync.

* docs(stream): align CLAUDE.md delete flags notation with context.py (--yes|--force)

* docs(stream): trim keboola-expert OTLP matrix row to fit agent prompt budget after 0.50.0 merge
…eboola#365)

* fix(serve): document `stream` in OpenAPI + add Data Streams web UI

The `stream` router was wired via include_router and fully callable, but
its tag was missing from OPENAPI_TAGS in server/app.py -- so /docs#/stream
rendered as a bare, description-less section outside its logical group.
Add the tag entry under the Data group (next to storage), mirroring the
`kbagent stream *` CLI.

Add a Data Streams page to the web UI (NERD UI) with parity to the CLI and
backend: list sources, create an OTLP/HTTP source (sink auto-provisioning,
if-not-exists), a detail drawer with a secret reveal toggle for the OTLP
endpoint, and a destructive delete. Wire it into routing (App.tsx),
PageId state, and the Browse sidebar section.

Add a regression test asserting every operation tag has an OPENAPI_TAGS
description block, and add the stream path to the router smoke check, so a
newly added router can't ship invisible in /docs again.

* fix(ui): stringify stream branch ref to match the str-typed API contract

The stream control-plane API types its branch as a string, while the UI's
global branchId is a numeric Storage branch ID. Numeric IDs are valid refs
(test_branch_override in test_stream_service.py drives branch_id="1234"), so
this is not a value bug -- it makes the contract explicit instead of relying
on JSON/query coercion. Introduce a branchRef() helper used by all four
stream calls (list, detail, create, delete).

* chore(release): 0.51.0 -- Data Streams web UI + stream OpenAPI tag

Minor bump: the release adds a new user-visible surface (the Data Streams
web UI page) on top of the `stream` OpenAPI documentation fix. Add the
0.51.0 changelog entry and propagate the version to plugin.json and
marketplace.json via scripts/sync_version.py.

* fix(ui): address review -- surface delete errors, use buckets fallback

Follow-up on the kbagent-pr-reviewer pass over the Data Streams page:
- show deleteMu errors via ErrorBox -- delete failures were silent to the
  user, unlike the create mutation in the same file (in-file inconsistency)
- render destination.buckets as a fallback when a single bucket isn't set,
  so the multi-bucket field returned by the backend is actually used
- note that the sinks/source raw fields surface only via the Raw JSON tab
…--password-stdin (keboola#366)

* feat(dev-portal): admin-role PATCH routing + MFA fixes + interactive --password-stdin

Three independent fixes against the dev-portal surface that landed in keboola#354,
discovered while integrating ABRA Flexi (a real component registration on
production apps-api):

1. **Admin-role PATCH routing**. `complexity`, `categories`, `forwardToken`,
   `forwardTokenDetails`, `injectEnvironment`, `processTimeout`,
   `requiredMemory`, `features`, and `category` are `.forbidden()` on the
   apps-api vendor schema (`PATCH /vendors/{vendor}/apps/{app}`) -- but the
   server's error message is misleading: it says "must be one of: easy,
   medium, hard" because the enum-validation `.error()` annotation lives on
   the shared admin schema before `clientAppSchema()` overrides with
   `.forbidden()`. Source of truth: keboola/developer-portal:src/lib/
   validation.js -> clientAppSchema().

   Fix:
   - `DeveloperPortalIdentity.role_hint` becomes a real validator: only
     `vendor` (default) or `admin` accepted; case-folded; typos raise. The
     field is now load-bearing, not a free-text label.
   - `DeveloperPortalClient.patch_app` reads `self._identity.role_hint`
     and routes admin identities to `PATCH /admin/apps/{app}` (permissive
     adminAppSchema); vendor identities stay on the vendor endpoint.
   - `DeveloperPortalService.prepare_patch` preflights: vendor role +
     admin-only field => fail-fast `VALIDATION_ERROR` with a message
     that (a) names every offending field, (b) explains why the 422 is
     misleading, (c) tells the user the exact command to switch identity
     (`dev-portal identity add --role-hint admin ...`). Admin role bypasses
     the preflight entirely.
   - Reads, create, upload-icon, deprecate keep vendor-endpoint behaviour
     -- only PATCH has a meaningful admin variant on the server. Admin
     tokens still work on the vendor path for those (superset perms).

2. **MFA login: explicit `challenge` field + actual error surfaced**. User
   report from a Keboola-org TOTP account:
       MFA code: 521278
       Error: Developer Portal MFA login failed (HTTP 404)
   Root cause: the apiary spec calls `challenge` optional with default
   `SOFTWARE_TOKEN_MFA`, but in practice the server 404s when it's omitted.
   Sending it explicitly fixes it. Single attempt only: an earlier
   experiment retried with `SMS_MFA` on the same session, but
   `/auth/login` consumes the session, so the retry always 404'd with
   "Invalid code or auth state for the user", masking the real first
   failure (most often a stale 30-second TOTP code from waiting too long
   to enter it).

   The error now includes the server response body (truncated to 500
   chars) and a hint about TOTP code freshness, so users can tell whether
   the code was wrong, the session expired, or something else.

3. **`--password-stdin` no longer hangs interactively**. `sys.stdin.read()`
   waits for EOF, not Enter -- users who pasted a password and pressed
   Enter sat there until they Ctrl-C'd out. New `_read_password_stdin()`
   helper branches on `sys.stdin.isatty()`: TTY uses
   `getpass.getpass()` (hidden, line-based, Enter to confirm); pipe still
   does `read() -> strip()`. Both `identity add --password-stdin` and
   `identity edit --password-stdin` route through it. Help text updated
   to spell out the dual-mode behaviour.
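
The dual-mode read in item 3 can be sketched as below. This is an illustration rather than the actual `_read_password_stdin()`: the `stream` and `prompt_fn` parameters are added here so the two branches can be exercised without a real terminal.

```python
import getpass
import sys
from typing import Callable, Optional, TextIO


def read_password_stdin(
    stream: Optional[TextIO] = None,
    prompt_fn: Callable[[str], str] = getpass.getpass,
) -> str:
    stream = sys.stdin if stream is None else stream
    if stream.isatty():
        # Interactive: hidden, line-based input; Enter confirms.
        return prompt_fn("Password: ")
    # Piped: read until EOF and drop the trailing newline.
    return stream.read().strip()
```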

Tests (10 new):
- TestReadPasswordStdin: TTY -> getpass, pipe -> read.
- TestLoginMfaPath::test_mfa_prompt_completes_login: now matches body
  including `challenge: SOFTWARE_TOKEN_MFA`.
- TestLoginMfaPath::test_mfa_failure_surfaces_server_body: real body
  bubbles up plus stale-TOTP hint.
- TestPortalWrites::test_patch_app_vendor_role_hits_vendor_endpoint
  + test_patch_app_admin_role_hits_admin_endpoint: confirm dispatch.
- TestDeveloperPortalIdentity::test_role_hint_accepts_admin
  + test_role_hint_normalises_case + test_role_hint_rejects_typo.
- TestReadsAndPrepareApply::test_prepare_patch_vendor_role_rejects_admin_only_fields
  + test_prepare_patch_admin_role_allows_admin_only_fields.

All 95 dev-portal tests pass; `make check` green (3827 passed / 8 skipped).
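
The role-hint validation, PATCH routing, and preflight from item 1 can be summarised in a sketch. The function names are hypothetical, the paths are built without URL-encoding for brevity, and the field list is copied from the message above.

```python
from typing import Optional

VALID_ROLE_HINTS = ("vendor", "admin")
# Fields listed above as forbidden on the vendor PATCH schema.
ADMIN_ONLY_FIELDS = frozenset({
    "complexity", "categories", "forwardToken", "forwardTokenDetails",
    "injectEnvironment", "processTimeout", "requiredMemory", "features",
    "category",
})


def normalise_role_hint(value: Optional[str]) -> str:
    if value is None:
        return "vendor"  # Default role.
    folded = value.casefold()
    if folded not in VALID_ROLE_HINTS:
        raise ValueError(f"role_hint must be one of {VALID_ROLE_HINTS}: {value!r}")
    return folded


def patch_path(role_hint: Optional[str], vendor: str, app: str) -> str:
    # Admin identities use the permissive admin endpoint; vendors stay put.
    if normalise_role_hint(role_hint) == "admin":
        return f"/admin/apps/{app}"
    return f"/vendors/{vendor}/apps/{app}"


def offending_fields(role_hint: Optional[str], payload: dict) -> list[str]:
    """Preflight: admin-only fields a vendor identity must not send."""
    if normalise_role_hint(role_hint) == "admin":
        return []
    return sorted(ADMIN_ONLY_FIELDS.intersection(payload))
```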

* test(dev-portal): use DP_MFA_CHALLENGE_TYPE constant in client tests

Replace the two hardcoded "SOFTWARE_TOKEN_MFA" match_json literals with the
DP_MFA_CHALLENGE_TYPE constant from constants.py, following through on the
NIT-1 constant extraction so the tests can't silently diverge from the client.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…oola#367)

Bumps [vitest](https://github.com/vitest-dev/vitest/tree/HEAD/packages/vitest) from 2.1.9 to 4.1.0.
- [Release notes](https://github.com/vitest-dev/vitest/releases)
- [Changelog](https://github.com/vitest-dev/vitest/blob/main/docs/releases.md)
- [Commits](https://github.com/vitest-dev/vitest/commits/v4.1.0/packages/vitest)

---
updated-dependencies:
- dependency-name: vitest
  dependency-version: 4.1.0
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Transitive dependency bump (uv.lock only); kbagent has no direct urllib3 import. Pulled in dev-only via pip-audit -> requests. Addresses CVE-2026-44431 (header leak on cross-origin redirect) and CVE-2026-44432 (decompression DoS).
Transitive dependency bump (uv.lock only); kbagent has no direct idna import. In the runtime httpx/anyio path. No new CVEs in 3.11->3.15; routine maintenance + Unicode data update. Verified: full pytest suite green with urllib3+idna bumps combined.
…cs (0.52.0) (keboola#368)

* feat(storage): `clone-table` -- pull a prod table into a dev branch (0.52.0)

Adds `kbagent storage clone-table --project P --table-id ID --branch ID
[--dry-run]`, wrapping the Storage API
`POST /v2/storage/branch/{branch}/tables/{id}/pull` endpoint
(operationName `devBranchTablePull`).

Why: on `storage-branches` projects a dev branch reads production tables
transparently (copy-on-write) until the first write, so an in-branch
schema mutation -- `swap-tables`, dropping a column -- fails with a
misleading "bucket not found" until the table is materialized
branch-local. `clone-table` performs that materialization. It is the
blocking prerequisite for the typify-via-branch workflow on
storage-branches projects.

Implementation mirrors `swap-tables` across all layers:
- KeboolaClient.pull_table (async storage job, polled to completion)
- StorageService.clone_table (branch mandatory; exit 5 / ConfigError
  before any HTTP when no branch is set)
- commands/storage.py clone-table (permission class `write`; --dry-run;
  no --yes since it never deletes)
- permissions, hint, serve REST route, AGENT_CONTEXT

Live-validated against project 10539 (storage-branches ON): clone a prod
table into a dev branch -> table materialized -> in-branch swap-tables
then succeeds (it previously failed with "bucket not found") -> the
production table is left untouched.

Tests: tests/test_storage_clone.py (13: client/service/CLI) +
tests/test_e2e.py::TestE2EStorageCloneTable (3). Docs synced per
convention keboola#17 (auto-generated SKILL.md, commands-reference, gotchas,
context, CLAUDE.md, storage-types + typify workflow). Deliberately not
added to keboola-expert.md (already at its hard token budget); covered
in the other surfaces.

Addresses the clone-prod-table-into-branch request in keboola#362.

* docs(typify): rewrite for dev-branch-rehearsal + prod-swap; fix false "rejects on production" (keboola#362)

Dev-branch merge propagates only configurations, NOT storage table
schema (confirmed by the storage-branches design + Keboola public docs,
and reproduced live). Two things were documented wrong:

1. typify-table-workflow.md claimed merge promotes the swapped/typed
   schema to production. It does not. Reworked into a two-stage model:
   rehearse in a dev branch (profile, build, swap, validate downstream),
   then repeat the real build + swap in the production (default) branch.
   Removed the bogus Phase 8 "merge promotes to prod"; added the prod
   execution with its inconsistency-window + rollback cautions.

2. swap-tables docstrings / command help / hint / context / gotchas /
   storage-types-workflow all claimed "the Storage API rejects this on
   production". It does not -- a default-branch swap is verified to work
   (project 10539) and is the supported way to retype a prod table.
   Corrected the wording across all surfaces.

No code-behavior change: branch_id is still mandatory and the swap is
still branch-scoped -- only the documentation/docstrings were wrong.
Added a 0.52.0 changelog entry for the correction (the historical 0.28.0
entry is left as-is). Completes the A+B half of keboola#362.

* docs+test: address PR keboola#368 review (NB-1 swap semantics, NB-2 test name)

NB-1: two keboola-expert.md matrix rows still described `swap-tables` as
dev-branch-only -- corrected to "any branch (incl. prod)", consistent
with the A+B semantics fix elsewhere in this PR. Net +4 bytes; the prompt
stays under its 62000-byte budget (the clone-table version-gate line is
still omitted for budget, as noted in the review).

NB-2: renamed test_url_encoding_for_special_characters ->
test_dotted_table_id_passed_verbatim_in_path (clone + its swap twin). Dots
and dashes are RFC 3986 unreserved, so quote(..., safe="") does not
percent-encode them; the test verifies verbatim path pass-through, not
encoding. Docstring corrected to say so.
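
A minimal sketch of the point NB-2 makes, assuming the path segments are built with the standard library's `urllib.parse.quote`. The helper name `table_pull_path` and the branch ID `1234` are hypothetical; only the path template comes from the clone-table commit above.

```python
from urllib.parse import quote


def table_pull_path(branch_id: str, table_id: str) -> str:
    """Build the branch-scoped pull path (hypothetical helper, not kbagent's)."""
    # safe="" means even "/" would be escaped inside a segment.
    branch = quote(branch_id, safe="")
    table = quote(table_id, safe="")
    return f"/v2/storage/branch/{branch}/tables/{table}/pull"


# Dots, dashes and underscores are unreserved, so the ID passes through verbatim.
print(table_pull_path("1234", "in.c-finance.DIM_COA"))

# Reserved characters and spaces are the ones that actually get encoded.
print(quote("a/b c", safe=""))  # a%2Fb%20c
```

This is why a test that feeds a dotted table ID can only assert verbatim pass-through: exercising encoding would need an ID containing a reserved character such as `/`.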

* fix: address Devin review on PR keboola#368 (remove deprecated --hint, add VERSION GATE)

1. clone-table wrongly added --hint support. CONTRIBUTING.md (since
   v0.45.0) forbids new hints/definitions entries and should_hint(ctx)
   short-circuits for new commands -- swap-tables (0.28.0, pre-deprecation)
   was mirrored too literally. Removed the should_hint/emit_hint block from
   storage_clone_table and the HintRegistry.register entry for
   storage.clone-table, matching stream (0.50.0) / feature (0.48.0) which
   carry no hint support.

2. Added the VERSION GATE entry `storage clone-table = 0.52.0+` to
   keboola-expert.md (CONTRIBUTING.md mandates it for new min-version
   commands). Freed budget by tightening the keboola#245 line so the prompt stays
   under its 62000-byte cap.

* docs: add (since v0.52.0) tag to new gotchas merge section (PR keboola#368 review B-1)

The "Dev-branch merge carries only configurations" gotcha used a bare
(verified 2026-06-01) stamp. CONTRIBUTING.md (convention keboola#17) requires the
(since vX.Y.Z) tag on every gotcha so AI agents don't recommend documented
behavior that the installed version predates. Now reads
"(since v0.52.0, verified 2026-06-01)".

* docs(expert): add clone-table gotcha + trim stale content (PR keboola#368 NB-1/NB-2/NIT-1)

Delta-review follow-up on keboola-expert.md (deferred from this PR for
token budget):
- NB-1: added a §3 inline gotcha for the storage-branches copy-on-write
  trap -- an in-branch swap-tables / column-drop fails "bucket not found"
  until `clone-table` materializes the prod table branch-local.
- NIT-1: removed the dangling (§14.3) cross-reference (no such section)
  and the deprecated --hint alternative in the Retype matrix row.
- NB-2: trimmed the verbose semantic-layer "short form" (full prose
  already lives in gotchas.md) and tightened the auto-materialize entry.
  Headroom against the 62000-byte budget went from 7 to 609 bytes.
Dev-tooling dependency via pip-audit chain; no runtime impact on the CLI.
Runtime dep in the server extra. 0.0.27 adds multipart header limits (DoS hardening) and raises the floor to >=0.0.27.
…boola#373)

* fix(storage): correct stale "dev branch only" swap-tables wording

Follow-up to keboola#368. The swap-tables semantics correction (0.52.0) left four
co-located surfaces still claiming the swap is dev-branch-only / rejected on
production -- now false after that fix:

- services/storage_service.py: the ConfigError raised on a missing --branch
  still said "The Storage API rejects this on production" (user-facing at
  exit code 5, directly contradicting the corrected docstring)
- references/commands-reference.md, commands/storage.py docstring (-> SKILL.md
  via make skill-gen), commands/context.py (AGENT_CONTEXT): "in a dev branch"
  / "in a development branch"

A swap on the default/production branch is supported -- it is how a typed
rebuild is applied to prod. branch_id stays mandatory; this is wording only,
no behavior change. The dependent test match ("dev branch" -> "requires a
branch") and a misleading test docstring were updated accordingly.

Found by the kbagent-pr-reviewer third pass on keboola#368 (NB-1/2/3 + NIT-1).

* fix(storage): address keboola#373 review -- remaining stale "dev branch" swap text

Two surfaces the first pass of this PR missed, flagged by kbagent-pr-reviewer:

- B-1 (blocking): CLI test test_swap_missing_branch_fails_clearly mocked the
  OLD "swap-tables requires a dev branch" string as its side-effect and
  asserted on it. The mock short-circuits the real service, so the test was
  validating phantom text that no longer matches service output. Updated the
  mock + assertion to "requires a branch".
- NB-1: the swap_tables Args docstring still read "branch_id: Dev branch ID";
  corrected to "any branch accepted, including the default/production branch".

clone-table wording left intact (clone legitimately targets a dev branch; its
service message and tests correctly keep "requires a dev branch").
)

Patch release over 0.52.0. Completes the swap-tables wording correction (keboola#373) and bundles the pip 26.1 / python-multipart 0.0.27 dependency bumps (keboola#371/keboola#372). No behaviour change.
…rruption (0.53.0) (keboola#376)

* fix(sync): conflict-aware `pull --force`, no more silent baseline corruption (0.53.0)

`sync pull --force` could silently strand un-pushed local edits. When a config
had local edits and its remote was unchanged, `--force` bypassed the
locally-modified guard, hit the `remote_unchanged` short-circuit, and re-stamped
the manifest `pull_hash` from the *edited* on-disk file. Afterwards `sync diff`
and `sync push` reported "in sync" and shipped nothing while the remote still
held the old config -- data loss with no signal. The silent part was compounded
by the diff `local_override_hashes` optimization, which then never even read the
edited file's content.

Fix splits `--force` by 3-way diff state (config and row granularity):
- local edited, remote UNCHANGED   -> preserve file + 3-way base (pending delta
  stays visible to `sync push`; no discard, no silent re-stamp)
- local edited, remote ALSO changed -> abort before writing anything
  (SyncConflictError, exit 1, code SYNC_CONFLICT) so the user resolves it
- local untouched, remote changed   -> take remote (unchanged behavior)
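
The three-way split above can be sketched as a pure function over content hashes. This is an illustrative simplification, not the actual `SyncService._detect_force_pull_conflicts` / `_is_conflict` code: the names `ForceAction` and `classify_force_pull` are hypothetical, and the `NOOP` branch (nothing changed on either side) is an assumption the commit message does not spell out.

```python
from enum import Enum


class ForceAction(Enum):
    PRESERVE_LOCAL = "preserve_local"  # keep file + 3-way base, delta stays pending
    CONFLICT = "conflict"              # abort before writing anything
    TAKE_REMOTE = "take_remote"        # overwrite local with remote
    NOOP = "noop"                      # assumption: nothing to do


def classify_force_pull(base_hash: str, local_hash: str, remote_hash: str) -> ForceAction:
    """Decide what `pull --force` does for one config or row."""
    local_edited = local_hash != base_hash
    remote_changed = remote_hash != base_hash
    if local_edited and remote_changed:
        return ForceAction.CONFLICT
    if local_edited:
        return ForceAction.PRESERVE_LOCAL
    if remote_changed:
        return ForceAction.TAKE_REMOTE
    return ForceAction.NOOP


# Local edit, remote untouched: the edit must survive the forced pull.
print(classify_force_pull("h0", "h1", "h0"))  # ForceAction.PRESERVE_LOCAL
```

The key property is that the base hash is never re-stamped from the edited file in the `PRESERVE_LOCAL` case, which is what previously hid the pending delta from `sync diff` and `sync push`.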

Consequence: `--force` no longer discards non-conflicting local edits. To drop
local edits on purpose, delete the file/dir and pull.

New: errors.SyncConflictError + ErrorCode.SYNC_CONFLICT; read-only conflict
pre-pass SyncService._detect_force_pull_conflicts / _is_conflict (runs before
any write); commands/sync.py prints a red per-config conflict block (human) or a
SYNC_CONFLICT envelope with details.conflicts (--json). --all-projects surfaces
a per-project conflict as that project's error without aborting the batch.

Tests: tests/test_sync_force_pull_baseline.py (config + row; preserve case b,
abort case a, remote-only-changed takes remote) and tests/test_sync_cli.py
(exit 1 + human/JSON envelope). Live-verified read-only against project 1183:
force-pull preserves a locally-edited transformation ("Skipped (1) locally
modified") and `sync diff` still reports modified:1.

Docs: changelog 0.53.0, version bump + make version-sync, CLAUDE.md sync group,
context.py AGENT_CONTEXT, sync-workflow.md, gotchas.md (since v0.53.0),
commands-reference.md, keboola-expert.md.

* fix(sync): structured conflict in `pull --all-projects` + E2E force-pull coverage

Addresses kbagent-pr-reviewer findings on keboola#376.

NB-1: `pull_all()` flattened `SyncConflictError` to `str(exc)`, dropping the
`SYNC_CONFLICT` code and the conflicts list, so a `--all-projects --json`
consumer (AI agent / script) could not tell a merge conflict from any other
error. `pull_all._worker` now catches `SyncConflictError` and stores
`{error, error_code, conflicts}`, mirroring the single-project envelope.
Unit test in `tests/test_sync_force_pull_baseline.py`.
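
A hedged sketch of the NB-1 fix: catch the conflict error separately so its code and conflict list survive into the per-project result. The `SyncConflictError` class below is a stand-in with an assumed `conflicts` attribute, and `run_worker` is a hypothetical name, not the real `pull_all._worker`.

```python
from typing import Any, Callable


class SyncConflictError(Exception):
    """Stand-in for errors.SyncConflictError; the attribute name is assumed."""

    def __init__(self, message: str, conflicts: list[str]):
        super().__init__(message)
        self.conflicts = conflicts


def run_worker(alias: str, pull: Callable[[str], Any]) -> dict[str, Any]:
    """Pull one project and return a structured per-project result."""
    try:
        return {"project": alias, "result": pull(alias)}
    except SyncConflictError as exc:
        # Keep the machine-readable code and the list, not just str(exc).
        return {
            "project": alias,
            "error": str(exc),
            "error_code": "SYNC_CONFLICT",
            "conflicts": exc.conflicts,
        }
    except Exception as exc:  # any other failure stays a plain error string
        return {"project": alias, "error": str(exc)}


def failing_pull(alias: str) -> None:
    raise SyncConflictError("1 conflict", ["keboola.snowflake-transformation/42"])


print(run_worker("demo", failing_pull)["error_code"])  # SYNC_CONFLICT
```

Because one project's conflict is returned rather than raised, the remaining projects in the batch still run.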

NB-2: adds `tests/test_e2e.py::TestE2ESyncWorkflow::test_sync_force_pull_conflict_aware`
-- end-to-end against real Storage: create a config, edit it locally, force-pull
with the remote unchanged (assert the edit is preserved and `sync diff` still
reports it modified), then mutate the remote and force-pull again (assert exit 1
and `SYNC_CONFLICT`, with the conflict listed). Cleans up the config.

* docs(changelog): note --all-projects structured conflict + E2E coverage (0.53.0)

* refactor(sync): tighten conflict types + guard _format_conflict_list (NIT-1/NIT-2)
…s, etc.)

Teach kbagent the new `semantic-reference-data` metastore type — a
per-dimension member store (one record per dimension, members in a
`members[]` array). The driving use case: hold a Chart of Accounts (the
account list + all attributes) in the semantic layer instead of a hardcoded
Storage table.

New self-contained sub-app `kbagent semantic-layer reference-data`:
- `list`   — dimension summaries (id, dimension, member_count) [read]
- `get`    — one record + all members, by --id or --model+--dimension [read]
- `set`    — create-or-replace by (model, dimension) from a JSON members
             file; idempotent (existing record -> PUT/revision++, else POST) [write]
- `delete` — remove a record by UUID [destructive]
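
The create-or-replace behavior of `set` can be illustrated with an in-memory stand-in. The verb names `list_items` / `post_item` / `put_item` come from this PR, but their real signatures are not shown here, so the `FakeMetastore` class, its argument shapes, and `set_reference_data` are all assumptions for the sake of the sketch.

```python
import uuid


class FakeMetastore:
    """In-memory stand-in for the metastore client (signatures assumed)."""

    def __init__(self):
        self.items = {}

    def list_items(self):
        return list(self.items.values())

    def post_item(self, payload):
        item = {"id": str(uuid.uuid4()), "revision": 1, **payload}
        self.items[item["id"]] = item
        return item

    def put_item(self, item_id, payload):
        current = self.items[item_id]
        item = {**current, **payload, "revision": current["revision"] + 1}
        self.items[item_id] = item
        return item


def set_reference_data(client, model, dimension, members):
    """Replace the record for (model, dimension) if it exists, else create it."""
    payload = {"model": model, "dimension": dimension, "members": members}
    for item in client.list_items():
        if item["model"] == model and item["dimension"] == dimension:
            return client.put_item(item["id"], payload)  # same id, revision + 1
    return client.post_item(payload)


store = FakeMetastore()
first = set_reference_data(store, "finance", "chart_of_accounts", [{"code": "1000"}])
second = set_reference_data(
    store, "finance", "chart_of_accounts", [{"code": "1000"}, {"code": "2000"}]
)
print(first["id"] == second["id"], second["revision"])  # True 2
```

Running `set` twice leaves a single record whose ID is stable and whose revision advances, which is the property a DELETE+POST approach would lose.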

Implementation:
- metastore_client: register `semantic-reference-data` in SemanticType /
  SEMANTIC_TYPES; add a `put_item` verb so `set` uses the metastore's real
  revisioned PUT (preserves history) instead of DELETE+POST.
- SemanticLayerService: list/get/set/delete via the generic client verbs.
- permissions + hints registered for all four leaves; SKILL.md regenerated.

Deliberately self-contained: reference-data is NOT AI-generated, so it is
kept out of `build` / `export` / `diff` / cascade / PUSH_ORDER — zero blast
radius on the existing model flows.

Tests: service (CRUD, create-vs-replace, validation, NOT_FOUND) + CLI
(list/get/set/delete, bad-JSON, --yes gate) + permission-registry asserts.

Deferred (flagged for follow-up): REST `serve` router parity, the
hand-written plugin-doc surfaces (commands-reference / gotchas /
keboola-expert / context.py / CLAUDE.md), and an E2E hop.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ottomansky
Owner Author

Re-targeting to the upstream repo keboola/cli (this fork base was wrong). Superseded by the keboola/cli PR.

@ottomansky ottomansky force-pushed the feat/semantic-layer-reference-data branch from 494cc07 to 644a8cc Compare June 3, 2026 12:47
@ottomansky ottomansky closed this Jun 3, 2026