From a67b5bc5032a6310c321dd319c5509ea1c1eca7b Mon Sep 17 00:00:00 2001 From: Petr Date: Mon, 1 Jun 2026 21:13:51 +0200 Subject: [PATCH 1/6] feat(storage): `clone-table` -- pull a prod table into a dev branch (0.52.0) Adds `kbagent storage clone-table --project P --table-id ID --branch ID [--dry-run]`, wrapping the Storage API `POST /v2/storage/branch/{branch}/tables/{id}/pull` endpoint (operationName `devBranchTablePull`). Why: on `storage-branches` projects a dev branch reads production tables transparently (copy-on-write) until the first write, so an in-branch schema mutation -- `swap-tables`, dropping a column -- fails with a misleading "bucket not found" until the table is materialized branch-local. `clone-table` performs that materialization. It is the blocking prerequisite for the typify-via-branch workflow on storage-branches projects. Implementation mirrors `swap-tables` across all layers: - KeboolaClient.pull_table (async storage job, polled to completion) - StorageService.clone_table (branch mandatory; exit 5 / ConfigError before any HTTP when no branch is set) - commands/storage.py clone-table (permission class `write`; --dry-run; no --yes since it never deletes) - permissions, hint, serve REST route, AGENT_CONTEXT Live-validated against project 10539 (storage-branches ON): clone a prod table into a dev branch -> table materialized -> in-branch swap-tables then succeeds (it previously failed with "bucket not found") -> the production table is left untouched. Tests: tests/test_storage_clone.py (13: client/service/CLI) + tests/test_e2e.py::TestE2EStorageCloneTable (3). Docs synced per convention #17 (auto-generated SKILL.md, commands-reference, gotchas, context, CLAUDE.md, storage-types + typify workflow). Deliberately not added to keboola-expert.md (already at its hard token budget); covered in the other surfaces. Addresses the clone-prod-table-into-branch request in keboola/cli#362. 
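For reviewers, a minimal sketch of how the pull request path is assembled, based on `KeboolaClient.pull_table` in this patch. `pull_endpoint` is a hypothetical helper name used only for illustration; it performs no HTTP call and is not part of the change itself.

```python
from urllib.parse import quote


def pull_endpoint(table_id: str, branch_id: int) -> str:
    """Build the branch-scoped pull path for a table (illustrative helper only)."""
    # Percent-encode the whole table ID so it stays a single path segment.
    safe_id = quote(table_id, safe="")
    # The source of the pull is always the default branch; only the target
    # dev branch appears in the path.
    return f"/v2/storage/branch/{branch_id}/tables/{safe_id}/pull"


if __name__ == "__main__":
    print(pull_endpoint("in.c-foo.data", 1234))
```

The request carries no body: the table ID and the target branch in the path fully describe the pull.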
--- .claude-plugin/marketplace.json | 2 +- CLAUDE.md | 1 + plugins/kbagent/.claude-plugin/plugin.json | 2 +- plugins/kbagent/skills/kbagent/SKILL.md | 1 + .../kbagent/references/commands-reference.md | 1 + .../skills/kbagent/references/gotchas.md | 24 ++ .../references/storage-types-workflow.md | 17 + .../references/typify-table-workflow.md | 11 + pyproject.toml | 2 +- src/keboola_agent_cli/changelog.py | 3 + src/keboola_agent_cli/client.py | 30 ++ src/keboola_agent_cli/commands/context.py | 6 + src/keboola_agent_cli/commands/storage.py | 97 +++++ .../hints/definitions/storage.py | 38 ++ src/keboola_agent_cli/permissions.py | 3 + .../server/routers/storage.py | 27 ++ .../services/storage_service.py | 66 +++ tests/test_e2e.py | 161 +++++++ tests/test_storage_clone.py | 394 ++++++++++++++++++ uv.lock | 2 +- 20 files changed, 884 insertions(+), 4 deletions(-) create mode 100644 tests/test_storage_clone.py diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index 870be9b0..9e153e23 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -10,7 +10,7 @@ "plugins": [ { "name": "kbagent", - "version": "0.51.1", + "version": "0.52.0", "source": "./plugins/kbagent", "description": "AI-friendly interface to Keboola Connection projects — explore configs, jobs, lineage, call MCP tools, manage dev branches, and debug SQL in workspaces", "category": "development" diff --git a/CLAUDE.md b/CLAUDE.md index db2c8397..a95271fe 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -355,6 +355,7 @@ kbagent storage truncate-table --project NAME --table-id ID [--table-id ...] [-- kbagent storage delete-column --project NAME --table-id ID --column COL [--column ...] [--force] [--dry-run] [--yes] [--branch ID] kbagent storage delete-bucket --project NAME --bucket-id ID [--bucket-id ...] 
[--force] [--dry-run] [--yes] [--branch ID] kbagent storage swap-tables --project NAME --table-id ID --target-table-id ID --branch ID [--dry-run] [--yes] +kbagent storage clone-table --project NAME --table-id ID --branch ID [--dry-run] kbagent storage describe-bucket --project NAME --bucket-id ID [--text STR | --file PATH | --stdin] [--branch ID] kbagent storage describe-table --project NAME --table-id ID [--text STR | --file PATH | --stdin] [--branch ID] kbagent storage describe-column --project NAME --table-id ID --column NAME=DESC [--column ...] [--branch ID] diff --git a/plugins/kbagent/.claude-plugin/plugin.json b/plugins/kbagent/.claude-plugin/plugin.json index aa279f65..fb3ba50c 100644 --- a/plugins/kbagent/.claude-plugin/plugin.json +++ b/plugins/kbagent/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "kbagent", - "version": "0.51.1", + "version": "0.52.0", "description": "AI-friendly interface to Keboola Connection projects — explore configs, jobs, lineage, call MCP tools, manage dev branches, and debug SQL in workspaces", "author": { "name": "Keboola", diff --git a/plugins/kbagent/skills/kbagent/SKILL.md b/plugins/kbagent/skills/kbagent/SKILL.md index b026dccc..8b1c4055 100644 --- a/plugins/kbagent/skills/kbagent/SKILL.md +++ b/plugins/kbagent/skills/kbagent/SKILL.md @@ -194,6 +194,7 @@ When working inside a git repository or project directory, run `kbagent init` (o | Truncate (delete all rows from) one or more storage tables | `kbagent storage truncate-table --project PROJECT --table-id TABLE-ID` | | Delete one or more columns from a storage table | `kbagent storage delete-column --project PROJECT --table-id TABLE-ID --column COLUMN` | | Swap two storage tables in a development branch | `kbagent storage swap-tables --project PROJECT --table-id TABLE-ID --target-table-id TARGET-TABLE-ID` | +| Clone (pull) a production table into a development branch | `kbagent storage clone-table --project PROJECT --table-id TABLE-ID` | | Delete one or more storage 
buckets | `kbagent storage delete-bucket --project PROJECT --bucket-id BUCKET-ID` | | Set the description on a storage bucket | `kbagent storage describe-bucket --project PROJECT --bucket-id BUCKET-ID` | | Set the description on a storage table | `kbagent storage describe-table --project PROJECT --table-id TABLE-ID` | diff --git a/plugins/kbagent/skills/kbagent/references/commands-reference.md b/plugins/kbagent/skills/kbagent/references/commands-reference.md index c29f37aa..bdb633c2 100644 --- a/plugins/kbagent/skills/kbagent/references/commands-reference.md +++ b/plugins/kbagent/skills/kbagent/references/commands-reference.md @@ -102,6 +102,7 @@ Requires a **super-admin** Manage API token (same kind as `org setup`). Same def - `storage delete-column --project NAME --table-id ID --column COL [--column ...] [--force] [--dry-run] [--yes] [--branch ID]` -- delete columns from a table (branch-aware) - `storage delete-bucket --project NAME --bucket-id ID [--bucket-id ...] [--force] [--dry-run] [--yes] [--branch ID]` -- delete buckets (branch-aware) - `storage swap-tables --project NAME --table-id ID --target-table-id ID --branch ID [--dry-run] [--yes]` (since v0.28.0) -- swap two storage tables in a dev branch (POST `/tables/{id}/swap`). Both tables exchange physical positions; aliases are NOT transferred (they keep pointing at the same physical position and therefore expose the OTHER table's data after the swap). Service refuses without a branch (active branch via `branch use` works too). Use to flip a typed rebuild ("data_change_log") into the original name ("data") without touching downstream config references +- `storage clone-table --project NAME --table-id ID --branch ID [--dry-run]` (since v0.52.0) -- pull (clone) a production table into a dev branch (POST `/tables/{id}/pull`, operationName `devBranchTablePull`). 
On `storage-branches` projects a dev branch reads prod tables transparently until the first write, so an in-branch schema mutation (`swap-tables`, dropping a column) fails with a misleading "bucket not found" until the table is materialized branch-local; `clone-table` does that. One-way (default -> branch). Service refuses without a branch (active branch via `branch use` works too). Permission class `write` - `storage describe-bucket --project NAME --bucket-id ID [--text STR | --file PATH | --stdin] [--branch ID]` -- set a bucket description (stored as `KBC.description` in bucket metadata, upsert). Provide exactly one of `--text`, `--file`, `--stdin`. Read back via `storage bucket-detail` - `storage describe-table --project NAME --table-id ID [--text STR | --file PATH | --stdin] [--branch ID]` -- set a table description (stored as `KBC.description` in table metadata, upsert). Provide exactly one of `--text`, `--file`, `--stdin`. Read back via `storage table-detail` - `storage describe-column --project NAME --table-id ID --column NAME=DESCRIPTION [--column ...] [--branch ID]` -- set one or more column descriptions. Stored as `KBC.column.{name}.description` keys in the table's metadata (Keboola has no user-writable column-metadata endpoint). Read back in `storage table-detail` under `column_details[].description` diff --git a/plugins/kbagent/skills/kbagent/references/gotchas.md b/plugins/kbagent/skills/kbagent/references/gotchas.md index 3778f249..eda47be9 100644 --- a/plugins/kbagent/skills/kbagent/references/gotchas.md +++ b/plugins/kbagent/skills/kbagent/references/gotchas.md @@ -784,6 +784,30 @@ events and emits a final `done` SSE frame mirroring the same record. original table now carries the typed schema with no downstream config rewrite required. 
+## `storage clone-table` materializes a prod table into a dev branch (since v0.52.0) + +- `kbagent storage clone-table --project P --table-id T --branch ` + pulls a production table into a dev branch + (`POST /v2/storage/branch/{branch}/tables/{id}/pull`, operationName + `devBranchTablePull` -- the same call the platform issues on a branch's + first write to a prod table). +- **Why it matters on `storage-branches` projects:** a dev branch reads + production tables transparently (copy-on-write) until the first write. + A schema mutation in the branch -- `swap-tables`, dropping a column -- + targets a table that is not yet branch-local, so the Storage API fails + with a misleading `"bucket ... was not found in the project"`. Run + `clone-table` first to materialize the table branch-local; the swap / + drop then succeeds. (Verified live 2026-06-01 on project 10539 with + storage-branches ON: clone -> in-branch swap succeeds; production left + untouched.) +- **One-way (default -> branch).** There is no "push branch -> default": + branch storage is never merged back to production (only configurations + are). The pull is the only API path between the two table stores. +- Branch is mandatory: the service refuses with exit 5 (`ConfigError`) + before any HTTP call when neither `--branch` nor an active branch (via + `branch use`) is set. +- Permission class: `write` (creates a branch-local copy; never deletes). 
+ ## `storage truncate-table` preserves schema; endpoint is uniformly async-via-job (since v0.32.0) - `kbagent storage truncate-table --project P --table-id T [--branch ID] diff --git a/plugins/kbagent/skills/kbagent/references/storage-types-workflow.md b/plugins/kbagent/skills/kbagent/references/storage-types-workflow.md index d62a6719..2fb0355b 100644 --- a/plugins/kbagent/skills/kbagent/references/storage-types-workflow.md +++ b/plugins/kbagent/skills/kbagent/references/storage-types-workflow.md @@ -254,6 +254,17 @@ kbagent workspace query --workspace-id W --sql " " # (or use kbagent storage create-table + an SQL transformation) +# 2b. storage-branches projects only: the dev branch reads 'data' +# transparently until first write, so swap (a write) fails with a +# misleading "bucket not found" until 'data' is materialized +# branch-local. Pull it in first. (data_change_log, built by the +# in-branch CTAS above, is already branch-local. Skip on +# legacy-branch projects.) +kbagent storage clone-table \ + --project prod \ + --table-id in.c-foo.data \ + --branch + # 3. Swap: the typed copy becomes 'data', the typeless original moves # to 'data_change_log'. Aliases stay put -- they expose the OTHER # table's data after the swap. @@ -268,6 +279,12 @@ kbagent branch merge --project prod --branch ``` Rules: +- **storage-branches projects:** `swap-tables` operates on branch-local + tables. The original (`in.c-foo.data`) is read transparently from prod + until first write, so the swap fails with a misleading "bucket not + found" until you `clone-table` it into the branch (step 2b). The typed + sibling built by the in-branch CTAS is already branch-local. Legacy + fake-branch projects don't need this. - The Storage API rejects this on production. The service refuses with exit 5 (`ConfigError`) before any HTTP if `--branch` is missing AND no active branch is set via `branch use`. 
diff --git a/plugins/kbagent/skills/kbagent/references/typify-table-workflow.md b/plugins/kbagent/skills/kbagent/references/typify-table-workflow.md index 91c2fdd8..91364d56 100644 --- a/plugins/kbagent/skills/kbagent/references/typify-table-workflow.md +++ b/plugins/kbagent/skills/kbagent/references/typify-table-workflow.md @@ -268,6 +268,17 @@ emits backtick-quoted `\`dataset\`.\`table\`` paths since v0.25.3). ## Phase 5 -- Swap ```bash +# 5.0. storage-branches projects ONLY: the swap is a write, and the dev +# branch still reads the original 'data' transparently from prod, so +# the swap fails with a misleading "bucket not found" until 'data' is +# materialized branch-local. Pull it in first. ('data_typed', built +# in Phase 3, is already branch-local.) Skip on legacy-branch projects +# -- check with: kbagent project info --project ALIAS | grep storage-branches +kbagent --json storage clone-table \ + --project ALIAS \ + --table-id in.c-foo.data \ + --branch + # 5a. Dry-run first. Should report dry_run: true, never call the API. kbagent --json storage swap-tables \ --project ALIAS \ diff --git a/pyproject.toml b/pyproject.toml index c6837281..bf44d5b8 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "keboola-agent-cli" -version = "0.51.1" +version = "0.52.0" description = "AI-friendly CLI for managing Keboola projects" readme = "README.md" requires-python = ">=3.12" diff --git a/src/keboola_agent_cli/changelog.py b/src/keboola_agent_cli/changelog.py index 84ebb708..3b240406 100644 --- a/src/keboola_agent_cli/changelog.py +++ b/src/keboola_agent_cli/changelog.py @@ -8,6 +8,9 @@ # Ordered newest-first. Each value is a list of brief one-line descriptions. 
CHANGELOG: dict[str, list[str]] = { + "0.52.0": [ + 'New: `kbagent storage clone-table --project P --table-id ID --branch ID [--dry-run]` -- pulls (clones) a production table into a development branch via the Storage API `POST /v2/storage/branch/{branch}/tables/{id}/pull` endpoint (operationName `devBranchTablePull`, the same call the platform issues on a branch\'s first write to a prod table). On `storage-branches` projects a dev branch reads production tables transparently (copy-on-write) until the first write, so a schema mutation in the branch -- `swap-tables`, dropping a column -- fails with a misleading "bucket not found" until the table is materialized branch-local. `clone-table` performs that materialization. The pull is one-way (default -> branch); the service refuses with exit 5 (`ConfigError`) before any HTTP call when neither `--branch` nor an active branch (via `kbagent branch use`) is set. The API returns a queued storage job which the client polls to completion before returning, mirroring `swap-tables` semantics. Permission class: `write` (creates a branch-local copy; never deletes). New layers: `KeboolaClient.pull_table`, `StorageService.clone_table`, `commands/storage.py` `clone-table`, hint `storage.clone-table`, and a 1:1 `kbagent serve` REST route (`POST /storage/tables/{project}/{table_id}/pull`). Tests: `tests/test_storage_clone.py` (13: client/service/CLI) + `tests/test_e2e.py::TestE2EStorageCloneTable` (3). Live-validated against project 10539 (storage-branches ON): clone a prod table into a dev branch -> table materialized -> in-branch `swap-tables` then succeeds (it previously failed with "bucket not found") -> production left untouched. Addresses the clone-prod-table-into-branch request in keboola/cli#362.', + ], "0.51.1": [ "Fix (dev-portal): admin-role PATCH routing. 
`complexity`, `categories`, `forwardToken`, `forwardTokenDetails`, `injectEnvironment`, `processTimeout`, `requiredMemory`, `features`, and `category` are `.forbidden()` on the apps-api vendor schema (`clientAppSchema` in keboola/developer-portal:src/lib/validation.js) but settable on the admin schema. The vendor PATCH returns a misleading 422 (`Parameter complexity must be one of: easy, medium, hard`) because the enum-validation `.error()` annotation is attached on the shared admin schema before `clientAppSchema()` overrides with `.forbidden()`. `DeveloperPortalIdentity.role_hint` becomes a real validator (`vendor`/`admin`, case-folded, typos raise); `DeveloperPortalClient.patch_app` now reads the role and routes admin identities to `PATCH /admin/apps/{app}` (permissive schema); `DeveloperPortalService.prepare_patch` preflights vendor-role + admin-only-field combinations with a fail-fast error that names every offending field, explains why the 422 is misleading, and tells the user the exact command to switch identity. Admin role bypasses the preflight entirely. Reads, create, upload-icon, deprecate keep vendor-endpoint behaviour -- only PATCH has a meaningful admin variant on the server.", "Fix (dev-portal): MFA login. The apiary spec calls `challenge` optional with default `SOFTWARE_TOKEN_MFA`, but in practice the server 404s on personal-account TOTP logins when it is omitted -- users saw `Error: Developer Portal MFA login failed (HTTP 404)` with no diagnostic body. The field is now sent explicitly. Single attempt only: an earlier experiment retried with `SMS_MFA` on the same session, but `/auth/login` consumes the session, so the retry always 404'd with `Invalid code or auth state for the user` and masked the real first failure (most often a stale 30-second TOTP code). 
The raised `KeboolaApiError` now includes the server response body (truncated to 500 chars) plus a hint about TOTP rotation, so users can distinguish wrong-code from stale-code from expired-session.", diff --git a/src/keboola_agent_cli/client.py b/src/keboola_agent_cli/client.py index 43d179fd..204ffbde 100644 --- a/src/keboola_agent_cli/client.py +++ b/src/keboola_agent_cli/client.py @@ -1804,6 +1804,36 @@ def swap_tables( response = self._request("POST", f"{prefix}/tables/{safe_id}/swap", json=body) return self._wait_for_storage_job(response.json()) + def pull_table(self, table_id: str, branch_id: int) -> dict[str, Any]: + """Pull (clone) a table from the default branch into a dev branch. + + On ``storage-branches`` projects a dev branch reads production tables + transparently (copy-on-write) until the first write. Operations that + mutate a table in the branch -- such as ``swap_tables`` or a column + drop -- require a branch-local materialization of the table first; + otherwise the Storage API reports the bucket as "not found" in the + branch. This endpoint performs that materialization: it copies the + table from the default (production) branch into the branch's isolated + storage. It is the same call the platform issues on a branch's first + write to a production table. + + The pull is one-way (default -> branch). The API returns a queued + storage job which this method polls to completion before returning, + mirroring ``swap_tables`` semantics. + + Args: + table_id: Full ID of the table to pull (e.g. "in.c-bucket.table"). + branch_id: Target development branch ID. The source is always the + default/production branch. + + Returns: + Completed storage job dict. 
+ """ + prefix = f"/v2/storage/branch/{branch_id}" + safe_id = quote(table_id, safe="") + response = self._request("POST", f"{prefix}/tables/{safe_id}/pull") + return self._wait_for_storage_job(response.json()) + def list_tables_with_metadata(self) -> list[dict[str, Any]]: """List all storage tables with columns and metadata. diff --git a/src/keboola_agent_cli/commands/context.py b/src/keboola_agent_cli/commands/context.py index 05255bea..057e0c11 100644 --- a/src/keboola_agent_cli/commands/context.py +++ b/src/keboola_agent_cli/commands/context.py @@ -401,6 +401,12 @@ touching downstream config references. Storage API rejects this on production: --branch (or active branch via 'kbagent branch use') is mandatory. Service guards before any HTTP call when no branch is set. + kbagent storage clone-table --project NAME --table-id ID --branch ID [--dry-run] + Clone (pull) a production table into a dev branch (POST /tables/{id}/pull). On storage-branches projects a + dev branch reads prod tables transparently until first write, so mutating a table's schema in the branch + (swap-tables, dropping columns) first needs a branch-local copy. This materializes that copy (one-way: + default -> branch). Branch is mandatory; service guards before any HTTP call when no branch is set. + ### Storage Descriptions kbagent storage describe-bucket --project NAME --bucket-id ID [--text STR | --file PATH | --stdin] [--branch ID] diff --git a/src/keboola_agent_cli/commands/storage.py b/src/keboola_agent_cli/commands/storage.py index 624d3d17..b67525f5 100644 --- a/src/keboola_agent_cli/commands/storage.py +++ b/src/keboola_agent_cli/commands/storage.py @@ -1423,6 +1423,103 @@ def storage_swap_tables( ) +@storage_app.command("clone-table", rich_help_panel=_TABLES) +def storage_clone_table( + ctx: typer.Context, + project: str = typer.Option( + ..., + "--project", + help="Project alias", + ), + table_id: str = typer.Option( + ..., + "--table-id", + help="Table ID to pull into the branch (e.g. 
'in.c-bucket.table')", + ), + branch: int | None = typer.Option( + None, + "--branch", + help=( + "Target dev branch ID. Required; defaults to the active branch " + "set via 'kbagent branch use'. The pull is one-way: default -> branch." + ), + ), + dry_run: bool = typer.Option( + False, + "--dry-run", + help="Show what would be pulled without executing", + ), +) -> None: + """Clone (pull) a production table into a development branch. + + On storage-branches projects a dev branch reads production tables + transparently until the first write. To mutate a table's schema in the + branch -- e.g. 'swap-tables' or dropping a column -- you first need a + branch-local copy of the production table; without it the Storage API + reports the bucket as "not found" in the branch. This materializes that + copy from the default branch (one-way: default -> branch). + + \b + Example: + kbagent branch use --project P --branch 1234 + kbagent storage clone-table --project P --table-id in.c-foo.data + kbagent storage swap-tables --project P \\ + --table-id in.c-foo.data --target-table-id in.c-foo.data_typed + """ + if should_hint(ctx): + emit_hint( + ctx, + "storage.clone-table", + project=project, + table_id=table_id, + branch=branch, + dry_run=dry_run, + ) + + formatter = get_formatter(ctx) + service = get_service(ctx, "storage_service") + config_store: ConfigStore = ctx.obj["config_store"] + _, effective_branch = resolve_branch(config_store, formatter, project, branch) + + try: + result = service.clone_table( + alias=project, + table_id=table_id, + branch_id=effective_branch, + dry_run=dry_run, + ) + except ConfigError as exc: + formatter.error(message=exc.message, error_code=ErrorCode.CONFIG_ERROR) + raise typer.Exit(code=5) from None + except KeboolaApiError as exc: + exit_code = map_error_to_exit_code(exc) + formatter.error( + message=exc.message, + error_code=exc.error_code, + project=project, + retryable=exc.retryable, + ) + raise typer.Exit(code=exit_code) from None + + if 
dry_run: + if formatter.json_mode: + formatter.output(result) + else: + formatter.console.print( + f"[bold blue]Would clone (branch {result['branch_id']}):[/bold blue] " + f"{result['table_id']} (default -> branch)" + ) + return + + if formatter.json_mode: + formatter.output(result) + else: + formatter.console.print( + f"[bold green]Cloned:[/bold green] {result['table_id']} " + f"into branch {result['branch_id']}" + ) + + @storage_app.command("delete-bucket", rich_help_panel=_BUCKETS) def storage_delete_bucket( ctx: typer.Context, diff --git a/src/keboola_agent_cli/hints/definitions/storage.py b/src/keboola_agent_cli/hints/definitions/storage.py index ccc737ab..dba5a368 100644 --- a/src/keboola_agent_cli/hints/definitions/storage.py +++ b/src/keboola_agent_cli/hints/definitions/storage.py @@ -500,6 +500,44 @@ ) ) +# ── storage clone-table ─────────────────────────────────────────── + +HintRegistry.register( + CommandHint( + cli_command="storage.clone-table", + description="Clone (pull) a production table into a dev branch", + steps=[ + HintStep( + comment="Materialize a production table into the dev branch (one-way: default -> branch)", + client=ClientCall( + method="pull_table", + args={ + "table_id": "{table_id}", + "branch_id": "{branch}", + }, + result_var="result", + ), + service=ServiceCall( + service_class="StorageService", + service_module="storage_service", + method="clone_table", + args={ + "alias": "{project}", + "table_id": "{table_id}", + "branch_id": "{branch}", + "dry_run": "{dry_run}", + }, + ), + ), + ], + notes=[ + "Required before swap-tables / column drops on storage-branches projects: a dev branch reads prod tables transparently until first write, so schema mutations need a branch-local copy first.", + "Branch is mandatory (the pull is one-way default -> branch); without it the service raises ConfigError before any HTTP call.", + "Returns a completed storage job dict; the client polls the async job to completion before returning.", + ], + ) +) 
+ # ── storage files ────────────────────────────────────────────────── HintRegistry.register( diff --git a/src/keboola_agent_cli/permissions.py b/src/keboola_agent_cli/permissions.py index f4baa71f..530ca6c1 100644 --- a/src/keboola_agent_cli/permissions.py +++ b/src/keboola_agent_cli/permissions.py @@ -184,6 +184,9 @@ "storage.create-bucket": "write", "storage.create-table": "write", "storage.upload-table": "write", + # clone-table pulls a prod table into a dev branch (materialization); it + # creates a branch-local copy and never deletes -- write, not destructive. + "storage.clone-table": "write", # Storage files "storage.files": "read", "storage.file-detail": "read", diff --git a/src/keboola_agent_cli/server/routers/storage.py b/src/keboola_agent_cli/server/routers/storage.py index 256349b3..ef18de87 100644 --- a/src/keboola_agent_cli/server/routers/storage.py +++ b/src/keboola_agent_cli/server/routers/storage.py @@ -68,6 +68,10 @@ class SwapTables(BaseModel): branch_id: int +class CloneTable(BaseModel): + branch_id: int + + @router.get("/buckets", summary="List storage buckets") def list_buckets( project: str | None = None, @@ -342,6 +346,29 @@ def swap_tables( ) +@router.post( + "/tables/{project}/{table_id:path}/pull", + summary="Clone a table into a dev branch", +) +def clone_table( + project: str, + table_id: str, + body: CloneTable, + dry_run: bool = False, + registry: ServiceRegistry = Depends(get_registry), +) -> dict[str, Any]: + """Pull (clone) a production table into a dev branch. + + Mirrors `kbagent storage clone-table`. 
+ """ + return registry.storage.clone_table( + alias=project, + table_id=table_id, + branch_id=body.branch_id, + dry_run=dry_run, + ) + + @router.post("/tables/{project}/{table_id:path}/describe", summary="Set table description") def describe_table( project: str, diff --git a/src/keboola_agent_cli/services/storage_service.py b/src/keboola_agent_cli/services/storage_service.py index d06bdb7c..eb012ac9 100644 --- a/src/keboola_agent_cli/services/storage_service.py +++ b/src/keboola_agent_cli/services/storage_service.py @@ -1416,6 +1416,72 @@ def swap_tables( "response": response, } + def clone_table( + self, + alias: str, + table_id: str, + branch_id: int | None, + dry_run: bool = False, + ) -> dict[str, Any]: + """Pull (clone) a production table into a dev branch (branch required). + + On ``storage-branches`` projects a dev branch reads production tables + transparently until the first write, so mutating a table's schema in + the branch (e.g. ``swap_tables`` or a column drop) first needs a + branch-local copy of the production table. This materializes that copy + from the default branch. The pull is one-way (default -> branch); the + service raises ConfigError before any HTTP call when ``branch_id`` is + None. + + Args: + alias: Project alias. + table_id: Full ID of the table to pull into the branch. + branch_id: Target dev branch ID (must not be None). + dry_run: If True, only report what would be pulled. + + Returns: + Dict with 'project_alias', 'branch_id', 'table_id', 'dry_run', + and (when not dry-run) 'response'. + + Raises: + ConfigError: If branch_id is None. + KeboolaApiError: If the API call fails. + """ + if branch_id is None: + raise ConfigError( + "clone-table requires a dev branch. Set one with " + "'kbagent branch use --project
--branch ' or pass " + "--branch directly. The pull is one-way: default -> branch." + ) + + projects = self.resolve_projects([alias]) + project = projects[alias] + + if dry_run: + return { + "project_alias": alias, + "branch_id": branch_id, + "table_id": table_id, + "dry_run": True, + } + + client = self._client_factory(project.stack_url, project.token) + try: + response = client.pull_table( + table_id=table_id, + branch_id=branch_id, + ) + finally: + client.close() + + return { + "project_alias": alias, + "branch_id": branch_id, + "table_id": table_id, + "dry_run": False, + "response": response, + } + def delete_buckets( self, alias: str, diff --git a/tests/test_e2e.py b/tests/test_e2e.py index 4d67737d..fe995b23 100644 --- a/tests/test_e2e.py +++ b/tests/test_e2e.py @@ -7287,6 +7287,167 @@ def test_swap_without_branch_is_rejected(self) -> None: assert "dev branch" in payload["error"]["message"] +# --------------------------------------------------------------------------- +# TestE2EStorageCloneTable -- storage clone-table (pull) into a dev branch +# --------------------------------------------------------------------------- + + +@skip_without_credentials +@pytest.mark.e2e +class TestE2EStorageCloneTable: + """End-to-end coverage for ``kbagent storage clone-table``. + + Verifies: + - a production table can be pulled (cloned) into a dev branch and is + then visible/materialized in that branch, + - dry-run skips the HTTP call, + - calls without a branch are rejected before any HTTP traffic. + + On storage-branches projects this materializes the prod table into the + branch (the prerequisite for in-branch swap / column drops). On + legacy-branch projects the pull still succeeds; the assertion only checks + the table is visible in the branch afterwards, which holds for both. 
+ """ + + @pytest.fixture(autouse=True) + def setup(self, tmp_path: Path) -> Generator[None, None, None]: + self.token = os.environ[ENV_TOKEN] + raw_url = os.environ.get(ENV_URL, "connection.keboola.com") + self.url = raw_url if raw_url.startswith("https://") else f"https://{raw_url}" + self.alias = f"{RUN_ID}-clone" + self.config_dir = tmp_path / "config" + self.config_dir.mkdir() + self.client = KeboolaClient(stack_url=self.url, token=self.token) + + self._created_branch_ids: list[int] = [] + self._created_buckets: list[str] = [] + + result = _invoke( + self.config_dir, + [ + "--json", + "project", + "add", + "--project", + self.alias, + "--url", + self.url, + "--token", + self.token, + ], + ) + assert result.exit_code == 0, f"project add failed: {result.output}" + + yield + + # Teardown: branches first (cascades to their materialized tables), + # then the production bucket we created outside of a branch. + for branch_id in self._created_branch_ids: + with contextlib.suppress(Exception): + self.client.delete_dev_branch(branch_id) + for bucket_id in self._created_buckets: + with contextlib.suppress(Exception): + self.client.delete_bucket(bucket_id, force=True) + self.client.close() + + def _run_ok(self, *args: str) -> dict[str, Any]: + return _json_ok(_invoke(self.config_dir, ["--json", *args])) + + def test_clone_prod_table_into_dev_branch(self) -> None: + """Live pull: a production table becomes available in the dev branch.""" + bucket_id = f"in.c-{RUN_ID.replace('-', '_')}_clone" + table_id = f"{bucket_id}.source" + + _step(1, "create a production table (default branch)") + self._run_ok( + "storage", + "create-table", + "--project", + self.alias, + "--bucket-id", + bucket_id, + "--name", + "source", + "--column", + "id:VARCHAR(40)", + "--column", + "value:VARCHAR(20)", + "--primary-key", + "id", + ) + self._created_buckets.append(bucket_id) + + _step(2, "branch create", "target dev branch for the pull") + branch = self._run_ok( + "branch", "create", 
"--project", self.alias, "--name", f"{RUN_ID}-clone-branch" + )["data"] + branch_id = int(branch["branch_id"]) + self._created_branch_ids.append(branch_id) + + _step(3, "storage clone-table", "POST /tables/.../pull (default -> branch)") + result = self._run_ok( + "storage", + "clone-table", + "--project", + self.alias, + "--table-id", + table_id, + "--branch", + str(branch_id), + )["data"] + assert result["table_id"] == table_id + assert result["branch_id"] == branch_id + assert result["dry_run"] is False + assert result["response"]["status"] == "success" + + _step(4, "table-detail in branch", "table is materialized/visible after pull") + detail = self.client.get_table_detail(table_id, branch_id=branch_id) + col_names = {c["name"] for c in detail["definition"]["columns"]} + assert col_names == {"id", "value"} + + def test_clone_dry_run_does_not_call_api(self) -> None: + """Dry-run skips the HTTP call: exit 0, no response key.""" + _step(1, "branch create", "dry-run still requires a branch context") + branch = self._run_ok( + "branch", "create", "--project", self.alias, "--name", f"{RUN_ID}-clone-dry" + )["data"] + branch_id = int(branch["branch_id"]) + self._created_branch_ids.append(branch_id) + + result = self._run_ok( + "storage", + "clone-table", + "--project", + self.alias, + "--table-id", + "in.c-foo.bar", + "--branch", + str(branch_id), + "--dry-run", + )["data"] + assert result["dry_run"] is True + assert "response" not in result + + def test_clone_without_branch_is_rejected(self) -> None: + """Without active branch and without --branch, exit 5 before any HTTP.""" + result = _invoke( + self.config_dir, + [ + "--json", + "storage", + "clone-table", + "--project", + self.alias, + "--table-id", + "in.c-foo.bar", + ], + ) + assert result.exit_code == 5, result.output + payload = json.loads(result.output) + assert payload["status"] == "error" + assert "dev branch" in payload["error"]["message"] + + # 
--------------------------------------------------------------------------- # TestE2EDataAppLifecycle -- data-app create / detail / deploy / start / stop / delete # --------------------------------------------------------------------------- diff --git a/tests/test_storage_clone.py b/tests/test_storage_clone.py new file mode 100644 index 00000000..6eb90b0d --- /dev/null +++ b/tests/test_storage_clone.py @@ -0,0 +1,394 @@ +"""Tests for storage clone-table (pull endpoint): client, service, and CLI.""" + +import json +from pathlib import Path +from unittest.mock import MagicMock, patch + +import pytest +from typer.testing import CliRunner + +from keboola_agent_cli.cli import app +from keboola_agent_cli.client import KeboolaClient +from keboola_agent_cli.config_store import ConfigStore +from keboola_agent_cli.errors import ConfigError, KeboolaApiError +from keboola_agent_cli.models import AppConfig, ProjectConfig +from keboola_agent_cli.services.storage_service import StorageService + +runner = CliRunner() + +TEST_TOKEN = "901-55555-fakeTestTokenDoNotUseXXXXXXXX" + + +def _make_store(tmp_path: Path) -> ConfigStore: + config_dir = tmp_path / "config" + config_dir.mkdir(exist_ok=True) + store = ConfigStore(config_dir=config_dir) + config = AppConfig( + projects={ + "test": ProjectConfig( + stack_url="https://connection.keboola.com", + token=TEST_TOKEN, + ) + }, + ) + store.save(config) + return store + + +def _make_service(store: ConfigStore, mock_client: MagicMock) -> StorageService: + return StorageService( + config_store=store, + client_factory=lambda url, token: mock_client, + ) + + +# --------------------------------------------------------------------------- +# Client layer +# --------------------------------------------------------------------------- + + +class TestPullTableClient: + """Tests for KeboolaClient.pull_table() - HTTP layer.""" + + def test_correct_url_and_no_body(self, httpx_mock) -> None: + """POSTs to /v2/storage/branch/{branch}/tables/{tid}/pull 
with no body. + + The Storage API responds with a queued storage job + (operationName=devBranchTablePull) which the client polls to + completion. Returning ``status: success`` from the first response + avoids exercising the poll loop in this unit test. + """ + httpx_mock.add_response( + url="https://connection.keboola.com/v2/storage/branch/9999/tables/in.c-foo.data/pull", + method="POST", + json={ + "id": 388266099, + "status": "success", + "operationName": "devBranchTablePull", + "operationParams": {"branchId": 9999}, + }, + status_code=200, + ) + + client = KeboolaClient( + stack_url="https://connection.keboola.com", + token=TEST_TOKEN, + ) + result = client.pull_table(table_id="in.c-foo.data", branch_id=9999) + + # Returned dict is the completed storage job + assert result["status"] == "success" + assert result["operationName"] == "devBranchTablePull" + + # The pull endpoint takes no request body (verified live against the API) + sent_request = httpx_mock.get_request() + assert sent_request.content == b"" + client.close() + + def test_url_encoding_for_special_characters(self, httpx_mock) -> None: + """Dots and dashes in a table ID are unreserved and reach the URL path unchanged.""" + httpx_mock.add_response( + url="https://connection.keboola.com/v2/storage/branch/1/tables/in.c-bucket-with-dashes.tbl/pull", + method="POST", + json={"id": 1, "status": "success", "operationName": "devBranchTablePull"}, + status_code=200, + ) + + client = KeboolaClient( + stack_url="https://connection.keboola.com", + token=TEST_TOKEN, + ) + client.pull_table(table_id="in.c-bucket-with-dashes.tbl", branch_id=1) + client.close() + + def test_polls_async_job_to_completion(self, httpx_mock) -> None: + """If POST returns ``status: waiting``, client polls /v2/storage/jobs/{id}.""" + httpx_mock.add_response( + url="https://connection.keboola.com/v2/storage/branch/42/tables/in.c-foo.a/pull", + method="POST", + json={"id": 555, "status": "waiting", "operationName": "devBranchTablePull"}, + status_code=200, + ) +
httpx_mock.add_response( + url="https://connection.keboola.com/v2/storage/jobs/555", + method="GET", + json={"id": 555, "status": "success", "operationName": "devBranchTablePull"}, + status_code=200, + ) + + client = KeboolaClient( + stack_url="https://connection.keboola.com", + token=TEST_TOKEN, + ) + result = client.pull_table(table_id="in.c-foo.a", branch_id=42) + assert result["status"] == "success" + client.close() + + def test_api_error_propagates(self, httpx_mock) -> None: + """Storage API 4xx propagates as KeboolaApiError.""" + httpx_mock.add_response( + url="https://connection.keboola.com/v2/storage/branch/9999/tables/in.c-foo.x/pull", + method="POST", + json={"error": "Table not found in the default branch"}, + status_code=404, + ) + + client = KeboolaClient( + stack_url="https://connection.keboola.com", + token=TEST_TOKEN, + ) + with pytest.raises(KeboolaApiError): + client.pull_table(table_id="in.c-foo.x", branch_id=9999) + client.close() + + +# --------------------------------------------------------------------------- +# Service layer +# --------------------------------------------------------------------------- + + +class TestCloneTableService: + """Tests for StorageService.clone_table().""" + + def test_success(self, tmp_path: Path) -> None: + store = _make_store(tmp_path) + mock_client = MagicMock() + mock_client.pull_table.return_value = {"status": "ok"} + service = _make_service(store, mock_client) + + result = service.clone_table( + alias="test", + table_id="in.c-foo.data", + branch_id=9999, + ) + + assert result["project_alias"] == "test" + assert result["branch_id"] == 9999 + assert result["table_id"] == "in.c-foo.data" + assert result["dry_run"] is False + assert result["response"] == {"status": "ok"} + mock_client.pull_table.assert_called_once_with( + table_id="in.c-foo.data", + branch_id=9999, + ) + mock_client.close.assert_called_once() + + def test_dry_run_skips_client_call(self, tmp_path: Path) -> None: + store = _make_store(tmp_path) + 
mock_client = MagicMock() + service = _make_service(store, mock_client) + + result = service.clone_table( + alias="test", + table_id="in.c-foo.a", + branch_id=42, + dry_run=True, + ) + + assert result["dry_run"] is True + assert "response" not in result + mock_client.pull_table.assert_not_called() + + def test_no_branch_raises_config_error(self, tmp_path: Path) -> None: + """Mandatory branch enforcement: pull is one-way default -> branch.""" + store = _make_store(tmp_path) + mock_client = MagicMock() + service = _make_service(store, mock_client) + + with pytest.raises(ConfigError, match="dev branch"): + service.clone_table( + alias="test", + table_id="in.c-foo.a", + branch_id=None, + ) + mock_client.pull_table.assert_not_called() + + def test_unknown_project(self, tmp_path: Path) -> None: + store = _make_store(tmp_path) + mock_client = MagicMock() + service = _make_service(store, mock_client) + + with pytest.raises(ConfigError): + service.clone_table( + alias="nonexistent", + table_id="in.c-foo.a", + branch_id=42, + ) + + def test_api_error_propagates(self, tmp_path: Path) -> None: + store = _make_store(tmp_path) + mock_client = MagicMock() + mock_client.pull_table.side_effect = KeboolaApiError( + "Table not found", status_code=404, error_code="NOT_FOUND" + ) + service = _make_service(store, mock_client) + + with pytest.raises(KeboolaApiError): + service.clone_table( + alias="test", + table_id="in.c-foo.a", + branch_id=42, + ) + # Service must close the client even when the API call raises + # (try/finally contract -- regression guard for the lifecycle). 
+ mock_client.close.assert_called_once() + + +# --------------------------------------------------------------------------- +# CLI layer +# --------------------------------------------------------------------------- + + +class TestCloneTableCLI: + """CLI tests for `kbagent storage clone-table`.""" + + def _project_with_active_branch(self, store: ConfigStore, branch_id: int) -> None: + config = store.load() + config.projects["test"].active_branch_id = branch_id + store.save(config) + + def test_clone_json(self, tmp_path: Path) -> None: + store = _make_store(tmp_path) + self._project_with_active_branch(store, 9999) + + with ( + patch("keboola_agent_cli.cli.ConfigStore") as MockStore, + patch("keboola_agent_cli.cli.StorageService") as MockSvc, + ): + MockStore.return_value = store + svc = MockSvc.return_value + svc.clone_table.return_value = { + "project_alias": "test", + "branch_id": 9999, + "table_id": "in.c-foo.data", + "dry_run": False, + "response": {"status": "ok"}, + } + result = runner.invoke( + app, + [ + "--json", + "storage", + "clone-table", + "--project", + "test", + "--table-id", + "in.c-foo.data", + ], + ) + + assert result.exit_code == 0, result.output + data = json.loads(result.output)["data"] + assert data["table_id"] == "in.c-foo.data" + assert data["branch_id"] == 9999 + + svc.clone_table.assert_called_once_with( + alias="test", + table_id="in.c-foo.data", + branch_id=9999, + dry_run=False, + ) + + def test_clone_dry_run(self, tmp_path: Path) -> None: + store = _make_store(tmp_path) + self._project_with_active_branch(store, 42) + + with ( + patch("keboola_agent_cli.cli.ConfigStore") as MockStore, + patch("keboola_agent_cli.cli.StorageService") as MockSvc, + ): + MockStore.return_value = store + svc = MockSvc.return_value + svc.clone_table.return_value = { + "project_alias": "test", + "branch_id": 42, + "table_id": "in.c-foo.a", + "dry_run": True, + } + result = runner.invoke( + app, + [ + "--json", + "storage", + "clone-table", + "--project", + 
"test", + "--table-id", + "in.c-foo.a", + "--dry-run", + ], + ) + + assert result.exit_code == 0, result.output + data = json.loads(result.output)["data"] + assert data["dry_run"] is True + svc.clone_table.assert_called_once() + call_kwargs = svc.clone_table.call_args.kwargs + assert call_kwargs["dry_run"] is True + + def test_clone_explicit_branch_overrides_active(self, tmp_path: Path) -> None: + """--branch flag takes precedence over project's active_branch_id.""" + store = _make_store(tmp_path) + self._project_with_active_branch(store, 100) + + with ( + patch("keboola_agent_cli.cli.ConfigStore") as MockStore, + patch("keboola_agent_cli.cli.StorageService") as MockSvc, + ): + MockStore.return_value = store + svc = MockSvc.return_value + svc.clone_table.return_value = { + "project_alias": "test", + "branch_id": 555, + "table_id": "in.c-foo.a", + "dry_run": False, + "response": {"status": "ok"}, + } + result = runner.invoke( + app, + [ + "--json", + "storage", + "clone-table", + "--project", + "test", + "--table-id", + "in.c-foo.a", + "--branch", + "555", + ], + ) + + assert result.exit_code == 0, result.output + call_kwargs = svc.clone_table.call_args.kwargs + assert call_kwargs["branch_id"] == 555 + + def test_clone_missing_branch_fails_clearly(self, tmp_path: Path) -> None: + """Without an active branch and without --branch, ConfigError -> exit 5.""" + store = _make_store(tmp_path) + # No active_branch_id set on project + + with ( + patch("keboola_agent_cli.cli.ConfigStore") as MockStore, + patch("keboola_agent_cli.cli.StorageService") as MockSvc, + ): + MockStore.return_value = store + svc = MockSvc.return_value + svc.clone_table.side_effect = ConfigError("clone-table requires a dev branch.") + result = runner.invoke( + app, + [ + "--json", + "storage", + "clone-table", + "--project", + "test", + "--table-id", + "in.c-foo.a", + ], + ) + + assert result.exit_code == 5 + payload = json.loads(result.output) + assert payload["status"] == "error" + assert "dev 
branch" in payload["error"]["message"] diff --git a/uv.lock b/uv.lock index 3bff4baf..62b1f6a0 100644 --- a/uv.lock +++ b/uv.lock @@ -496,7 +496,7 @@ wheels = [ [[package]] name = "keboola-agent-cli" -version = "0.51.1" +version = "0.52.0" source = { editable = "." } dependencies = [ { name = "croniter" }, From 94c01b51d3862b7000e63d180300d8f4cee47d5f Mon Sep 17 00:00:00 2001 From: Petr Date: Mon, 1 Jun 2026 21:32:41 +0200 Subject: [PATCH 2/6] docs(typify): rewrite for dev-branch-rehearsal + prod-swap; fix false "rejects on production" (#362) Dev-branch merge propagates only configurations, NOT storage table schema (confirmed by the storage-branches design + Keboola public docs, and reproduced live). Two things were documented wrong: 1. typify-table-workflow.md claimed merge promotes the swapped/typed schema to production. It does not. Reworked into a two-stage model: rehearse in a dev branch (profile, build, swap, validate downstream), then repeat the real build + swap in the production (default) branch. Removed the bogus Phase 8 "merge promotes to prod"; added the prod execution with its inconsistency-window + rollback cautions. 2. swap-tables docstrings / command help / hint / context / gotchas / storage-types-workflow all claimed "the Storage API rejects this on production". It does not -- a default-branch swap is verified to work (project 10539) and is the supported way to retype a prod table. Corrected the wording across all surfaces. No code-behavior change: branch_id is still mandatory and the swap is still branch-scoped -- only the documentation/docstrings were wrong. Added a 0.52.0 changelog entry for the correction (the historical 0.28.0 entry is left as-is). Completes the A+B half of keboola/cli#362. 
--- .../skills/kbagent/references/gotchas.md | 40 +++-- .../references/storage-types-workflow.md | 24 ++- .../references/typify-table-workflow.md | 149 +++++++++++------- src/keboola_agent_cli/changelog.py | 1 + src/keboola_agent_cli/client.py | 8 +- src/keboola_agent_cli/commands/context.py | 6 +- src/keboola_agent_cli/commands/storage.py | 17 +- .../hints/definitions/storage.py | 2 +- .../services/storage_service.py | 10 +- 9 files changed, 162 insertions(+), 95 deletions(-) diff --git a/plugins/kbagent/skills/kbagent/references/gotchas.md b/plugins/kbagent/skills/kbagent/references/gotchas.md index eda47be9..105c934d 100644 --- a/plugins/kbagent/skills/kbagent/references/gotchas.md +++ b/plugins/kbagent/skills/kbagent/references/gotchas.md @@ -766,23 +766,28 @@ events and emits a final `done` SSE frame mirroring the same record. latest -- previously only the latest was reported, leaving the user with no signal whether their cache was stale. -## `storage swap-tables` is dev-branch only and aliases stay put (since v0.28.0) +## `storage swap-tables` is branch-scoped and aliases stay put (since v0.28.0) - `kbagent storage swap-tables --project P --table-id A --target-table-id B - --branch ` swaps two tables' physical positions in a dev branch + --branch ` swaps two tables' physical positions (`POST /v2/storage/branch/{branch}/tables/{id}/swap`). -- The Storage API rejects this on production. The service refuses with - exit 5 / `ConfigError` *before* any HTTP call when neither `--branch` - nor an active branch (via `branch use`) is set. +- **branch_id is mandatory, but any branch works -- including the + default/production branch.** The service refuses with exit 5 / + `ConfigError` *before* any HTTP call only when neither `--branch` nor an + active branch (via `branch use`) is set. (The earlier "rejected on + production" claim was wrong -- verified live 2026-06-01: a default-branch + swap succeeds and is the supported way to retype a prod table.) 
- **Aliases are NOT transferred.** They keep pointing at the same physical position, so after the swap they expose the OTHER table's data. Plan downstream config rewrites if any aliased consumer relies on schema, not data. -- Typical use: AI agent profiles a typeless table, builds a typed - rebuild called `_change_log` via CTAS in a dev branch, then - swaps it back into the original name. After merging the branch the - original table now carries the typed schema with no downstream config - rewrite required. +- **Dev-branch merge does NOT carry storage schema** (only configs), so a + swap done inside a dev branch never reaches production via merge. The + dev branch is a *rehearsal* -- profile the typeless table, build a typed + rebuild (`_change_log`) via CTAS, swap, and run downstream configs + against it to prove the typed schema is consumer-safe. Then discard the + branch and run the real build + swap in the production (default) branch. + Full procedure: `typify-table-workflow.md`. ## `storage clone-table` materializes a prod table into a dev branch (since v0.52.0) @@ -808,6 +813,21 @@ events and emits a final `done` SSE frame mirroring the same record. `branch use`) is set. - Permission class: `write` (creates a branch-local copy; never deletes). +## Dev-branch merge carries only configurations, NOT storage schema (verified 2026-06-01) + +- When a dev branch is merged to production, Keboola propagates + **configuration** changes only. Physical storage tables -- their + schema, column types, and rows -- are **not** merged back. (Confirmed + by the storage-branches designer and Keboola's public docs: + help.keboola.com/tutorial/branches/merge-to-production.) +- Consequence for retyping: a `swap-tables` (or `clone-table`) done inside + a dev branch stays in the branch. To retype a **production** table you + run the build + `swap-tables` in the production (default) branch itself. 
+ The dev branch is only a rehearsal that validates the typed schema + against downstream configs. Full procedure: `typify-table-workflow.md`. +- The only API path between the two table stores is `clone-table` (pull, + default -> branch). There is no "push branch -> default". + ## `storage truncate-table` preserves schema; endpoint is uniformly async-via-job (since v0.32.0) - `kbagent storage truncate-table --project P --table-id T [--branch ID] diff --git a/plugins/kbagent/skills/kbagent/references/storage-types-workflow.md b/plugins/kbagent/skills/kbagent/references/storage-types-workflow.md index 2fb0355b..d4d89495 100644 --- a/plugins/kbagent/skills/kbagent/references/storage-types-workflow.md +++ b/plugins/kbagent/skills/kbagent/references/storage-types-workflow.md @@ -240,7 +240,12 @@ needs to flip the typed copy back into the original name so downstream configs (extractors, transformations, writers) keep working unchanged. ```bash -# 1. Isolate the work in a dev branch +# 1. Rehearse in a dev branch (validate the typed schema against downstream +# configs). The REAL retype is then repeated in the default/production +# branch -- dev-branch merge does NOT carry storage schema. For the full +# rehearsal-then-production procedure, see typify-table-workflow.md. +# below is the rehearsal branch; for the production run pass the +# default-branch ID instead. kbagent branch create --project prod --name typify-data kbagent branch use --project prod --branch @@ -274,8 +279,11 @@ kbagent storage swap-tables \ --target-table-id in.c-foo.data_change_log \ --branch --yes -# 4. Merge the dev branch when satisfied -kbagent branch merge --project prod --branch +# 4. There is NO merge step: dev-branch merge carries only configs, not +# storage schema. Once the rehearsal proves the schema is safe, delete +# the branch and repeat steps 2-3 in the default/production branch +# (pass the default-branch ID to --branch on swap-tables). 
+kbagent branch delete --project prod --branch --yes ``` Rules: @@ -285,12 +293,14 @@ Rules: found" until you `clone-table` it into the branch (step 2b). The typed sibling built by the in-branch CTAS is already branch-local. Legacy fake-branch projects don't need this. -- The Storage API rejects this on production. The service refuses with - exit 5 (`ConfigError`) before any HTTP if `--branch` is missing AND no - active branch is set via `branch use`. +- branch_id is mandatory: the service refuses with exit 5 (`ConfigError`) + before any HTTP if `--branch` is missing AND no active branch is set via + `branch use`. Any branch works, INCLUDING the default/production branch + -- a default-branch swap is how the retype reaches prod (the earlier + "rejected on production" claim was wrong). - Aliases keep pointing at the same physical position, i.e. they expose the OTHER table's data after the swap. If your downstream relies on - alias-by-name, validate post-swap before merging. + alias-by-name, validate post-swap before applying in production. - The Storage API queues the swap as an async storage job (`operationName: tableSwap`); the kbagent client polls the job to completion before returning, so callers can rely on the schemas being diff --git a/plugins/kbagent/skills/kbagent/references/typify-table-workflow.md b/plugins/kbagent/skills/kbagent/references/typify-table-workflow.md index 91364d56..a8505b85 100644 --- a/plugins/kbagent/skills/kbagent/references/typify-table-workflow.md +++ b/plugins/kbagent/skills/kbagent/references/typify-table-workflow.md @@ -14,12 +14,23 @@ canonical move is: (downstream sees real types) (now holds the old typeless data) ``` -The whole thing happens inside a **dev branch** so production transformations -keep running on the typeless original until the user merges. Aliases stay -put across the swap (they expose the OTHER table's data after -- see -`gotchas.md` "swap-tables aliases stay put"). 
- -Since v0.28.0 (`storage swap-tables` + `config update` script[] auto-normalize). +**Two-stage model -- rehearse in a dev branch, apply in production.** +Dev-branch merge propagates only *configurations*, NOT storage table +schema (see `gotchas.md` "Dev-branch merge carries only configurations"), +so a swap done inside a dev branch never reaches production via merge. The +dev branch is therefore a **rehearsal**: profile the data, build the typed +sibling, swap, and run downstream configs against it to *prove the typed +schema is consumer-safe*. Once proven, **discard the branch** and run the +real build + swap directly in the production (default) branch -- a +default-branch swap is supported (verified live) and is the only path that +actually retypes the production table. Aliases stay put across the swap +(they expose the OTHER table's data after -- see `gotchas.md` +"swap-tables aliases stay put"). + +Since v0.28.0 (`storage swap-tables` + `config update` script[] +auto-normalize); `storage clone-table` (v0.52.0) materializes a prod table +into a branch when the rehearsal needs the original branch-local on +storage-branches projects. ## Phase 0 -- Decide if you should do this @@ -59,10 +70,13 @@ kbagent --json project status --project ALIAS # -> branch field shows the new branch_id ``` -Why a dev branch: +Why a dev branch (this is a **rehearsal**, not the thing that ships): +- You use the branch to prove the typed schema is downstream-safe. The + production retype (Phase 8) repeats the build + swap in the default + branch -- merge does NOT carry the swapped schema to prod, only configs. - Production transformations and writers keep targeting the typeless - original; the rebuild is invisible to them until merge. + original; the rehearsal is invisible to them. - All the writes below (`storage create-table`, `workspace query` CTAS, `swap-tables`) are scoped to the branch by `branch use`'s active-branch resolution. 
@@ -265,7 +279,10 @@ For BigQuery dialect callers, also validate `bigquery_path` consumers (see `storage-describe-workflow.md`'s `bucket-detail` section -- BQ emits backtick-quoted `\`dataset\`.\`table\`` paths since v0.25.3). -## Phase 5 -- Swap +## Phase 5 -- Swap (in the rehearsal branch) + +This swap happens in the dev branch to prove the typed schema works; the +production swap is repeated in Phase 8. ```bash # 5.0. storage-branches projects ONLY: the swap is a write, and the dev @@ -314,7 +331,7 @@ After the swap: - Aliases pointing at either table keep pointing at the same physical position, so they expose the OTHER table's data. If any downstream config refers to an alias, run a manual sanity check on it before - merge. + applying the retype in production (Phase 8). ## Phase 6 -- Smoke-test downstream @@ -347,75 +364,85 @@ verify row counts, not just job exit status. Ideally diff `data_typed` (= the old typeless rows) against the swapped `data` on a key column to confirm row-level identity. -## Phase 7 -- Cleanup `data_typed` (optional) +## Phase 7 -- Tear down the rehearsal branch -After a successful smoke test, the `data_typed` sibling holds the -old typeless rows and can be deleted. **Do this only after the user -confirms the merge.** Until merge, the sibling is the primary rollback -artifact (re-swap to undo). +Once Phases 4-6 prove the typed schema is consumer-safe, the dev branch +has done its job. **Nothing in it ships** -- merge will not carry the +swapped schema to production (only configs merge). 
Delete the branch +(this also drops the branch-local `data_typed` sibling): ```bash -# After merge (Phase 8 below), in main: -kbagent storage delete-table \ - --project ALIAS \ - --table-id in.c-foo.data_typed \ - --yes +kbagent branch delete --project ALIAS --branch --yes ``` -## Phase 8 -- Handoff protocol for the user (merge step) +Keep a written record of what the rehearsal proved -- the Phase 2 profile +summary and the Phase 4/6 downstream job results -- because Phase 8 +repeats the build in production with the same type decisions. -The Keboola Storage API does not merge dev branches via API -- the -merge is a human action in the UI. kbagent's `branch merge` command -returns a URL pointing the user to the right place. +## Phase 8 -- Apply the retype in production -Hand the user a structured summary so they can review before clicking -merge. Recommended shape: +Because dev-branch merge does not carry storage schema (see `gotchas.md` +"Dev-branch merge carries only configurations"), the real retype runs in +the **production (default) branch**, repeating the validated build: -```text -TYPIFY READY FOR MERGE -- in.c-foo.data +```bash +# 8a. Resolve the default (production) branch ID. +kbagent --json branch list --project ALIAS +# -> the entry with isDefault=true; call it . + +# 8b. Build the typed sibling in PRODUCTION using the exact types the +# rehearsal validated (Phase 2/3): same create-table + data copy as +# Phase 3, but targeting the default branch. +kbagent --json storage create-table --project ALIAS \ + --bucket-id in.c-foo --name data_typed \ + --column "id:VARCHAR(40)" --column "amount:NUMBER(18,2)" --branch +# ...then copy rows in (in-workspace INSERT or an SQL transformation, +# exactly as in Phase 3 Option A / B). + +# 8c. Swap in production. A default-branch swap is supported.
+kbagent --json storage swap-tables --project ALIAS \ + --table-id in.c-foo.data \ + --target-table-id in.c-foo.data_typed \ + --branch --yes +# -> in.c-foo.data now carries the typed schema in production. -Branch: () -Source: in.c-foo.data (was: typeless STRING(16M); now: typed) -Sibling: in.c-foo.data_typed (was: typed empty; now: typeless rows preserved) +# 8d. Smoke-test a downstream config in production, then clean up. +kbagent storage delete-table --project ALIAS --table-id in.c-foo.data_typed --yes +``` + +Two production-only cautions the rehearsal does not surface: + +- **Inconsistency window.** Between 8b (copy) and 8c (swap), upstream + writers may append rows to the live `data`. Either quiesce the upstream + load for the swap window, or run a final incremental catch-up INSERT + right before the swap. The swap itself is atomic and sub-15s on Snowflake. +- **Rollback.** `data_typed` (now holding the old typeless rows) is the + rollback artifact -- re-swap to undo -- until you delete it in 8d. + +Hand the user a structured summary before running 8c. Recommended shape: -Phase 2 profile summary: +```text +TYPIFY READY TO APPLY IN PRODUCTION -- in.c-foo.data (project ALIAS) + +Rehearsal branch proved the schema is downstream-safe: rows: 1,234,567 id: STRING -> VARCHAR(40) (max observed length: 36) - name: STRING -> VARCHAR(256) (max observed length: 247) amount: STRING -> NUMBER(18,2) (max precision: 14, max scale: 2) created_at: STRING -> TIMESTAMP_NTZ (0 parse failures across 1.2M rows) is_paid: STRING -> BOOLEAN (values: 'true' (840k), 'false' (390k)) + downstream config : green pre- and post-swap in the + branch, rows_out unchanged. + +Production plan (default branch ): + 1. create in.c-foo.data_typed with the types above + 2. copy rows (quiesce writers or do a final catch-up INSERT first) + 3. swap-tables in.c-foo.data <-> in.c-foo.data_typed + 4. 
smoke-test , then delete in.c-foo.data_typed -Phase 4 baseline (pre-swap): - config -- ran in against typeless source - job : status=success, rows_in=1,234,567, rows_out=N - -Phase 5 swap: - storage job : operationName=tableSwap, status=success, took=12s - -Phase 6 smoke (post-swap): - config -- ran in against typed source - job : status=success, rows_in=1,234,567, rows_out=N - rows_out matches pre-swap value: YES - -Validate: - - kbagent storage table-detail --project ALIAS --table-id in.c-foo.data --branch - (column_details should show VARCHAR/NUMBER/TIMESTAMP/BOOLEAN, not STRING) - - Spot-check 5 rows: SELECT * FROM "in.c-foo.data" LIMIT 5 in workspace W_ID - -Merge: -Rollback (pre-merge): kbagent storage swap-tables --project ALIAS \ - --table-id in.c-foo.data \ - --target-table-id in.c-foo.data_typed \ - --branch --yes -After-merge cleanup: kbagent storage delete-table --project ALIAS \ - --table-id in.c-foo.data_typed --yes +Rollback (pre-cleanup): re-run swap-tables to put the typeless table back. ``` -The user reviews, clicks merge, and the typed schema lands in -production. The sibling carrying the typeless rows survives the merge -(branched-storage propagation); cleanup happens in `main` per the -note above. +The rehearsal branch is already gone (Phase 7); there is no merge step. ## Failure modes to anticipate diff --git a/src/keboola_agent_cli/changelog.py b/src/keboola_agent_cli/changelog.py index 3b240406..209af0c8 100644 --- a/src/keboola_agent_cli/changelog.py +++ b/src/keboola_agent_cli/changelog.py @@ -10,6 +10,7 @@ CHANGELOG: dict[str, list[str]] = { "0.52.0": [ 'New: `kbagent storage clone-table --project P --table-id ID --branch ID [--dry-run]` -- pulls (clones) a production table into a development branch via the Storage API `POST /v2/storage/branch/{branch}/tables/{id}/pull` endpoint (operationName `devBranchTablePull`, the same call the platform issues on a branch\'s first write to a prod table). 
On `storage-branches` projects a dev branch reads production tables transparently (copy-on-write) until the first write, so a schema mutation in the branch -- `swap-tables`, dropping a column -- fails with a misleading "bucket not found" until the table is materialized branch-local. `clone-table` performs that materialization. The pull is one-way (default -> branch); the service refuses with exit 5 (`ConfigError`) before any HTTP call when neither `--branch` nor an active branch (via `kbagent branch use`) is set. The API returns a queued storage job which the client polls to completion before returning, mirroring `swap-tables` semantics. Permission class: `write` (creates a branch-local copy; never deletes). New layers: `KeboolaClient.pull_table`, `StorageService.clone_table`, `commands/storage.py` `clone-table`, hint `storage.clone-table`, and a 1:1 `kbagent serve` REST route (`POST /storage/tables/{project}/{table_id}/pull`). Tests: `tests/test_storage_clone.py` (13: client/service/CLI) + `tests/test_e2e.py::TestE2EStorageCloneTable` (3). Live-validated against project 10539 (storage-branches ON): clone a prod table into a dev branch -> table materialized -> in-branch `swap-tables` then succeeds (it previously failed with "bucket not found") -> production left untouched. Addresses the clone-prod-table-into-branch request in keboola/cli#362.', + 'Docs/correctness: corrected the typify workflow and `swap-tables` guidance after live verification (keboola/cli#362). (1) A dev-branch swap does NOT reach production via merge -- Keboola dev-branch merge propagates only configurations, not storage table schema (confirmed by the storage-branches design + Keboola public docs). `typify-table-workflow.md` is reworked into a two-stage model: rehearse in a dev branch (profile, build, swap, validate downstream), then repeat the real build + swap in the production (default) branch; the prior "merge promotes the typed schema to production" Phase 8 was wrong and is removed. 
(2) `swap-tables` does NOT "reject on production" -- a swap on the default/production branch is supported (verified live on project 10539) and is the way a typed rebuild is applied to prod. Corrected the swap docstrings (client/service), command help, hint, `context`, `gotchas.md`, and `storage-types-workflow.md`; the historical 0.28.0 changelog entry is left as-is. No code-behavior change: `branch_id` is still mandatory (the swap is branch-scoped); only the documentation was wrong.', ], "0.51.1": [ "Fix (dev-portal): admin-role PATCH routing. `complexity`, `categories`, `forwardToken`, `forwardTokenDetails`, `injectEnvironment`, `processTimeout`, `requiredMemory`, `features`, and `category` are `.forbidden()` on the apps-api vendor schema (`clientAppSchema` in keboola/developer-portal:src/lib/validation.js) but settable on the admin schema. The vendor PATCH returns a misleading 422 (`Parameter complexity must be one of: easy, medium, hard`) because the enum-validation `.error()` annotation is attached on the shared admin schema before `clientAppSchema()` overrides with `.forbidden()`. `DeveloperPortalIdentity.role_hint` becomes a real validator (`vendor`/`admin`, case-folded, typos raise); `DeveloperPortalClient.patch_app` now reads the role and routes admin identities to `PATCH /admin/apps/{app}` (permissive schema); `DeveloperPortalService.prepare_patch` preflights vendor-role + admin-only-field combinations with a fail-fast error that names every offending field, explains why the 422 is misleading, and tells the user the exact command to switch identity. Admin role bypasses the preflight entirely. 
Reads, create, upload-icon, deprecate keep vendor-endpoint behaviour -- only PATCH has a meaningful admin variant on the server.", diff --git a/src/keboola_agent_cli/client.py b/src/keboola_agent_cli/client.py index 204ffbde..2504310c 100644 --- a/src/keboola_agent_cli/client.py +++ b/src/keboola_agent_cli/client.py @@ -1777,12 +1777,14 @@ def swap_tables( target_table_id: str, branch_id: int, ) -> dict[str, Any]: - """Swap two storage tables (async, waits for completion, dev branch only). + """Swap two storage tables (async, waits for completion; branch-scoped). Both tables exchange physical positions; aliases keep pointing at the same physical position and therefore expose the OTHER table's data - after the swap. The Storage API rejects this on production -- a - ``branch_id`` is mandatory. + after the swap. ``branch_id`` is mandatory (the swap is always scoped + to a branch), but ANY branch works -- including the default/production + branch. A default-branch swap is the supported way to retype a prod + table, because dev-branch merge does not propagate storage schema. The API returns a queued storage job (``operationName: tableSwap``) which this method polls to completion before returning, mirroring diff --git a/src/keboola_agent_cli/commands/context.py b/src/keboola_agent_cli/commands/context.py index 057e0c11..16df9e23 100644 --- a/src/keboola_agent_cli/commands/context.py +++ b/src/keboola_agent_cli/commands/context.py @@ -398,8 +398,10 @@ Swap two storage tables in a dev branch (POST /tables/{id}/swap). Both tables exchange physical positions; aliases are NOT transferred (they keep pointing at the same physical position and therefore expose the OTHER table's data after the swap). Use to promote a typed rebuild back into the original name without - touching downstream config references. Storage API rejects this on production: --branch (or active branch - via 'kbagent branch use') is mandatory. Service guards before any HTTP call when no branch is set. 
+ touching downstream config references. branch_id is mandatory (--branch or active branch via 'kbagent + branch use'); service guards before any HTTP call when none is set. Any branch works, INCLUDING the + default/production branch -- a default-branch swap is how a typed rebuild reaches prod (dev-branch merge + does not carry storage schema). kbagent storage clone-table --project NAME --table-id ID --branch ID [--dry-run] Clone (pull) a production table into a dev branch (POST /tables/{id}/pull). On storage-branches projects a diff --git a/src/keboola_agent_cli/commands/storage.py b/src/keboola_agent_cli/commands/storage.py index b67525f5..2c8cd3a3 100644 --- a/src/keboola_agent_cli/commands/storage.py +++ b/src/keboola_agent_cli/commands/storage.py @@ -1308,9 +1308,10 @@ def storage_swap_tables( None, "--branch", help=( - "Dev branch ID. Required by the Storage API; defaults to the " - "active branch set via 'kbagent branch use'. Production swaps " - "are rejected by the API." + "Branch ID. Required; defaults to the active branch set via " + "'kbagent branch use'. Any branch works, including the " + "default/production branch -- a default-branch swap is how a " + "typed rebuild is applied to production." ), ), dry_run: bool = typer.Option( @@ -1334,10 +1335,12 @@ def storage_swap_tables( name ("data") without touching downstream config references. \b - The Storage API restricts this to dev branches. The command resolves - the active branch from 'kbagent branch use' if --branch is omitted; - if no branch is set in either place, the call is rejected before any - HTTP call. + branch_id is mandatory (the swap is always branch-scoped): the command + resolves the active branch from 'kbagent branch use' if --branch is + omitted, and exits 5 before any HTTP call if no branch is set in either + place. 
Any branch works, INCLUDING the default/production branch -- a + default-branch swap is how a typed rebuild is applied to prod, since a + dev-branch merge does not carry storage schema. \b Example: diff --git a/src/keboola_agent_cli/hints/definitions/storage.py b/src/keboola_agent_cli/hints/definitions/storage.py index dba5a368..4e265f2e 100644 --- a/src/keboola_agent_cli/hints/definitions/storage.py +++ b/src/keboola_agent_cli/hints/definitions/storage.py @@ -493,7 +493,7 @@ ), ], notes=[ - "Storage API rejects swaps on production: branch_id is mandatory.", + "branch_id is mandatory (the swap is branch-scoped); any branch works, including the default/production branch -- a default-branch swap retypes a prod table (dev-branch merge does not carry storage schema).", "Returns a completed storage job dict (operationName=tableSwap); the client polls the async job to completion before returning.", "Aliases keep pointing at the same physical position, exposing the OTHER table's data after the swap.", ], diff --git a/src/keboola_agent_cli/services/storage_service.py b/src/keboola_agent_cli/services/storage_service.py index eb012ac9..a9c91663 100644 --- a/src/keboola_agent_cli/services/storage_service.py +++ b/src/keboola_agent_cli/services/storage_service.py @@ -1344,7 +1344,7 @@ def swap_tables( branch_id: int | None, dry_run: bool = False, ) -> dict[str, Any]: - """Swap two storage tables (dev branch only). + """Swap two storage tables (branch-scoped; branch_id mandatory). After the swap, the two tables exchange physical positions. Aliases are NOT transferred -- they keep pointing at the same physical @@ -1352,9 +1352,11 @@ def swap_tables( This is the documented behavior of the Storage API; the service layer does not try to rewrite alias targets. - The Storage API rejects this operation on production -- a dev branch - ID is mandatory. The service raises ConfigError before any HTTP call - when ``branch_id`` is None. 
+ ``branch_id`` is mandatory and the service raises ConfigError before + any HTTP call when it is None. Any branch is accepted, INCLUDING the + default/production branch -- a default-branch swap is the supported + way to retype a production table (dev-branch merge does not propagate + storage schema, so a swap done in a dev branch never reaches prod). Args: alias: Project alias. From bc4131afaaf97d5e9455c5f709fe44d5e1528747 Mon Sep 17 00:00:00 2001 From: Petr Date: Mon, 1 Jun 2026 22:06:15 +0200 Subject: [PATCH 3/6] docs+test: address PR #368 review (NB-1 swap semantics, NB-2 test name) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit NB-1: two keboola-expert.md matrix rows still described `swap-tables` as dev-branch-only -- corrected to "any branch (incl. prod)", consistent with the A+B semantics fix elsewhere in this PR. Net +4 bytes; the prompt stays under its 62000-byte budget (the clone-table version-gate line is still omitted for budget, as noted in the review). NB-2: renamed test_url_encoding_for_special_characters -> test_dotted_table_id_passed_verbatim_in_path (clone + swap twin). Dots and dashes are RFC 3986 unreserved, so quote(..., safe="") does not percent-encode them; the test verifies verbatim path pass-through, not encoding. Docstring corrected to say so. --- plugins/kbagent/agents/keboola-expert.md | 4 ++-- tests/test_storage_clone.py | 9 +++++++-- tests/test_storage_swap.py | 9 +++++++-- 3 files changed, 16 insertions(+), 6 deletions(-) diff --git a/plugins/kbagent/agents/keboola-expert.md b/plugins/kbagent/agents/keboola-expert.md index c69f8ee0..6e063e0f 100644 --- a/plugins/kbagent/agents/keboola-expert.md +++ b/plugins/kbagent/agents/keboola-expert.md @@ -147,9 +147,9 @@ a critical failure. 
| Fetch a specific config | `kbagent config detail --project P --component-id C --config-id K --json` | `tool call get_config` | re-using an earlier JSON dump | | Override the auto-derived output bucket on a config | `kbagent config set-default-bucket --bucket in.c-name` (0.26.0+) -- read-modify-write of `storage.output.default_bucket`, preserves siblings; `--clear` removes it | `kbagent config update --set 'storage.output.default_bucket=in.c-name'` (works pre-0.26.0 but not discoverable) | editing the raw JSON in the UI; full-config replace with `--configuration` (wipes other storage keys) | | Cross-project migration | `kbagent sync pull` + edit files locally + `kbagent sync push --dry-run` | -- | repeated `tool call` loops, one per resource | -| Retype table columns | fetch types via `workspace query`, draft types YAML, write new transformation that produces typed output table, then `kbagent storage swap-tables` (0.28.0+) to flip the typed copy into the original name in a dev branch | `kbagent --hint client create_table_definition` if the future `storage retype` composite (§14.3) is not yet present | `POST /v2/storage/buckets/.../tables-definition` (REST) followed by manual config rewrites | +| Retype table columns | fetch types via `workspace query`, draft types YAML, write new transformation that produces typed output table, then `kbagent storage swap-tables` (0.28.0+) to flip the typed copy into the original name in any branch | `kbagent --hint client create_table_definition` if the future `storage retype` composite (§14.3) is not yet present | `POST /v2/storage/buckets/.../tables-definition` (REST) followed by manual config rewrites | | Create typed table with native types | `kbagent storage create-table --column pk:VARCHAR(40) --column amount:NUMBER(18,2) --not-null pk --default amount=0` (0.25.0+) | `tool call create_table` (accepts the same `definition.length` shape via MCP) | re-creating via raw REST to `/v2/storage/...tables-definition` | -| Promote 
typed rebuild back into the original name | `kbagent storage swap-tables --project P --table-id in.c-foo.data --target-table-id in.c-foo.data_change_log --branch --yes` (0.28.0+) -- async storage job (`tableSwap`); client polls to completion before returning. Service refuses without a branch | -- | renaming or deleting + re-uploading (loses history; downstream configs need to be rewritten) | +| Promote typed rebuild back into the original name | `kbagent storage swap-tables --project P --table-id in.c-foo.data --target-table-id in.c-foo.data_change_log --branch --yes` (0.28.0+) -- async storage job (`tableSwap`); client polls to completion. Service refuses without a branch; any branch incl. prod | -- | renaming or deleting + re-uploading (loses history; downstream configs need to be rewritten) | | Re-seed a table without losing its schema / PK / dependents | `kbagent storage truncate-table --project P --table-id in.c-foo.data [--branch ID] [--dry-run] [--yes]` (0.32.0+) -- DELETE `/tables/{id}/rows?allowTruncate=1`; endpoint is uniformly async on every branch (returns a queued `tableRowsDelete` job; client polls via `_wait_for_storage_job`). Do NOT pass `async=true` -- the API rejects it. Batch via repeated `--table-id`. Returns `{truncated[], failed[], dry_run, project_alias}` with `truncated[]` entries carrying `{table_id, rows_before, rows_after, branch_id}`. Permission class: `destructive` | `tool call delete_table_rows` if the upstream MCP exposes it | drop + recreate the table (loses descriptions, PK, sharing edges, and breaks every downstream config reference); deleting rows via raw SQL in a workspace (bypasses the Storage API audit trail) | | Debug a failed job | `kbagent job detail --project P --job-id J --json` + `kbagent job run ... --log-tail-lines 200` | `kbagent workspace from-transformation` for SQL repro | "I think the issue is..." 
without reading logs | | Ad-hoc SQL / row-count / type audit | `kbagent workspace create` + `kbagent workspace load` + `kbagent workspace query --sql "..."` | `kbagent workspace from-transformation` for existing transform debugging; `workspace list --qs-compatible` (0.42.0+, #304) for data-app reuse | querying Keboola Storage directly via Snowflake credentials outside the workspace abstraction | diff --git a/tests/test_storage_clone.py b/tests/test_storage_clone.py index 6eb90b0d..8358566b 100644 --- a/tests/test_storage_clone.py +++ b/tests/test_storage_clone.py @@ -85,8 +85,13 @@ def test_correct_url_and_no_body(self, httpx_mock) -> None: assert sent_request.content == b"" client.close() - def test_url_encoding_for_special_characters(self, httpx_mock) -> None: - """Table IDs with dots/dashes are URL-encoded in the path.""" + def test_dotted_table_id_passed_verbatim_in_path(self, httpx_mock) -> None: + """Dotted/dashed table IDs land in the path as-is. + + Dots and dashes are RFC 3986 unreserved, so ``quote(..., safe="")`` + does not percent-encode them; this verifies the table ID is placed + in the path verbatim (a reserved char, if present, would be encoded). + """ httpx_mock.add_response( url="https://connection.keboola.com/v2/storage/branch/1/tables/in.c-bucket-with-dashes.tbl/pull", method="POST", diff --git a/tests/test_storage_swap.py b/tests/test_storage_swap.py index 0459b0c8..fe926751 100644 --- a/tests/test_storage_swap.py +++ b/tests/test_storage_swap.py @@ -92,8 +92,13 @@ def test_correct_url_and_body(self, httpx_mock) -> None: assert body == {"targetTableId": "in.c-foo.data_change_log"} client.close() - def test_url_encoding_for_special_characters(self, httpx_mock) -> None: - """Table IDs with dots/dashes are URL-encoded in the path.""" + def test_dotted_table_id_passed_verbatim_in_path(self, httpx_mock) -> None: + """Dotted/dashed table IDs land in the path as-is. 
+ + Dots and dashes are RFC 3986 unreserved, so ``quote(..., safe="")`` + does not percent-encode them; this verifies the table ID is placed + in the path verbatim (a reserved char, if present, would be encoded). + """ httpx_mock.add_response( url="https://connection.keboola.com/v2/storage/branch/1/tables/in.c-bucket-with-dashes.tbl/swap", method="POST", From 7b5e4f2f06070088900ea71f70a584a96da3a381 Mon Sep 17 00:00:00 2001 From: Petr Date: Mon, 1 Jun 2026 22:14:33 +0200 Subject: [PATCH 4/6] fix: address Devin review on PR #368 (remove deprecated --hint, add VERSION GATE) 1. clone-table wrongly added --hint support. CONTRIBUTING.md (since v0.45.0) forbids new hints/definitions entries and should_hint(ctx) short-circuits for new commands -- swap-tables (0.28.0, pre-deprecation) was mirrored too literally. Removed the should_hint/emit_hint block from storage_clone_table and the HintRegistry.register entry for storage.clone-table, matching stream (0.50.0) / feature (0.48.0) which carry no hint support. 2. Added the VERSION GATE entry `storage clone-table = 0.52.0+` to keboola-expert.md (CONTRIBUTING.md mandates it for new min-version commands). Freed budget by tightening the #245 line so the prompt stays under its 62000-byte cap. --- plugins/kbagent/agents/keboola-expert.md | 6 +-- src/keboola_agent_cli/commands/storage.py | 10 ----- .../hints/definitions/storage.py | 38 ------------------- 3 files changed, 3 insertions(+), 51 deletions(-) diff --git a/plugins/kbagent/agents/keboola-expert.md b/plugins/kbagent/agents/keboola-expert.md index 6e063e0f..fb10bdfe 100644 --- a/plugins/kbagent/agents/keboola-expert.md +++ b/plugins/kbagent/agents/keboola-expert.md @@ -67,10 +67,10 @@ a critical failure. needed for the current task (e.g. 
`flow update` needs 0.22.0+, `schedule find` needs 0.23.0+, `config set-default-bucket` needs 0.26.0+, `data-app create / deploy / start / stop / delete / password` - need 0.27.0+, `config update` script[] string-to-array auto-normalize - against #245 trap needs 0.28.0+, list-element re-split against + need 0.27.0+, `config update` script[] auto-normalize (#245) needs + 0.28.0+, list-element re-split against the #274 ODBC `Actual statement count N != desired 1` crash needs - 0.31.0+, `storage swap-tables` needs 0.28.0+, + 0.31.0+, `storage swap-tables` needs 0.28.0+, `storage clone-table` = 0.52.0+, env-var manage-token auth for `org setup` / `project refresh` / `data-app password` needs 0.29.0+ with `--allow-env-manage-token`, `project invite` / `project member-*` / `project invitation-*` diff --git a/src/keboola_agent_cli/commands/storage.py b/src/keboola_agent_cli/commands/storage.py index 2c8cd3a3..c3aa597a 100644 --- a/src/keboola_agent_cli/commands/storage.py +++ b/src/keboola_agent_cli/commands/storage.py @@ -1469,16 +1469,6 @@ def storage_clone_table( kbagent storage swap-tables --project P \\ --table-id in.c-foo.data --target-table-id in.c-foo.data_typed """ - if should_hint(ctx): - emit_hint( - ctx, - "storage.clone-table", - project=project, - table_id=table_id, - branch=branch, - dry_run=dry_run, - ) - formatter = get_formatter(ctx) service = get_service(ctx, "storage_service") config_store: ConfigStore = ctx.obj["config_store"] diff --git a/src/keboola_agent_cli/hints/definitions/storage.py b/src/keboola_agent_cli/hints/definitions/storage.py index 4e265f2e..dfa70417 100644 --- a/src/keboola_agent_cli/hints/definitions/storage.py +++ b/src/keboola_agent_cli/hints/definitions/storage.py @@ -500,44 +500,6 @@ ) ) -# ── storage clone-table ─────────────────────────────────────────── - -HintRegistry.register( - CommandHint( - cli_command="storage.clone-table", - description="Clone (pull) a production table into a dev branch", - steps=[ - HintStep( - 
comment="Materialize a production table into the dev branch (one-way: default -> branch)", - client=ClientCall( - method="pull_table", - args={ - "table_id": "{table_id}", - "branch_id": "{branch}", - }, - result_var="result", - ), - service=ServiceCall( - service_class="StorageService", - service_module="storage_service", - method="clone_table", - args={ - "alias": "{project}", - "table_id": "{table_id}", - "branch_id": "{branch}", - "dry_run": "{dry_run}", - }, - ), - ), - ], - notes=[ - "Required before swap-tables / column drops on storage-branches projects: a dev branch reads prod tables transparently until first write, so schema mutations need a branch-local copy first.", - "Branch is mandatory (the pull is one-way default -> branch); without it the service raises ConfigError before any HTTP call.", - "Returns a completed storage job dict; the client polls the async job to completion before returning.", - ], - ) -) - # ── storage files ────────────────────────────────────────────────── HintRegistry.register( From 08e35ff1fe485b72da611ccd95bc462a9061ec25 Mon Sep 17 00:00:00 2001 From: Petr Date: Mon, 1 Jun 2026 22:32:02 +0200 Subject: [PATCH 5/6] docs: add (since v0.52.0) tag to new gotchas merge section (PR #368 review B-1) The "Dev-branch merge carries only configurations" gotcha used a bare (verified 2026-06-01) stamp. CONTRIBUTING.md (convention #17) requires the (since vX.Y.Z) tag on every gotcha so AI agents don't recommend behavior documentation that predates the install. Now reads "(since v0.52.0, verified 2026-06-01)". 
--- plugins/kbagent/skills/kbagent/references/gotchas.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/plugins/kbagent/skills/kbagent/references/gotchas.md b/plugins/kbagent/skills/kbagent/references/gotchas.md index 105c934d..cf982df6 100644 --- a/plugins/kbagent/skills/kbagent/references/gotchas.md +++ b/plugins/kbagent/skills/kbagent/references/gotchas.md @@ -813,7 +813,7 @@ events and emits a final `done` SSE frame mirroring the same record. `branch use`) is set. - Permission class: `write` (creates a branch-local copy; never deletes). -## Dev-branch merge carries only configurations, NOT storage schema (verified 2026-06-01) +## Dev-branch merge carries only configurations, NOT storage schema (since v0.52.0, verified 2026-06-01) - When a dev branch is merged to production, Keboola propagates **configuration** changes only. Physical storage tables -- their From 168feac0809533c8da158d3e1bebe61a28e38013 Mon Sep 17 00:00:00 2001 From: Petr Date: Mon, 1 Jun 2026 22:41:17 +0200 Subject: [PATCH 6/6] docs(expert): add clone-table gotcha + trim stale content (PR #368 NB-1/NB-2/NIT-1) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Delta-review follow-up on keboola-expert.md (deferred from this PR for token budget): - NB-1: added a §3 inline gotcha for the storage-branches copy-on-write trap -- an in-branch swap-tables / column-drop fails "bucket not found" until `clone-table` materializes the prod table branch-local. - NIT-1: removed the dangling (§14.3) cross-reference (no such section) and the deprecated --hint alternative in the Retype matrix row. - NB-2: trimmed the verbose semantic-layer "short form" (full prose already lives in gotchas.md) and tightened the auto-materialize entry. Headroom against the 62000-byte budget went from 7 to 609 bytes. 
--- plugins/kbagent/agents/keboola-expert.md | 54 ++++++++++-------------- 1 file changed, 22 insertions(+), 32 deletions(-) diff --git a/plugins/kbagent/agents/keboola-expert.md b/plugins/kbagent/agents/keboola-expert.md index fb10bdfe..e6b04f49 100644 --- a/plugins/kbagent/agents/keboola-expert.md +++ b/plugins/kbagent/agents/keboola-expert.md @@ -147,7 +147,7 @@ a critical failure. | Fetch a specific config | `kbagent config detail --project P --component-id C --config-id K --json` | `tool call get_config` | re-using an earlier JSON dump | | Override the auto-derived output bucket on a config | `kbagent config set-default-bucket --bucket in.c-name` (0.26.0+) -- read-modify-write of `storage.output.default_bucket`, preserves siblings; `--clear` removes it | `kbagent config update --set 'storage.output.default_bucket=in.c-name'` (works pre-0.26.0 but not discoverable) | editing the raw JSON in the UI; full-config replace with `--configuration` (wipes other storage keys) | | Cross-project migration | `kbagent sync pull` + edit files locally + `kbagent sync push --dry-run` | -- | repeated `tool call` loops, one per resource | -| Retype table columns | fetch types via `workspace query`, draft types YAML, write new transformation that produces typed output table, then `kbagent storage swap-tables` (0.28.0+) to flip the typed copy into the original name in any branch | `kbagent --hint client create_table_definition` if the future `storage retype` composite (§14.3) is not yet present | `POST /v2/storage/buckets/.../tables-definition` (REST) followed by manual config rewrites | +| Retype table columns | fetch types via `workspace query`, draft types YAML, write new transformation that produces typed output table, then `kbagent storage swap-tables` (0.28.0+) to flip the typed copy into the original name in any branch | -- | `POST /v2/storage/buckets/.../tables-definition` (REST) followed by manual config rewrites | | Create typed table with native types | `kbagent storage 
create-table --column pk:VARCHAR(40) --column amount:NUMBER(18,2) --not-null pk --default amount=0` (0.25.0+) | `tool call create_table` (accepts the same `definition.length` shape via MCP) | re-creating via raw REST to `/v2/storage/...tables-definition` | | Promote typed rebuild back into the original name | `kbagent storage swap-tables --project P --table-id in.c-foo.data --target-table-id in.c-foo.data_change_log --branch --yes` (0.28.0+) -- async storage job (`tableSwap`); client polls to completion. Service refuses without a branch; any branch incl. prod | -- | renaming or deleting + re-uploading (loses history; downstream configs need to be rewritten) | | Re-seed a table without losing its schema / PK / dependents | `kbagent storage truncate-table --project P --table-id in.c-foo.data [--branch ID] [--dry-run] [--yes]` (0.32.0+) -- DELETE `/tables/{id}/rows?allowTruncate=1`; endpoint is uniformly async on every branch (returns a queued `tableRowsDelete` job; client polls via `_wait_for_storage_job`). Do NOT pass `async=true` -- the API rejects it. Batch via repeated `--table-id`. Returns `{truncated[], failed[], dry_run, project_alias}` with `truncated[]` entries carrying `{table_id, rows_before, rows_after, branch_id}`. Permission class: `destructive` | `tool call delete_table_rows` if the upstream MCP exposes it | drop + recreate the table (loses descriptions, PK, sharing edges, and breaks every downstream config reference); deleting rows via raw SQL in a workspace (bypasses the Storage API audit trail) | @@ -294,12 +294,16 @@ success, not a failure. plan -- it sets up the user for an impossible step. - **`storage create-table` in a dev branch auto-materializes the bucket** - (0.25.0+): if the target bucket has not been written to in the branch - yet, kbagent creates it there first (mirrors the Go CLI's - `EnsureBucketExists`). 
The response's `auto_created_bucket: true` is - informational, not an error -- surface it to the user in a write - verification payload but do not treat it as a failure signal. - Production writes never materialize anything. + (0.25.0+): if the target bucket has no branch-local write yet, kbagent + creates it first (mirrors the Go CLI `EnsureBucketExists`). + `auto_created_bucket: true` is informational, not a failure. Production + writes never materialize anything. +- **`storage clone-table` before an in-branch `swap-tables` / column drop** + (0.52.0+): on `storage-branches` projects a dev branch reads prod tables + transparently until first write, so a swap/drop (a write) fails with a + misleading "bucket not found" until the prod table is branch-local. Run + `kbagent storage clone-table --project P --table-id T --branch ` + first (one-way default->branch). See `gotchas.md`. - **`storage truncate-table` is row-only; schema and dependents are preserved** (0.32.0+): the underlying call is @@ -483,31 +487,17 @@ success, not a failure. `--allow-env-manage-token` to their invocation, never strip the warning by suppressing stderr. -- **Semantic-layer gotchas (since v0.41.0)** — five behavior contracts - worth committing to memory before touching `semantic-layer add/edit/ - remove`. Full prose lives in - [`gotchas.md` § Semantic-layer](../skills/kbagent/references/gotchas.md); - the short form: - - **Constraint `rule` is a STRING**, never `{bounds: {min, max}}`. The - sl-builder skill docs are wrong on this. kbagent enforces it. - - **Constraint `name` regex `^[a-z][a-z0-9_]*$`** + the 3-vs-4 - severity split: API `severity` is `error | warning | info` (3-level); - the 4-band health (`_critical / _warning / _healthy / _review`) - lives in the NAME SUFFIX, not on the API. - - **`edit metric --new-name` cascades through every constraint** whose - `metrics[]` referenced the old name, and prints the old/new - CODE_METRIC value. 
Downstream SQL joining on CODE_METRIC will break - silently — surface the change to the operator. - - **`remove metric` orphans constraints** that reference it. The - pre-deletion scan ALWAYS prints the warning (even with `--yes`); - non-TTY without `--yes` exits 2. Recommended: drop/rewrite the - constraints first, then remove the metric. - - **`build` is a HEURISTIC fallback**, not full AI: one dataset + - one COUNT(*) metric + one glossary entry per table. Response carries - `fallback_used: "heuristic"`. Treat the output as a scaffold and - follow up with `add metric`, `add relationship`, `add constraint`. - The full AI wizard lives in the `sl-build` skill under - `04_AI_Kit/ai-kit/`. +- **Semantic-layer gotchas (since v0.41.0)** — full prose in + [`gotchas.md` § Semantic-layer](../skills/kbagent/references/gotchas.md). + Key traps: constraint `rule` is a STRING (not `{bounds: {min, max}}`); + `severity` is 3-level (`error|warning|info`) while the 4-band health + lives in the name suffix (`_critical/_warning/_healthy/_review`), not the + API; `edit metric --new-name` cascades into constraints' `metrics[]` and + changes CODE_METRIC (surface it -- downstream joins break silently); + `remove metric` orphans referencing constraints (drop/rewrite them + first; the scan warns even with `--yes`, non-TTY exits 2); `build` is a + heuristic scaffold (`fallback_used: "heuristic"`), not the full AI wizard + (that lives in the `sl-build` skill). ---