Consolidate Keboola App skills into dataapp-development #76

Draft
davidesner wants to merge 71 commits into main from feat/dataapp-development-skill

@davidesner
Contributor

Summary

Consolidates two legacy skills (dataapp-dev for Streamlit, dataapp-deployment for Python/JS) into a single new dataapp-development skill at plugins/dataapp-developer/skills/dataapp-development/. The new skill is a router (SKILL.md ≈ 100 lines + 13 references + 5 runnable templates) that covers the full app lifecycle across both app types and all three client paths.

What's covered

  • App types: Streamlit, single-Node + static (the dashboarding default), combined Python+Node
  • Client paths: MCP-only (Claude Desktop / web), Claude Code with filesystem + MCP, kbagent CLI — with sequential detection so an agent always offers every available path instead of silently picking
  • Storage access: read-only workspace + Query Service SDKs (keboola-query-service / @keboola/query-service), DuckDB caching as default for read-only apps, RW Storage Access for writes, input mapping discouraged. BigQuery's workspace_query path documented separately
  • Local dev: proactive .env.local pre-fill from get_project_info, ask user only for KBC_TOKEN, with explicit guidance that narrow-scoped tokens don't work with Query Service
  • Styling: default Keboola palette across all three stacks, "Powered by Keboola" footer with brand-override removal instructions, heavier React+Vite+shadcn option for complex UIs
  • Other: authentication, dashboard patterns, optional Kai chat integration, troubleshooting (10+ live-test-driven entries), dev workflow

Templates

  • templates/streamlit/ — keboola-query-service SDK + Plotly + footer
  • templates/nodejs-app/ — Express + Tailwind CDN + Chart.js + footer (dashboarding default)
  • templates/python-app/ — Flask + uv
  • templates/python-node-app/ — FastAPI backend + Express frontend
  • templates/duckdb-cache/ — Python and Node DuckDB cache helpers

Hard rules

9 SKILL.md hard rules that emerged from live test sessions:

  1. Never commit secrets
  2. RO workspace before input mapping
  3. Apps must handle POST /
  4. No pip install (PEP 668 — use uv)
  5. No [program:nginx] declaration
  6. Validate data first, code second (semantic layer check)
  7. Pick one Keboola path per session (sequential detection, list all candidates)
  8. MCP-only flows: compose source directly into the tool call — don't pre-write a local copy
  9. For local-dev credentials: pre-fill what you can, ask only for what's missing, never grep the filesystem

Validated against

Three end-to-end test sessions against a real Keboola project. Each surfaced a specific footgun that's now codified in the skill (Path A doubled emit, kbagent omission from path detection, legacy workspace-query 404, narrow-scoped token auth error, agent scanning filesystem for tokens).

Removed

  • plugins/dataapp-developer/skills/dataapp-dev/
  • plugins/dataapp-developer/skills/dataapp-deployment/

Test plan

  • Spot-check SKILL.md decision tree routes to the right reference for each task type
  • Verify the three templates that ship a "Powered by Keboola" footer render correctly (nodejs-app, python-node-app, streamlit)
  • One more end-to-end run from a fresh Claude Code session against a real project: build a new Streamlit app, deploy, debug
  • One more end-to-end run for a Python/JS app via the kbagent path
  • Plugin version + marketplace version bumped and root README feature list updated

davidesner added 30 commits May 13, 2026 14:30
Merges dataapp-dev (Streamlit) and dataapp-deployment (Python/JS) into a
single skill covering both app types, three client paths (MCP-only,
Claude Code, kbagent CLI), storage access patterns, authentication,
DuckDB caching, styling defaults, and Kai integration placeholder.
…section

- Remove standalone managed-git-mcp.md reference; the placeholder now lives
  as a subsection in python-js-apps.md alongside the customer-git story.
- Move the row-level user filtering pattern out of authentication.md
  (it's a storage-access concern, already covered there).
- Reference count: 13 -> 12.
Permission scoping by user was sourced from one customer app's CLAUDE.md
("X-Kbc-User-Email injected by Keboola/Nginx") and not corroborated by
official docs. Replace with an explicit data-access-management placeholder
in storage-access.md noting that JS/Python and legacy Streamlit patterns
differ and will be documented once verified.
- Expand single-bullet mention into a full subsection in python-js-apps.md
  covering nginx dual-location, supervisord per-process programs, parallel
  setup.sh, local-dev with frontend-proxy-to-backend, and the pre-built
  frontend convention from profitline-js-app.
- Add templates/python-node-app/ as a fifth template (FastAPI backend +
  Vite/React frontend with full keboola-config wiring).
- Bump template count 4 -> 5 in directory layout and acceptance criteria.
- Reframe Python+Node combined as "when you need it" (Python backend / ML
  / existing codebase), not "most common".
- Promote single Node.js + static frontend (kai-pricing-calculator-app
  nodejs-pricing-simulator branch) as the preferred dashboarding shape:
  one process, no bundler, Chart.js & Tailwind via CDN.
- choosing-app-type.md gets a 3-level decision hierarchy
  (Streamlit -> single Node -> Python+Node).
- styling-guide.md leads with the lightweight CDN stack; React/Vite/shadcn
  is positioned as the heavier-framework alternative.
- templates/nodejs-app/ expanded from hello-world to a real dashboarding
  starter (server.js + api/ + public/ + keboola-config/) modeled on
  kai-pricing.
kai-client is more mature than the placeholder framing suggested: it ships
a Python lib with async + SSE + tool approval, Streamlit and JS examples
in-repo, and a dedicated kai-dataapp plugin with kai-js / kai-streamlit
skills. The reference now documents:

- Service discovery via Storage API /v2/storage services list
- Auth via x-storageapi-token (no separate Kai token)
- Streamlit embed pattern based on examples/streamlit_app.py
- JS embed pattern based on examples/js-dataapp/server.js (SSE proxy)
- DIY alternative via Anthropic SDK directly (FI app pattern)
- Pointer to kai-client's own plugins for deeper integration work

Open-questions entry updated accordingly.
…yment skills

Their content has been consolidated into the new dataapp-development skill
(SKILL.md + 12 references + 5 templates).
…p-development

- plugin.json + marketplace.json: 1.1.0 -> 1.2.0, updated descriptions.
- Plugin README: rewritten to describe the single dataapp-development skill,
  12 references, and 5 templates.
- Root README: refreshed Data App Developer Plugin feature list.
The skill should distill generic patterns, not cite specific repos the
agent or user has no access to. Removes "Reference app: X" lines,
GitHub/help.keboola.com/pypi/npm URLs, and repo-specific attributions
("Modeled on Y", "Adapted from Z"). Keeps:
- CDN URLs in functional template code (Tailwind, Chart.js)
- localhost / 127.0.0.1 URLs in dev instructions
- Example KBC_URL values in secrets templates
- Library/package names without URLs (kai-client, keboola-query-service, etc.)
davidesner added 26 commits May 15, 2026 11:38
…dation

Some Keboola projects have a semantic layer (metrics, datasets,
glossary, relationships). When it exists and matches the user intent,
the app's query should use those definitions verbatim — not reinvent
the calculation.

dev-workflow.md Validate phase gains a "Semantic layer check (when
available)" sub-step that runs BEFORE the standard schema/data
validation:

  1. search_semantic_context to find relevant metrics/datasets.
  2. get_semantic_context to read the metric SQL and dataset FQNs.
  3. Use the definitions verbatim.
  4. validate_semantic_query before embedding.
  5. Only then query_data to verify.

If no semantic model matches, say so explicitly and proceed with the
standard validation path on raw tables.

SKILL.md Hard rule 6 gets a tail line pointing at the semantic-layer
tools so agents see it from the router without loading dev-workflow.
…latform logs

Streamlit silently swallows uncaught exceptions into its UI without
writing them to stdout/stderr. The MCP get_data_apps log tail and the
Terminal Log tab therefore show nothing for errors that are clearly
visible to the user. Remote debugging fails.

streamlit-apps.md gains a new section "Capturing errors for platform
logs" with the @log_exceptions decorator pattern: catch, log to
stderr with full traceback, then re-raise so Streamlit still shows
the error in the UI. Python/JS frameworks (Flask/FastAPI/Express)
don't need this — their default behavior already logs to stderr.
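
For illustration, the catch / log to stderr / re-raise behaviour described above could look like the minimal sketch below. The decorator name comes from this commit message; the body and the `main` entry point are assumptions, not the code shipped in streamlit-apps.md.

```python
import functools
import sys
import traceback


def log_exceptions(func):
    """Log uncaught exceptions to stderr with a full traceback, then re-raise."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Write the traceback to stderr so a log tail can pick it up.
            traceback.print_exc(file=sys.stderr)
            sys.stderr.flush()
            # Re-raise so the framework still shows the error in the UI.
            raise

    return wrapper


@log_exceptions
def main():
    # Hypothetical entry point; a real app would build its Streamlit UI here.
    raise ValueError("example failure")
```

Wrapping only the top-level entry point is usually enough, since every uncaught exception passes through it on the way up.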

troubleshooting.md "Reading logs" section is reworked:
- MCP get_data_apps tail is now positioned as the preferred remote
  debugging path (agent-friendly, no UI navigation needed).
- A new "Streamlit-specific footgun" subsection points back at the
  decorator pattern in streamlit-apps.md for agents who reach
  troubleshooting first when the log tail is mysteriously empty.
…ation

Cross-referenced from a parallel PR's production-deployment lessons.

KBC_WORKSPACE_ID section: split guidance by app type.
- Read-only data apps: reuse the MCP session's workspace via the
  workspace_id field returned by mcp__keboola__get_project_info. No
  need to provision a new workspace for local dev.
- Read-write data apps: must create a dedicated local workspace (UI or
  kbagent) with grants matching the direct-grant output mapping. The
  platform's ephemeral production workspace doesn't exist locally.

Storage Access section: add "Bucket stage doesn't restrict writes."
The destination can be in any stage (out., in., otherwise) as long as
the workspace has write privileges — the out. examples are convention,
not a constraint. Confirmed by independent testing in the parallel PR.
Cross-referenced from a parallel PR's production-deployment lessons.

Replaces the previous inline SDK examples with:

1. A Storage wrapper module pattern (Python class, TS module) that
   concentrates env-var reads and Client construction in one place.
   Module-level singleton fails fast on missing env vars; route
   handlers call select(sql) / execute(sql) without touching env or
   the raw Client.

2. A validation module pattern (validation.py, validation.ts) with
   ValidationError, type-coerced parsers, allowlist enforcement, and
   text escaping. The rest of the app routes user input through these
   parsers before SQL interpolation.

3. Five rules of thumb for SQL values (numeric / date / categorical /
   free-text / generated IDs) — concise checklist applicable in any
   language.

The planned SQL.literal() / SQL.ident() / sql.format() SDK helpers
note stays — when those ship, they replace the manual sanitization
pattern.
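
A rough sketch of what such a validation module might contain, one parser per value category. Only `ValidationError` is named in the commit message; every function name, the length cap, and the backslash handling are assumptions made for illustration (the backslash step assumes a dialect that treats `\` as an escape character inside string literals).

```python
import re
from datetime import date


class ValidationError(ValueError):
    """Raised when user input cannot be safely interpolated into SQL."""


def parse_int(raw, *, minimum=None, maximum=None):
    """Numeric: coerce to int and range-check instead of trusting the string."""
    try:
        value = int(str(raw).strip())
    except ValueError:
        raise ValidationError(f"not an integer: {raw!r}") from None
    if minimum is not None and value < minimum:
        raise ValidationError(f"{value} is below the minimum {minimum}")
    if maximum is not None and value > maximum:
        raise ValidationError(f"{value} is above the maximum {maximum}")
    return value


def parse_date(raw):
    """Date: accept ISO format (YYYY-MM-DD) only."""
    try:
        return date.fromisoformat(str(raw).strip())
    except ValueError:
        raise ValidationError(f"not an ISO date: {raw!r}") from None


def parse_choice(raw, allowed):
    """Categorical: the value must be on an explicit allowlist."""
    if raw not in allowed:
        raise ValidationError(f"value not allowed: {raw!r}")
    return raw


def parse_id(raw):
    """Generated IDs: restrict to a conservative character set."""
    if not re.fullmatch(r"[A-Za-z0-9_-]+", str(raw)):
        raise ValidationError(f"malformed id: {raw!r}")
    return str(raw)


def escape_text(raw, max_len=200):
    """Free text: cap the length, then escape backslashes and single quotes."""
    text = str(raw)
    if len(text) > max_len:
        raise ValidationError("text too long")
    return text.replace("\\", "\\\\").replace("'", "''")
```

A route handler would then build its query from parsed values only, for example `f"... WHERE region = '{parse_choice(region, REGIONS)}'"`, where `REGIONS` is a hypothetical allowlist.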
…errors

streamlit-apps.md §Storage access from Streamlit: add a "Cache the
Storage client across reruns" subsection. Streamlit reruns the script
top-to-bottom on every interaction; without @st.cache_resource the
SDK client is reconstructed each time, re-reading env vars and the
workspace manifest, and opening a new HTTP client. Pair with
@st.cache_data(ttl=60) for cached read results within a session.

troubleshooting.md: add two new Storage-Access-specific entries:

1. KeyError: 'BRANCH_ID' (or other Storage Access env var) on app
   start — Storage Access not enabled on the config, or local .env
   missing the variable. Fix: toggle on + direct-grant output mapping
   in production; add to .env locally.

2. Insufficient privileges / write blocked by the Query Service —
   destination table missing from direct-grant output mapping, or
   local workspace missing grants. Includes the
   bucket-stage-doesn't-matter clarification.
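
One way to turn the bare `KeyError` into a readable startup failure is to check all Storage Access variables in one place. This is a sketch under assumptions: the variable list and the `MissingEnvError` / `load_storage_env` names are illustrative, not the wrapper module from the skill.

```python
import os

# Assumed set of required variables; adjust to what the app actually reads.
REQUIRED_VARS = ("KBC_URL", "KBC_TOKEN", "BRANCH_ID", "WORKSPACE_ID")


class MissingEnvError(RuntimeError):
    """Raised at startup when Storage Access variables are absent."""


def load_storage_env(environ=None):
    """Return the required variables, or fail fast naming every missing one."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise MissingEnvError(
            "Missing env vars: " + ", ".join(missing)
            + ". Enable Storage Access on the config, or add them to .env locally."
        )
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling this once at import time keeps route handlers free of direct environment lookups.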
…nt IS the artifact

First test run in Claude Desktop surfaced that the skill's "Local
development" sections led the agent into wasted work: drafting a
.py file in the sandbox FS, then re-emitting the same content as
the modify_data_app source_code argument. The local file was never
the deployment artifact — the tool argument was — so the redraft
doubled output tokens for no benefit.

The fix: distinguish "tool-argument-is-the-artifact" (Path A) from
"local-file-is-the-artifact" (Paths B and C).

deployment-paths.md Path A: new "Don't write the source to a local
file first" subsection. The default is compose-in-tool. The one
legitimate exception is using a sandbox file as a scratchpad for
cheap iterative str_replace edits before a single expensive emit
— but only when the iteration savings beat the redundant emit.
For small apps (<100 lines), compose-in-tool always wins.

deployment-paths.md "How to choose" table: Path A row now warns
agents not to drift into local-dev mode even when a sandbox FS is
available.

streamlit-apps.md §Local development: opener qualifies the section
as Path B/C only. Path A agents should skip it.
Second test run surfaced a real risk: in a Claude Code session with both
a project-local MCP and kbagent (and possibly a global MCP too), the
agent has multiple ways to talk to "the project" — but they may resolve
to different branches or even different projects. Mixing them in one
session produces silent inconsistencies (validate via MCP on branch X,
deploy via kbagent on branch Y, get confusing errors).

deployment-paths.md: new top-level "Pick one path per session — don't
mix" section. Covers:

- Detection: scan tool surface for mcp__*Keboola* (not just .mcp.json
  presence — the MCP config could come from user-level or org-level
  settings too); check kbagent CLI availability with `kbagent project
  list`.
- When multiple paths are present, ask the user upfront which one to
  use, with concrete phrasing.
- Trade-off table: MCP-only iteration loop (modify_data_app + deploy +
  container spin-up + log check) vs kbagent + filesystem + local
  iteration (streamlit run + .env.local). For non-trivial apps the
  local loop is editor-speed; the MCP loop pays the platform spin-up
  cost on every cycle.
- Once chosen, commit — don't use the other path even for unrelated
  operations like data validation.

SKILL.md: new Hard rule 7 carrying the one-line version + pointer to
the full guidance in deployment-paths.md. Surfaces the discipline from
the router so agents see it before loading any reference.
… the only channel to Keboola

The previous Path A intro claimed "no local files, no git, no shell."
That's outdated — Claude Desktop now has a sandbox filesystem, Python
runner, Bash tool. Those tools exist; they just don't connect to
Keboola.

The real constraint is **MCP is the only channel to your Keboola
project**. The sandbox FS is isolated — anything written there
doesn't reach Keboola, it just sits in the agent's workspace. Source
code that should end up in the data app has to go through the
modify_data_app source_code argument; writing to /home/claude/foo.py
first and then re-emitting the same content doubles output tokens
for no benefit.

This pairs with the existing "Don't write the source to a local file
first" footgun guidance (c7e355f) which was added after the first
real-world test surfaced exactly this mistake. The reframed opener
makes the underlying principle visible instead of relying on a
no-longer-true "no filesystem" framing.
…y flows" to hard rule

Three test runs now confirmed: when filesystem + MCP are both
available, the agent's default impulse is "write file → read file →
pass to modify_data_app." The doubled-emit footgun. The Path A intro
already calls this out (commit 2ad97ed) and there's a dedicated
"Don't write the source to a local file first" subsection (commit
c7e355f). But both live in deployment-paths.md, which the agent only
loads if it specifically reads that reference. Tests 2 and 3 both hit
the footgun because the agent never loaded the reference.

Promote the rule to SKILL.md hard rule 8 so it's seen from the
router every time, regardless of which references the agent loads.
… then MCP

Live test: when an agent had multiple MCP servers plus kbagent
available, it asked "Which Keboola MCP should I use?" — limiting the
options to MCP and silently dropping kbagent. The previous wording of
hard rule 7 listed both detection signals but didn't enforce the
order or mandate enumeration before asking.

Rewrite the rule and the deployment-paths.md detection section to:

1. Step 1: run `which kbagent` (Bash). If it exists, run
   `kbagent project list` and capture every alias as a candidate path.
   Explicit "don't skip this just because you've already noticed MCP
   tools" — kbagent is a separate path the user may prefer.
2. Step 2: scan tool surface for mcp__*[Kk]eboola* prefixes. Each
   distinct prefix is a candidate.
3. If the combined list has more than one item, ask the user — list
   ALL of them. Don't omit kbagent just because MCP was found first.

Detection is now a numbered, ordered sequence. Both files updated so
the SKILL.md hard rule and the reference are consistent.
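
The ordered detection can be sketched as below. The function names and the `kbagent:` label format are hypothetical; the sketch takes the aliases as an argument because it makes no assumption about the output format of `kbagent project list`.

```python
import re
import shutil


def kbagent_available():
    """Step 1: check for the CLI first (equivalent to `which kbagent`)."""
    return shutil.which("kbagent") is not None


def list_candidate_paths(tool_names, kbagent_aliases=()):
    """Return every candidate path: kbagent aliases first, then MCP prefixes."""
    # Step 1 results come first so they are never dropped from the question.
    candidates = [f"kbagent:{alias}" for alias in kbagent_aliases]
    seen = set()
    # Step 2: each distinct mcp__*[Kk]eboola* prefix is one candidate.
    for name in tool_names:
        parts = name.split("__")
        if len(parts) >= 3 and parts[0] == "mcp" and re.search(r"[Kk]eboola", parts[1]):
            prefix = f"mcp__{parts[1]}"
            if prefix not in seen:
                seen.add(prefix)
                candidates.append(prefix)
    return candidates


def must_ask_user(candidates):
    """Step 3: more than one candidate means the user chooses, from the full list."""
    return len(candidates) > 1
```

Keeping detection separate from the question makes it harder to ask about MCP servers only and forget the CLI.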
…ever scan

Live test: agents on the kbagent/local-iteration path were grepping
the filesystem and probing env vars to "discover" Storage tokens.
Auto-mode caught some of these attempts, but not all. The behaviour
is wrong twice over: a security smell (scanning for secrets), AND
unlikely to succeed (the user has to provide tokens regardless).

SKILL.md: new hard rule 9 — for local-dev credentials, ASK the user,
NEVER scan. Lists the workflow: state the required env vars, point
the user at storage-access.md for where to find each, ask them to
populate .env/.env.local, wait for confirmation before running.

storage-access.md §Getting the env vars for local development: add a
prominent "Agent: don't try to discover credentials on your own"
block at the top of the section with a four-step ask-the-user
procedure. Marks the rule as non-negotiable across all paths.
User feedback during live testing: agents tend to pick kbagent
because the skill positions it as "often faster for non-trivial
apps," but the CLI workflow has dangers for non-developer users
(analysts, data scientists, business users) — local env management,
shell command failures, token-shaped errors, debugging CLI output.
For users who just want a working dashboard, MCP is safer.

deployment-paths.md:
- §When more than one is present: the user-question phrasing template
  now annotates MCP-only as "recommended for most users" and
  kbagent + local iteration as "developer-oriented, best for users
  comfortable with the CLI." Audience note follows the example,
  spelling out that non-developers should default to MCP.
- §Path C — CLI agent (kbagent): new opening paragraph marking it as
  developer-oriented and not recommended for non-developers. Added
  "you can handle CLI errors" as an explicit prerequisite.

SKILL.md hard rule 7: adds a clause to the same effect — kbagent is
developer-oriented; prefer MCP for non-developers; surface this in
the user-question so the choice is informed.
…-offs, don't gatekeep

Walking back the "prefer MCP for non-developer users" steering that
the previous commit (c378c90) introduced. The right behaviour isn't
"hide kbagent from non-developers" — it's "always offer both when
both are available, label their costs honestly, let the user decide."
The user knows their own context better than the agent does.

deployment-paths.md:
- §When more than one is present: drop the "(recommended for most
  users)" and "Not recommended for non-developers" labels. Keep the
  factual description of what each path costs the user. Replace the
  audience-note paragraph telling the agent to "pre-pick MCP" with
  the opposite directive: surface the trade-off, don't steer.
- §Path C: open with "Heads-up: this path expects CLI comfort" —
  factual, not gatekeeping. Removed "Not recommended for non-developer
  users" wording.

SKILL.md hard rule 7: same de-escalation. Removed the
"For non-developers, prefer the MCP path" clause. Added explicit
"Don't pre-pick MCP 'to be safe' — the user knows their own context."
…e API workspace endpoint to Query Service

Live test session 6b856018 surfaced the actual failure: the streamlit
template's data_loader.py was POST-ing to
{KBC_URL}/v2/storage/branch/<b>/workspaces/<w>/query and getting back
404 workspace.workspaceNotFound on a Snowflake project. That endpoint
survives only for BigQuery projects today — Snowflake projects must
use the Query Service (https://query.<stack>.keboola.com/api/v1/...).

Templates:
- templates/streamlit/utils/data_loader.py: rewrite to use the official
  keboola-query-service Python SDK. Module-level Client cached with
  @st.cache_resource. When QUERY_SERVICE_URL is not set, derives it from
  KBC_URL by swapping the `connection.` host prefix for `query.`. Reads the
  workspace ID from KBC_WORKSPACE_MANIFEST_PATH (preferred) or env fallback.
  Explicit errors when BRANCH_ID is missing — the Query Service
  rejects "default" so the value MUST be numeric.
- templates/streamlit/pyproject.toml: replace `requests` dep with
  `keboola-query-service>=0.2.0`.
- templates/streamlit/.streamlit/secrets.toml.example: add BRANCH_ID
  (numeric) and optional QUERY_SERVICE_URL; reference the skill's
  env-vars section for where to find each.
- templates/nodejs-app/api/keboola-client.js: rewrite to use
  @keboola/query-service SDK. Same env-var resolution pattern,
  same workspace-prefix normalization, but the actual call now goes
  through Client.executeQuery against query.<stack>.keboola.com.
- templates/nodejs-app/package.json: add @keboola/query-service dep.
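
As a rough illustration of the env-var resolution described above, the URL derivation and the numeric BRANCH_ID check might be written like this. The helper names are hypothetical and this is not the code in data_loader.py.

```python
import re
from urllib.parse import urlsplit


def derive_query_service_url(kbc_url, override=None):
    """Use QUERY_SERVICE_URL when given, else swap connection. for query."""
    if override:
        return override.rstrip("/")
    parts = urlsplit(kbc_url)
    host = parts.hostname or ""
    if not host.startswith("connection."):
        raise ValueError(f"cannot derive Query Service URL from {kbc_url!r}")
    return f"{parts.scheme}://query.{host[len('connection.'):]}"


def require_numeric_branch_id(value):
    """The Query Service rejects "default", so insist on digits only."""
    if not value or not re.fullmatch(r"[0-9]+", str(value)):
        raise ValueError("BRANCH_ID must be a numeric branch ID, not 'default'")
    return str(value)
```

Raising on an unexpected host is a deliberate choice in this sketch: a silently wrong URL would surface later as a much less obvious request error.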

References:
- references/storage-access.md §Direct RO workspace queries: replace
  the "Direct API call shape" section (which showed a raw POST to the
  legacy endpoint) with the Query Service SDK call shape for both
  Python and JS. Adds an explicit "Do NOT post to /v2/storage/.../
  workspaces/<id>/query" warning. Clarifies that the legacy endpoint
  is only used by BigQuery projects today.
- references/troubleshooting.md: new entry "workspace.workspaceNotFound
  404 from /v2/storage/.../workspaces/<id>/query" mirroring the exact
  failure from the live session. Existing "WORKSPACE_<id> prefix" entry
  updated to mention Query Service instead of Storage API.
…full code

Previous edit told agents "don't post to /v2/storage/.../query — use
Query Service" but left BigQuery users with no concrete alternative.
That endpoint IS the right path for BigQuery; Query Service just
doesn't support BQ yet.

Adds two subsections after the Snowflake / Query Service SDK examples:

1. "How to know which backend you're on" — call get_project_info,
   read sql_dialect. Snowflake → Query Service. BigQuery → Storage
   API workspace-query.

2. "BigQuery path — Storage API workspace-query endpoint" — concrete
   Python and JS code. Documents the differences from Query Service:
   - Rows arrive as objects keyed by column name (not arrays + cols).
   - Cell values are native types (no string coercion needed).
   - Synchronous single response; no submit/poll/paginate.
   - BRANCH_ID accepts the string "default" here.
   - Templates are wired for Snowflake; swap data_loader / keboola-client
     and remove the keboola-query-service dep when on BQ.
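
A small sketch of the routing and row-shape difference listed above. The function names, the returned labels, and the exact spelling of the `sql_dialect` values are assumptions; the comparison is case-insensitive for that reason.

```python
def pick_backend(sql_dialect):
    """Map the project's SQL dialect to the query path described above."""
    dialect = sql_dialect.strip().lower()
    if dialect == "snowflake":
        return "query-service"
    if dialect == "bigquery":
        return "storage-workspace-query"
    raise ValueError(f"unsupported sql_dialect: {sql_dialect!r}")


def rows_to_dicts(columns, rows):
    """Turn array rows plus a column list into objects keyed by column name.

    This gives both paths the same shape, since the BigQuery path already
    returns rows as objects.
    """
    return [dict(zip(columns, row)) for row in rows]
```

Normalising to one row shape at the data-loader boundary means the rest of the app does not need to know which backend served the query.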
…idate the skill

Companion to 2026-05-13-dataapp-development-design.md and
2026-05-13-dataapp-development.md (the plan). Lists, by category:

1. Original brief (Obsidian note + Linear AI-3147)
2. Plugin's prior skills that were merged in and deleted
3. Keboola code repos read directly (mcp-server, data-app-python-js,
   both Query Service SDKs, kai-client, kai-pricing-calculator-app,
   profitline-js-app, FI app, agent-usage-data-app, keboola_agent_cli)
   with the specific files inspected for each
4. Connection documentation pages read locally
5. External-team contribution (PR #71) — what was
   adopted and what was deliberately rejected
6. Companion skill keboola-js-data-app — per-item adopt/reject log
7. Live verifications (MCP get_project_info live call + three test
   sessions in data_app_testing) — each test mapped to the commit it
   drove
8. Anthropic / Claude Code platform sources

Not loaded by the skill at runtime — provenance / audit only. Next
iteration uses this to know which sources were authoritative and
which were superseded.
…bute keboola-js-data-app to Fisa, drop private FS paths
…ask only for token, offer to run

JS-app test feedback: when an agent is wiring up local dev, the
ideal UX is "I created .env.local with everything I could resolve;
the only thing missing is your Storage API token — please paste it
here and I'll start the app for you." The previous rule wording
("ask the user to populate .env.local") put the entire job on the
user when MCP can resolve KBC_URL, BRANCH_ID, KBC_WORKSPACE_ID, and
QUERY_SERVICE_URL automatically via get_project_info.

SKILL.md hard rule 9: rewrite as a four-step proactive flow.
- Step 1: pre-create the file with every required key, MCP-resolved
  values pre-filled.
- Step 2: check whether KBC_TOKEN is already set (named-lookup is
  fine; indiscriminate scanning is not).
- Step 3: ask only for the missing token, with a UI-navigation pointer.
- Step 4: offer to run the app once complete.

storage-access.md §Agent: don't try to discover credentials: same
four-step flow. Adds an explicit "the goal is: user provides the one
secret value, the agent does everything else" closing line.
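
Step 1 of that flow can be sketched as a pure function that renders the file content and reports which keys still need a value. The function name and key order are assumptions; the workspace key is written as WORKSPACE_ID, following the rename made later in this PR.

```python
# Assumed key list; KBC_TOKEN is the one value the user has to supply.
REQUIRED_KEYS = ("KBC_URL", "KBC_TOKEN", "BRANCH_ID", "WORKSPACE_ID", "QUERY_SERVICE_URL")


def render_env_file(resolved, required=REQUIRED_KEYS):
    """Return (.env.local content, list of keys still missing a value)."""
    lines = []
    missing = []
    for key in required:
        value = resolved.get(key, "")
        if not value:
            missing.append(key)
        # Every required key is written, so the user sees exactly what is blank.
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n", missing
```

The content could then be written with `pathlib.Path(".env.local").write_text(content)`, and the agent asks only for the entries in `missing`.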
… with Query Service

Live test surfaced the wrong guidance: skill told users to "scope it
minimally" with read access only to the needed buckets/tables. But
the Query Service evaluates access at the workspace level — narrow-
scoped tokens get rejected with auth errors regardless of whether
the SQL touches data inside the scope.

storage-access.md §KBC_TOKEN: rewrite the token-creation guidance.
The token MUST be project-wide. Two options now spelled out:
1. User's master token, refreshed via UI or grabbed via the Keboola
   Dev Tools Chrome extension.
2. Dedicated Storage API token with Full Access to all buckets and
   components.

templates/streamlit/.streamlit/secrets.toml.example: comment header
updated to call out the token-scope requirement and link both
acquisition options. Placeholder renamed from
"your-storage-api-token" to "your-project-wide-storage-api-token" so
the constraint is visible at the value site.

troubleshooting.md: new entry "Query Service auth error with a
narrow-scoped Storage API token" mirroring the live failure mode.
Fix points at the two acquisition options.
… templates

Ship a small, low-contrast attribution footer with the Keboola wordmark in
every template (Streamlit, single-Node + static, combined Python+Node).
Streamlit embeds the SVG as a base64 data URI to avoid needing the
static-serving config flag; the JS templates reference `/keboola-logo.svg`
served from each template's `public/` directory. Document the pattern in
references/styling-guide.md so brand overrides have a clear extension point.
… Keboola footer

Palette overrides don't touch the footer markup — without an explicit
instruction, an agent applying customer branding would ship the app with
both brands stacked. Spell out the removal steps for HTML and Streamlit
templates.
…atch platform-injected env var

The platform injects the workspace ID as `WORKSPACE_ID` (no prefix), per
the official Storage Access docs and the manifest fallback contract. The
skill was telling agents to use `KBC_WORKSPACE_ID`, which the runtime
never sets — production lookups would silently fall back to the manifest
file or fail.

- Rename every occurrence across SKILL.md, references, and templates.
- Drop the redundant `KBC_WORKSPACE_ID || WORKSPACE_ID` fallback chains
  in data_loader.py and keboola-client.js — there's only one name now.
- Add a naming-note callout in storage-access.md §WORKSPACE_ID so anyone
  migrating from older code or earlier drafts of the skill sees the
  correction explicitly.
- Tighten the troubleshooting entry for `WORKSPACE_<id>` values so the
  "env var name vs value prefix" distinction is unambiguous.
The note only existed to justify the previous KBC_WORKSPACE_ID mistake.
A fresh reader doesn't need that context — the section heading and the
rest of the prose already name the env var.
…t practices

Cut redundancy across SKILL.md and references so each file pulls less context
when the skill triggers. Net: -362 lines across the corpus, larger per-task
savings (e.g. troubleshooting.md reads 118 lines instead of 262).

- SKILL.md: hard rules #7–9 collapsed to one-line pointers; redundant
  "Reference index" table dropped (Decision tree already covers all 14 refs).
- troubleshooting.md: every entry rewritten as symptom → cause → pointer;
  duplicate code blocks (POST handlers, nginx snippets, etc.) removed.
- styling-guide.md split: default Keboola palette stays in styling-guide.md;
  bundled React+Vite+shadcn+ECharts stack moved to styling-react-bundled.md
  so RO dashboard tasks don't pull in 220 lines of CSS-variable tokens.
- streamlit-apps.md Theming → points to styling-guide.md for palette/snippets.
- deployment-paths.md Path A auth note → points to authentication.md.
- python-js-apps.md Bootstrap hook + kai-integration.md Pre-built skills
  shortened to one-line forward references.
- ToC blocks added to the seven references >200 lines per Anthropic's
  guidance for partial-read fidelity.
- READMEs updated: "14 topical references" (was 12), styling line clarified.
Captures what the skill is currently working around: blocked Linear
issues (AI-3219 / AI-3218 / PROF-114), Keboola MCP gaps that force
fallbacks to kbagent or filesystem paths, kbagent's missing log
command, deferred placeholder sections, the two Max suggestions not
yet picked up, and live-test coverage gaps. Lives at the skill root
so each item has an obvious owner.
@linear

linear Bot commented May 18, 2026

AI-3147

@ottomansky
Collaborator

Hey @davidesner, summarizing my findings.
Overall this is really solid — I got a working data app running locally in ~15–20 minutes end-to-end, which is genuinely impressive for a first pass. I didn't push it into Keboola, just ran it locally, so this is purely feedback from the build path. A handful of rough edges worth flagging:

Stack selection (App type → Project step)

  • Would be nice if the list sorted by most-used stacks first. I started out not sure which one I needed.
  • "AWS US (default)" says connection.keboola.com, but the bundled MCP URL is mcp.us-east4.gcp.keboola.com — a GCP hostname under the AWS US option, which threw me off (is this AWS or GCP?). Worth either tweaking the labeling or adding a note that the MCP runtime lives on GCP regardless of the storage stack.

Bundled MCP URL is US-only

  • plugin.json hardcodes mcp.us-east4.gcp.keboola.com. I'm on GCP EU (europe-west3), so it returned nothing and I fell back to kbagent. Either make it stack-aware (derive from the user's project URL) or call it out in the README so non-US users know to expect the kbagent path.

storage-access.md assumes newer MCP fields

  • It says to call get_project_info and read workspace_id / branch_id, but on keboola-mcp-server v1.32.0 those fields aren't returned — only project_id, project_name, sql_dialect, etc. I had to detour through kbagent branch list + kbagent workspace list to find them. Either bump the minimum MCP version in the skill or document the fallback.

kbagent --json is fragile

  • Every kbagent call prints Updating keboola-mcp-server v1.32.0 -> v1.61.3 (via uv_tool)... to stdout, which breaks json.loads. I had to regex-strip the leading lines before every parse. The upgrade nag should go to stderr or sit behind a --quiet flag, otherwise scripting against --json is painful. I will open an issue in https://github.com/padak/keboola_agent_cli and try to put up a PR fixing this, or discuss it with Padak.
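
Until the CLI changes, a tolerant parser is one possible workaround. This is a minimal sketch that assumes the payload starts on its own line with `{` or `[`; the function name and the sample `alias` field are hypothetical.

```python
import json


def parse_json_with_noise(stdout):
    """Parse JSON from CLI output that has non-JSON lines before the payload."""
    decoder = json.JSONDecoder()
    lines = stdout.splitlines(keepends=True)
    for index, line in enumerate(lines):
        # Only try lines that look like the start of a JSON document.
        if line.lstrip().startswith(("{", "[")):
            candidate = "".join(lines[index:]).lstrip()
            try:
                value, _end = decoder.raw_decode(candidate)
                return value
            except json.JSONDecodeError:
                continue
    raise ValueError("no JSON payload found in output")


noisy = (
    "Updating keboola-mcp-server v1.32.0 -> v1.61.3 (via uv_tool)...\n"
    '[{"alias": "prod"}]\n'
)
projects = parse_json_with_noise(noisy)
```

This avoids a regex tied to the exact wording of the upgrade message, so it keeps working if that text changes.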

kbagent project setup isn't mentioned

  • When I picked the kbagent path, kbagent project list showed 3 unrelated projects, not the one I was working on. The skill never mentioned I might need kbagent project add first. The path-selection step could say: "if you chose kbagent, confirm kbagent project list includes your project; if not, run kbagent project add first."

FQN fallback in api/queries.js

  • The template comment tells you to copy the FQN from mcp__keboola__get_table.fully_qualified_name, but on the older MCP that field doesn't exist (table-detail returned fully_qualified_name: None). I ended up constructing it manually as "<DATABASE>"."<BUCKET>"."<TABLE>" using the database name from kbagent workspace list. The comment should at least mention the manual construction rule as a fallback.
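
The manual construction rule is simple enough to sketch. The function name and sample values are hypothetical, and doubling embedded double quotes is a general SQL quoting convention rather than something the template documents.

```python
def build_fqn(database, bucket, table):
    """Build "<DATABASE>"."<BUCKET>"."<TABLE>" with each part double-quoted."""

    def quote(identifier):
        if not identifier:
            raise ValueError("empty identifier")
        # Double any embedded quote so the identifier stays a single token.
        return '"' + identifier.replace('"', '""') + '"'

    return ".".join(quote(part) for part in (database, bucket, table))


# Hypothetical values; take the database name from `kbagent workspace list`.
fqn = build_fqn("KEBOOLA_1234", "in.c-sales", "orders")
```

Quoting every part matters here because bucket names contain dots and hyphens, which would otherwise split or break the identifier.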

Playwright version pin

  • npx playwright install defaults to latest, but the MCP server pins playwright@1.57.0 (Chromium build v1200), so the first two screenshot attempts failed. Took an npm view @executeautomation/playwright-mcp-server dependencies to figure out the right version. Could either pin it in the setup instructions or auto-detect.

Token-in-chat ergonomics

  • One of the AskUserQuestion options for the token was effectively "paste it here" — and I did, which means it lives in the transcript even after I rotate. Would be cleaner if the agent always pointed at .env first and treated "paste in chat" as the discouraged escape hatch, with a stronger warning. 😬

kbagent project add --url default

  • Defaults to https://connection.keboola.com (legacy AWS US stack), which is wrong for most projects today. Should default to empty/required, or auto-detect from a provided project URL.

None of these blocked me from getting to a working app, but smoothing them out would make the first-run experience a lot tighter — especially for non-US-AWS users. Happy to help test a follow-up pass.

@ottomansky
Collaborator

Also deployed it into Keboola: overall it took 3-5 minutes from the time I asked, starting from the locally running app. CC pointed this out:
What this run added to the PR feedback bucket

Three meaningful gaps caught:

  1. templates/nodejs-app/ is not deployable as shipped: validate-repo fails with 3 BLOCKING issues. Missing keboola-config/nginx/sites/default.conf, keboola-config/supervisord/services/app.conf, and pyproject.toml. The python-app and python-node-app templates have these; the Node template should too. I copied + adapted from python-app.
  2. pyproject.toml is required even for pure-Node apps: validate-repo --type python-js enforces it. Either the validator should skip this check when no Python is detected, or the docs should call out the stub-pyproject pattern explicitly.
  3. data-app password ergonomics with manage-token blocking: the security model (default-deny KBC_MANAGE_API_TOKEN) is good, but the failure message "No manage token available. Run interactively, or pass --allow-env-manage-token" surfaces only after the call, not as a hint when kbagent data-app create --auth password was chosen. Could nudge earlier: "you picked password auth; to fetch the password later, you'll need --allow-env-manage-token or the UI."

@ottomansky
Collaborator

ottomansky commented May 21, 2026

One more finding: kbagent currently doesn't have a way to look into terminal logs, so it can't debug apps that fail on startup and gets stuck in a loop. I'm already working on a PR for that. cc: @jordanrburger
