Draft
722 commits
6e0159d
fix: default workspace_storage to pvc (storage provisioner working)
Ladas Mar 10, 2026
6ddeb06
fix: add fsGroup to agent pod spec for PVC write access
Ladas Mar 10, 2026
6ff2833
fix: RCA test stats assertion — wait for history load after SPA nav
Ladas Mar 10, 2026
39424f6
fix: portable LOG_DIR in skills — works in sandbox agent containers
Ladas Mar 10, 2026
ac8002b
feat: pass SKILL_REPOS env var to agent deployments
Ladas Mar 10, 2026
adda914
fix: SKILL_REPOS auto-detect from kagenti source repo + branch
Ladas Mar 10, 2026
5948a30
docs: session X passover — reconfigure, micro-reflection, graph fix
Ladas Mar 10, 2026
574b70d
docs: add P0 items 7-9 to session X passover
Ladas Mar 10, 2026
b08b42b
docs: add items 10-12 to passover + visualizations design
Ladas Mar 10, 2026
ffecb90
feat: parameterize RCA test — AGENT_NAME + SKIP_DEPLOY env vars
Ladas Mar 10, 2026
ee6b1ea
fix: don't force-mark loops done on stream end — reload history instead
Ladas Mar 10, 2026
ca44f8d
docs: update passover — budget wizard step, LLM timeout/retry
Ladas Mar 10, 2026
9af444d
docs: add Kiali graph missing services to passover
Ladas Mar 10, 2026
de71685
docs: fix Kiali ambient mesh instructions in passover
Ladas Mar 10, 2026
c79b018
fix: revert history reload in finally — keep SSE-built loop data
Ladas Mar 10, 2026
d098010
docs: add loop_events persistence root cause to passover
Ladas Mar 10, 2026
92f2875
fix: add SSE pipeline logging + revert force-done in finally
Ladas Mar 10, 2026
3f58c7f
docs: session Y passover — rebuild instructions, SSE debugging, P0 list
Ladas Mar 10, 2026
214c924
fix: SSE connection error handling with retry flag
Ladas Mar 10, 2026
74d5101
docs: add items 12-14 + log checking instructions to session Y passover
Ladas Mar 10, 2026
03d7382
docs: update session Y passover with SSE persistence root cause
Ladas Mar 10, 2026
49581c5
fix(backend): recovery polling for incomplete agent loop events
Ladas Mar 10, 2026
9fe8388
fix(backend): run recovery as background task to avoid GeneratorExit
Ladas Mar 10, 2026
d48c2bb
fix(backend): move all persistence to background task
Ladas Mar 10, 2026
bee7626
feat(ui): add micro_reasoning rendering and PromptInspector overlay
Ladas Mar 11, 2026
6112c20
fix(ui): fix TypeScript errors in LoopDetail openInspector
Ladas Mar 11, 2026
381586e
fix(ui): interleave micro-reasoning with tool call/result pairs
Ladas Mar 11, 2026
bb177d7
feat(ui): show token usage and model in micro-reasoning blocks
Ladas Mar 11, 2026
d3abc71
docs: update session Y passover with progress and new P0s
Ladas Mar 11, 2026
1b65917
fix(backend,ui): persist metadata always + prevent double message send
Ladas Mar 11, 2026
7223447
fix(backend,ui): always write metadata + failure reason + verbose log…
Ladas Mar 11, 2026
2ef3b3a
fix(backend): use A2A task ID (not context_id) for recovery polling
Ladas Mar 11, 2026
4f05933
docs: critical finding - A2A SDK overwrites backend metadata
Ladas Mar 11, 2026
8c0eb6c
docs: update passover with metadata race fix and storage design TODO
Ladas Mar 11, 2026
f73e21a
fix(ui): prevent double-send when SSE stream reader fails
Ladas Mar 11, 2026
c6156d1
fix(ui): mark incomplete loops as failed with disconnection reason
Ladas Mar 11, 2026
9fdb0f0
docs: add tasks/resubscribe as proper SSE reconnection fix
Ladas Mar 11, 2026
d80d1b7
feat(backend): reconnect to agent stream via A2A tasks/resubscribe
Ladas Mar 11, 2026
ca23505
feat(backend,ui): subscribe to running sessions via tasks/resubscribe
Ladas Mar 11, 2026
2c31dc0
fix(ui): fix TypeScript errors in subscribe handler
Ladas Mar 11, 2026
b634bf4
fix(ui): cast AgentLoop default to fix TS2322
Ladas Mar 11, 2026
7bf59a5
fix(ui): use unknown cast for AgentLoop default
Ladas Mar 11, 2026
925e73c
feat(ui): tool execution blocks with status icons and call_id pairing
Ladas Mar 11, 2026
99bb825
fix(ui): pass prompt data through reflector and reporter in loopBuilder
Ladas Mar 11, 2026
d04185d
fix(ui): don't reload history immediately after streaming ends
Ladas Mar 11, 2026
34f192d
feat(ui): smooth session loading with parallel fetch and skeleton
Ladas Mar 11, 2026
493c8c0
fix(ui): don't reload history from subscribe done — keep streaming data
Ladas Mar 11, 2026
8d416ac
fix(ui): preserve microReasonings when executor_step replaces a step
Ladas Mar 11, 2026
2d50079
fix(ui): update executor step in-place to preserve chronological order
Ladas Mar 11, 2026
92ec368
fix(ui): show full message preview in expanded PromptBlock
Ladas Mar 11, 2026
853cbd0
feat(ui): PromptBlock opens fullscreen PromptInspector on click
Ladas Mar 11, 2026
dfe27b6
fix(ui): remove unused NestedCollapsible (TS6133)
Ladas Mar 11, 2026
cfdafa9
fix(backend): merge recovery events instead of replacing
Ladas Mar 11, 2026
ebaad06
fix(ui): PromptBlock expand inline + Fullscreen button + portal
Ladas Mar 11, 2026
4b16b4f
docs: add wizard budget controls + step navigation to passover
Ladas Mar 11, 2026
c787ae9
docs: session Z passover for budget enforcement + wizard
Ladas Mar 11, 2026
9b3de8b
fix(ui): handle router event in loopBuilder (suppress unknown warning)
Ladas Mar 11, 2026
c41430a
fix(ui): don't set synthetic finalAnswer on incomplete loops
Ladas Mar 11, 2026
17d0b53
feat(ui): streaming indicator at bottom of agent loop
Ladas Mar 11, 2026
b44e434
fix(ui): remove helperText from FormGroup (PF5 TS error)
Ladas Mar 11, 2026
ba1c8f8
docs: next session passover — step naming, prompt context, test fixes
Ladas Mar 11, 2026
4cd2234
fix(backend): add Istio ambient mesh labels to egress proxy
Ladas Mar 11, 2026
f8ae154
feat(ui): add cancel button for streaming chat requests
Ladas Mar 11, 2026
8f5a91b
fix(deploy): add Istio ambient mesh labels to LiteLLM proxy
Ladas Mar 11, 2026
d656af5
fix(ui): subscribe handler now processes events through applyLoopEvent
Ladas Mar 11, 2026
edc473c
fix(ui): step naming shows plan step number instead of global counter
Ladas Mar 11, 2026
18a97c3
fix(ui): wizard budget step with section grouping and verbose descrip…
Ladas Mar 11, 2026
60b785f
fix(ui): implicit executor steps inherit plan step description
Ladas Mar 11, 2026
0d58494
fix(ui,backend): add rich logging to subscribe and history paths
Ladas Mar 11, 2026
93ebf7b
fix(ui): subscribe stream done marks incomplete loops as failed
Ladas Mar 11, 2026
64d5aa8
fix(ui): don't auto-collapse loop card when agent fails
Ladas Mar 11, 2026
6587659
fix(ui): step label uses loop.currentStep fallback + stats count fix
Ladas Mar 11, 2026
4877a4f
fix(ui): replan updates active plan, step count, and resets currentStep
Ladas Mar 11, 2026
40e89b6
debug(ui): log planner prompt data during loop event processing
Ladas Mar 11, 2026
1df85ce
fix(ui): use current_step (plan index) for loop.currentStep, not step…
Ladas Mar 11, 2026
f477f87
feat(ui): add timestamps to loop steps with hover tooltip
Ladas Mar 11, 2026
7de1d7f
fix(ui): cancel subscribe stream when switching sessions
Ladas Mar 11, 2026
2b541f4
fix(ui): strip duplicate 'Step N' prefix from executor step labels
Ladas Mar 11, 2026
5d1169f
fix(ui): fix Step 1Step 1 duplication in executor label
Ladas Mar 11, 2026
caa48ff
feat(ui): add budget section to SessionStatsPanel with progress bars
Ladas Mar 11, 2026
08dfc30
fix(ui): add debug log to PromptBlock for planner prompt visibility
Ladas Mar 11, 2026
866ad85
test(e2e): add budget and step label assertions to RCA test
Ladas Mar 11, 2026
df6350d
fix(test): add extra Next click for Budget wizard step in PVC test
Ladas Mar 11, 2026
02c1174
fix(ui): normalize plan_step/current_step field, bump default iterati…
Ladas Mar 11, 2026
1c73480
fix(ui): toggle counter shows plan step count, not node visit count
Ladas Mar 11, 2026
55dce09
feat(ui): show plan step + node visit counter on all executor blocks
Ladas Mar 11, 2026
6af792a
fix(ui): remove unused totalIterations variable (TS6133)
Ladas Mar 11, 2026
f50589d
fix(backend,ui): bump recursion_limit default to 2000
Ladas Mar 11, 2026
b880ef3
fix(backend,ui): set recursion_limit default to 300
Ladas Mar 11, 2026
1428490
feat(ui,backend): add Force Tool Calling toggle to wizard
Ladas Mar 11, 2026
f9589c6
feat(ui,backend): add Text Tool Parsing toggle to wizard
Ladas Mar 11, 2026
9a056b3
fix(test): include agent param in SPA nav, increase persistence timeout
Ladas Mar 11, 2026
2928cd5
docs: session Z passover + budget limits design
Ladas Mar 11, 2026
da774af
fix(backend): write-back loop_events to metadata when extracted from …
Ladas Mar 11, 2026
102eb96
fix(ui): new session button clears state, loading overlay on session …
Ladas Mar 11, 2026
1d93d4b
feat(ui,backend): add Debug Prompts toggle (default: on)
Ladas Mar 11, 2026
0c0029b
fix(backend): SQL-based event extraction to prevent OOM
Ladas Mar 11, 2026
31627ce
docs: session Alpha passover from session Z
Ladas Mar 11, 2026
fc8d3c5
feat(ui): render step_selector events as plan transition steps
Ladas Mar 11, 2026
c927ece
fix(ui): stop polling when all loops are done/failed
Ladas Mar 11, 2026
9c5eaa8
feat(ui,backend): incremental polling with events_since + skip_events
Ladas Mar 11, 2026
e7efa7b
fix(ui): stop infinite polling by checking backend task_state
Ladas Mar 11, 2026
912b96c
fix(test): move sidecar/looper tests to sandbox-hardened agent
Ladas Mar 11, 2026
69024ad
fix(test): move agent-resilience test to sandbox-hardened
Ladas Mar 11, 2026
100015c
test(e2e): add budget enforcement and persistence tests
Ladas Mar 11, 2026
cd955b5
fix(test): fix budget test selectors to match working test patterns
Ladas Mar 11, 2026
8204748
fix(ui): match budget_update event type in loopBuilder
Ladas Mar 11, 2026
6e960ae
fix(test): reload page before checking Stats to ensure loop events lo…
Ladas Mar 11, 2026
a5b02b6
fix(ui): include micro-reasoning tokens in loop summary bar total
Ladas Mar 11, 2026
105ba10
test(e2e): add token consistency check between budget and LLM Usage
Ladas Mar 11, 2026
c08af6f
docs: add LiteLLM budget enforcement design doc
Ladas Mar 11, 2026
f252dd1
docs: LLM budget proxy design — per-session token enforcement
Ladas Mar 12, 2026
db29d58
docs: update proxy design — DB provisioning by scripts, not services
Ladas Mar 12, 2026
b4b8e48
docs: DB multi-tenancy design — schema-per-agent isolation
Ladas Mar 12, 2026
c835c70
docs: prefix DB users/schemas with namespace for collision avoidance
Ladas Mar 12, 2026
ad7d3f0
docs: add deterministic DB identifier generation (≤63 chars)
Ladas Mar 12, 2026
ee18418
docs: session Beta passover — LLM proxy + DB multi-tenancy
Ladas Mar 12, 2026
3c3f59a
docs: session Gamma passover — master status of all remaining items
Ladas Mar 12, 2026
607a9b0
docs: add design doc update checklist to Gamma passover
Ladas Mar 12, 2026
13202c7
docs: Gamma passover — design doc rewrite as P0, relative links index
Ladas Mar 12, 2026
75f5ddd
feat(proxy): add LLM Budget Proxy service
Ladas Mar 12, 2026
573b3d2
docs: draft outline for main design doc rewrite
Ladas Mar 12, 2026
7db7574
docs: add design doc rewrite as Beta P0
Ladas Mar 12, 2026
06b9efa
docs: Alpha continuation passover — main design doc rewrite
Ladas Mar 12, 2026
d3015ac
fix(proxy,backend): fix metadata extraction and add proxy-backed stats
Ladas Mar 12, 2026
813a933
fix(test): use SPA navigation instead of page.reload() in budget tests
Ladas Mar 12, 2026
8c77ef6
docs: add sandbox platform design v2 — rewritten architecture doc
Ladas Mar 12, 2026
a2e4bcf
fix(test): add logging to budget tests for debugging navigation
Ladas Mar 12, 2026
7adeb7d
docs: add Delta and Epsilon session passover docs
Ladas Mar 12, 2026
cd2e1dd
docs: add MCP gateway to architecture + Zeta session passover
Ladas Mar 12, 2026
0957e62
fix(proxy,test): add request logging and agent readiness check
Ladas Mar 12, 2026
4329c6b
fix(test): check budget text in Chat tab, not Stats tab
Ladas Mar 12, 2026
0ce01cb
feat(ui): fetch budget data from proxy API instead of loop events
Ladas Mar 12, 2026
795887e
fix(ui): remove unused proxyCalls variable
Ladas Mar 12, 2026
be6e20e
docs: extract composable sandbox security into standalone design doc
Ladas Mar 12, 2026
51255bb
fix(ui): wizard default GitHub secret name to github-token-secret
Ladas Mar 12, 2026
99d3d1e
fix(ui): render user message inside AgentLoopCard instead of separate…
Ladas Mar 12, 2026
14d88b9
fix(ui): add default return to statusLabel switch for TypeScript
Ladas Mar 12, 2026
55d5fcb
fix(ui): expand wizard default proxy domains for gh CLI
Ladas Mar 12, 2026
abf6732
fix(test): use .first() for user message selector in identity test
Ladas Mar 12, 2026
d909835
fix: point sandbox agents to LLM budget proxy instead of direct LiteLLM
Ladas Mar 12, 2026
ba019f5
fix(ui): restore ChatBubble for user messages alongside loop card header
Ladas Mar 12, 2026
9ef30f7
fix(ui): show spinner instead of empty chat during session load
Ladas Mar 12, 2026
d8de919
fix(ui): render micro-reasoning before its tool call, not after
Ladas Mar 12, 2026
6c9d73d
docs: add HITL proper implementation + Pod Events tab design
Ladas Mar 12, 2026
5a98db0
docs: add HITL + Pod Events design to sub-design index
Ladas Mar 12, 2026
40e07bf
docs: expand HITL + Pod Events design with all-pods view and resource…
Ladas Mar 12, 2026
2f2a6ab
feat(ui,backend): add Pod tab showing agent, proxy, and budget pod st…
Ladas Mar 12, 2026
8dbbd07
feat: add pod resource limits to wizard + persist backend memory
Ladas Mar 12, 2026
a41e383
fix(test): lower budget enforcement test to 2000 tokens
Ladas Mar 12, 2026
818e843
fix(test): budget test uses LLM proxy instead of agent env var
Ladas Mar 12, 2026
949eb07
fix(test): set budget on both proxy and agent for UI display parity
Ladas Mar 12, 2026
4b9b1a0
fix(test): budget enforcement test exercises 402 path with 3 follow-ups
Ladas Mar 12, 2026
17d2b45
fix(test): assert budget exceeded text in messages 2 and 3
Ladas Mar 12, 2026
06be279
fix(test): wait for loop card render before checking chat content
Ladas Mar 13, 2026
ca9099a
fix(test): wait for loop card done state, not just input enabled
Ladas Mar 13, 2026
09b77e9
fix(test): poll for active loop status instead of checking hidden
Ladas Mar 13, 2026
5eafb55
fix: ghcr-secret JSON escaping and KEYCLOAK_URL auto-detection
Ladas Mar 13, 2026
f754c4c
fix(test): poll for loop card done + session ID in URL
Ladas Mar 13, 2026
828adda
fix: set KEYCLOAK_VERIFY_SSL=false for OpenShift e2e tests
Ladas Mar 13, 2026
10d27a5
docs: session Alpha (2026-03-13) passover — per-node tools WIP
Ladas Mar 13, 2026
39a6b4b
fix(agent): review fixes for per-node tool subsets graph
Ladas Mar 13, 2026
721808f
fix(ui): default text tool parsing to off in wizard
Ladas Mar 13, 2026
7acfd06
docs: session Alpha-1 passover — per-node tools + debugging scripts
Ladas Mar 13, 2026
3ccdb3f
fix(backend): lower incremental persist threshold for SSE events
Ladas Mar 13, 2026
85c9087
fix(test): wait for agent loop completion before navigating
Ladas Mar 13, 2026
1616100
fix(ui): Graph Node Visits reads event_index instead of step
Ladas Mar 13, 2026
a065c39
fix(ui): node visits badge, file preview links, event ordering
Ladas Mar 13, 2026
eac3832
fix(ui): replace [N] step visit text with PatternFly Badge
Ladas Mar 13, 2026
4851a0c
fix(ui): uniform badge layout for all step headers
Ladas Mar 13, 2026
3f7df76
fix(ui): index badge + node type badge with hover description
Ladas Mar 13, 2026
4dd4b3f
fix(ui): use PF theme variables for index badge dark/light mode
Ladas Mar 13, 2026
be1b257
feat(ui): file preview badges, plan always visible, dark mode fixes
Ladas Mar 13, 2026
44a57f3
fix(ui): remove duplicate plan section from LoopDetail
Ladas Mar 13, 2026
b6d088e
feat(ui): collapsible step sections with tool call summary
Ladas Mar 13, 2026
8edac28
fix(ui): step badge in collapsible header, hide redundant inner header
Ladas Mar 13, 2026
527ae8c
docs: session gamma passover — context isolation + node visit model
Ladas Mar 13, 2026
dacc4fd
fix(skill): add gh CLI flag reference to rca:ci skill
Ladas Mar 13, 2026
ffa48c4
feat(ui): handle replanner_output, preserve current step on replan
Ladas Mar 13, 2026
ce9d68e
docs: update gamma passover with late session fixes
Ladas Mar 13, 2026
711cfcb
feat(ui): loopBuilder groups events by node_visit instead of step
Ladas Mar 13, 2026
e412ec6
fix(backend): persist every event immediately — no batching
Ladas Mar 14, 2026
fbc6f9a
docs: update gamma2 passover with final results
Ladas Mar 14, 2026
aa9f09f
feat(test): RCA test supports RCA_FORCE_TOOL_CHOICE=0 variant
Ladas Mar 14, 2026
31f1272
feat(ui): show bound tools in prompt inspector, always visible
Ladas Mar 14, 2026
80256a4
docs: alpha3 passover — streaming architecture, event pipeline, persi…
Ladas Mar 14, 2026
50235c5
fix(backend+test): persist logging + wizard toggle for RCA variants
Ladas Mar 14, 2026
d22133d
fix(backend): bump egress proxy default memory to 256Mi
Ladas Mar 14, 2026
a700c3a
fix(backend): add SSE stream exhaustion logging
Ladas Mar 14, 2026
dc43462
fix(backend): remove global httpx read timeout for SSE streaming
Ladas Mar 14, 2026
d3de7a5
fix(test): navigate to Observability step for Force Tool toggle
Ladas Mar 14, 2026
aedb851
fix(test): correct wizard step order, assert Observability step
Ladas Mar 14, 2026
849c60b
fix(test): use label click for PF Switch toggle (overlay blocks check…
Ladas Mar 14, 2026
278e2d0
fix(test): use .first() for PF Switch label (resolves to 2 elements)
Ladas Mar 14, 2026
905a724
fix(wizard+test): auto-enable text parsing when force tools is OFF
Ladas Mar 14, 2026
ab43253
refactor(ui+backend): remove legacy event type handling
Ladas Mar 14, 2026
0b53bb4
fix(wizard+test): don't auto-enable text parsing, use wizard defaults
Ladas Mar 14, 2026
593915d
fix(wizard+backend): bump egress proxy defaults to 256Mi/200m
Ladas Mar 14, 2026
5cba051
fix(backend): patch existing deployments on redeploy (was skip on 409)
Ladas Mar 14, 2026
3a3beeb
test(e2e): agent redeploy test — verify limits update on reconfigure
Ladas Mar 14, 2026
1abbda1
fix(ui): filter unknown event types in loopBuilder
Ladas Mar 14, 2026
18cf70b
fix(backend): route agent LLM calls through budget proxy
Ladas Mar 14, 2026
8e7ea5c
fix(ui): pass bound_tools from micro_reasoning events to inspector
Ladas Mar 14, 2026
ff95a57
fix(ui): hide welcome card when messages exist, remove bot icon
Ladas Mar 14, 2026
84ea59e
docs: alpha3 passover — P0 unified invoke_llm + tool loop for all nodes
Ladas Mar 14, 2026
def1f72
feat(ui): thinking iterations UI + cancel button + wizard budget fields
Ladas Mar 14, 2026
47a6706
fix(ui): fix JSX nesting in welcome card conditional
Ladas Mar 14, 2026
7c6d23a
docs: update alpha4 passover with PlanStore and remaining items
Ladas Mar 14, 2026
35a3692
docs: session eta passover — chat view modes (simple/advanced/graph)
Ladas Mar 14, 2026
c9da14f
feat(ui): add chat view modes — simple, advanced, graph
Ladas Mar 14, 2026
b6972a1
fix(ui): add descriptive title attributes to all rendered badges
Ladas Mar 14, 2026
c2a1493
feat(ui): budget polling, file refresh, files-touched display, thinki…
Ladas Mar 14, 2026
fe28dcf
fix(ui): eliminate flicker on session reload + graph fullscreen button
Ladas Mar 14, 2026
e447fff
fix(ui): count user messages from loop.userMessage in stats panel
Ladas Mar 14, 2026
280f61d
feat(ui): graph view node drill-down with breadcrumb navigation
Ladas Mar 14, 2026
7535867
fix(ui): eliminate empty message flash on session load
Ladas Mar 14, 2026
1c1d35a
fix(ui): prevent poll from re-adding user messages when loops exist
Ladas Mar 14, 2026
70976b0
fix(ui): render user messages from loop.userMessage on history load
Ladas Mar 14, 2026
46ae399
fix(ui): timestamp type must be Date not string
Ladas Mar 14, 2026
8dfd575
fix(ui): add missing order field to synthetic user message
Ladas Mar 14, 2026
1f96704
feat(ui): extract file paths from reporter content as badges
Ladas Mar 14, 2026
6f1753a
fix(backend): raise wizard defaults to match agent limits
Ladas Mar 14, 2026
c6a85ec
fix(ui): sort user messages by order before loop pairing
Ladas Mar 14, 2026
151ce62
feat(ui): extract history pairing into testable utility (9 tests)
Ladas Mar 14, 2026
b89fcd0
fix(ui): fix userMsgs reference after pairing refactor
Ladas Mar 15, 2026
9e83dc1
feat: LLM virtual key management API + litellm integration
Ladas Mar 15, 2026
c2b0d91
fix(ui): pass all prompt fields (boundTools, llmResponse) for all nod…
Ladas Mar 15, 2026
1cde5e4
fix(backend): idempotent key creation + fix litellm response parsing
Ladas Mar 15, 2026
6d504cc
feat: model selector UI + wizard allowed models + setup script
Ladas Mar 15, 2026
fa941ce
feat(backend): per-node LLM model overrides in wizard deployment
Ladas Mar 15, 2026
b8ac5e6
fix(test): set up LiteLLM port-forward for E2E tests
Ladas Mar 15, 2026
132c6e0
fix(deploy): point test agents directly at litellm (no budget proxy)
Ladas Mar 15, 2026
39f095a
feat: deploy llm-budget-proxy as part of team namespace provisioning
Ladas Mar 15, 2026
8230b0b
fix(deploy): correct budget proxy DATABASE_URL password
Ladas Mar 15, 2026
3850d35
fix(backend): guard against None entries in session history
Ladas Mar 15, 2026
a1689b2
fix(backend): handle metadata=None in history (not just missing key)
Ladas Mar 15, 2026
d34cb73
fix(ui): remove helperText prop (not in PatternFly v5 FormGroup)
Ladas Mar 15, 2026
3da0c52
feat: AgentGraphCard UI hook, test fixes, OTel wizard toggle
Ladas Mar 15, 2026
8e5b8df
fix(ui): TypeScript errors in useSessionLoader and unused import
Ladas Mar 15, 2026
f44f402
fix: assertive test assertions + wizard reconfigure navigation
Ladas Mar 15, 2026
29f7359
fix(ui): update npm deps — fix Playwright SSL verification vulnerability
Ladas Mar 15, 2026
9f18ed1
fix(ui): bump minimatch override to 3.1.4, fix ReDoS vulnerability
Ladas Mar 15, 2026
9769171
fix(helm): add PVC permissions to kagenti-backend ClusterRole
Ladas Mar 15, 2026
fd12223
fix: LLM secret key resolution, Playwright pin, default secret name
Ladas Mar 15, 2026
e6898dd
docs: session beta passover + theta squid proxy design
Ladas Mar 15, 2026
50bd048
fix: proxy env vars conditional, workspace_storage in RCA test
Ladas Mar 15, 2026
1c22db0
feat(ui): wire useSessionLoader into SandboxPage
Ladas Mar 15, 2026
ec1965f
fix(deploy): default proxy=True in wizard (was False)
Ladas Mar 15, 2026
dfe3d78
fix(helm): add pods/exec, configmap/secret write to backend RBAC
Ladas Mar 15, 2026
996ae1b
fix(ui): default proxy=true in wizard INITIAL_STATE
Ladas Mar 15, 2026
4 changes: 2 additions & 2 deletions .claude/skills/ci:monitoring/SKILL.md
@@ -12,8 +12,8 @@ Monitor running CI pipelines and report results. Creates task items for each CI
**CI log downloads MUST go to files.** Status checks (`gh pr checks`) are small and OK inline.

```bash
-export LOG_DIR=/tmp/kagenti/ci/$(basename $(git rev-parse --show-toplevel))
-mkdir -p $LOG_DIR
+export LOG_DIR="${LOG_DIR:-${WORKSPACE_DIR:-/tmp}/kagenti-ci}"
+mkdir -p "$LOG_DIR"

# When downloading logs after completion:
gh run view <run-id> --log-failed > $LOG_DIR/ci-run-<run-id>.log 2>&1; echo "EXIT:$?"
4 changes: 2 additions & 2 deletions .claude/skills/ci:status/SKILL.md
@@ -13,8 +13,8 @@ Check the current CI status for a PR and create task items for any failures.
`gh run view --log-failed` and artifact downloads MUST redirect:

```bash
-export LOG_DIR=/tmp/kagenti/ci/$(basename $(git rev-parse --show-toplevel))
-mkdir -p $LOG_DIR
+export LOG_DIR="${LOG_DIR:-${WORKSPACE_DIR:-/tmp}/kagenti-ci}"
+mkdir -p "$LOG_DIR"

# Small output OK inline:
gh pr checks <PR-number>
4 changes: 2 additions & 2 deletions .claude/skills/github:pr-review/SKILL.md
@@ -47,8 +47,8 @@ comments, and posts a GitHub review after user approval.
PR diffs can be very large. **Always redirect diff output to files and analyze with subagents.**

```bash
-export LOG_DIR=/tmp/kagenti/review/$(basename $(git rev-parse --show-toplevel))
-mkdir -p $LOG_DIR
+export LOG_DIR="${LOG_DIR:-${WORKSPACE_DIR:-/tmp}/kagenti-review}"
+mkdir -p "$LOG_DIR"
```

Small output OK inline: `gh pr checks`, `gh pr view --json` (metadata only).
4 changes: 2 additions & 2 deletions .claude/skills/helm:debug/SKILL.md
@@ -10,8 +10,8 @@ description: Debug Helm chart issues - template rendering, value overrides, hook
**Helm template output can be hundreds of lines.** Always redirect to files:

```bash
-export LOG_DIR=/tmp/kagenti/helm/${WORKTREE:-$(basename $(git rev-parse --show-toplevel))}
-mkdir -p $LOG_DIR
+export LOG_DIR="${LOG_DIR:-${WORKSPACE_DIR:-/tmp}/kagenti-helm}"
+mkdir -p "$LOG_DIR"

# Redirect helm template output
helm template kagenti charts/kagenti -n kagenti-system > $LOG_DIR/rendered.yaml 2>&1 && echo "OK" || echo "FAIL"
4 changes: 2 additions & 2 deletions .claude/skills/kagenti:deploy/SKILL.md
@@ -12,8 +12,8 @@ This skill guides you through deploying or redeploying the Kagenti Kind cluster
**Deploy scripts produce hundreds of lines.** Always redirect to files:

```bash
-export LOG_DIR=/tmp/kagenti/deploy/$(basename $(git rev-parse --show-toplevel))
-mkdir -p $LOG_DIR
+export LOG_DIR="${LOG_DIR:-${WORKSPACE_DIR:-/tmp}/kagenti-deploy}"
+mkdir -p "$LOG_DIR"

# Pattern: redirect deploy output
./.github/scripts/local-setup/kind-full-test.sh ... > $LOG_DIR/deploy.log 2>&1; echo "EXIT:$?"
11 changes: 6 additions & 5 deletions .claude/skills/kagenti:operator/SKILL.md
@@ -12,8 +12,8 @@ Deploy and manage Kagenti operator, agents, and tools on Kubernetes clusters.
**Deploy/build commands produce large output.** Always redirect to files:

```bash
-export LOG_DIR=/tmp/kagenti/deploy/$(basename $(git rev-parse --show-toplevel))
-mkdir -p $LOG_DIR
+export LOG_DIR="${LOG_DIR:-${WORKSPACE_DIR:-/tmp}/kagenti-deploy}"
+mkdir -p "$LOG_DIR"

# Pattern: redirect build/deploy output
command > $LOG_DIR/<name>.log 2>&1; echo "EXIT:$?"
@@ -173,14 +173,15 @@ kubectl get crd | grep kagenti
# All components
kubectl get components -A

-# Agent builds
-kubectl get agentbuilds -A
+# Shipwright builds
+kubectl get builds -A
+kubectl get buildruns -A

# Deployments
kubectl get deployments -n team1
```

-### Check Tekton Pipelines
+### Check Shipwright/Tekton Pipelines

```bash
# Pipeline runs
15 changes: 13 additions & 2 deletions .claude/skills/rca/SKILL.md
@@ -37,8 +37,8 @@ the main conversation context.

```bash
# Session-scoped log directory
-export LOG_DIR=/tmp/kagenti/rca/$(basename $(git rev-parse --show-toplevel))
-mkdir -p $LOG_DIR
+export LOG_DIR="${LOG_DIR:-${WORKSPACE_DIR:-/tmp}/kagenti-rca}"
+mkdir -p "$LOG_DIR"
```

**Rules:**
@@ -109,10 +109,21 @@ After RCA is complete, switch to TDD for fix iteration: ◄──┘┘ │
> Before routing to `rca:kind`, run `kind get clusters` — if a cluster exists from another session,
> route to `rca:ci` instead or ask the user.

## CVE Awareness

All RCA variants include a CVE check before publishing findings. If the root
cause involves a dependency issue, `cve:scan` runs automatically to check for
known CVEs. If found, `cve:brainstorm` blocks public disclosure until the CVE
is properly reported through the project's security channels.

See `cve:scan` and `cve:brainstorm` for details.

## Related Skills

- `tdd:ci` - Fix iteration after RCA (CI-driven)
- `tdd:hypershift` - Fix iteration with live cluster
- `tdd:kind` - Fix iteration on Kind
- `k8s:logs` - Query and analyze component logs
- `k8s:pods` - Debug pod issues
- `cve:scan` - CVE scanning gate
- `cve:brainstorm` - CVE disclosure planning
52 changes: 49 additions & 3 deletions .claude/skills/rca:ci/SKILL.md
@@ -14,8 +14,9 @@ can dump thousands of lines into context. ALL CI log analysis MUST happen in sub

```bash
# Session-scoped log directory
-export LOG_DIR=/tmp/kagenti/rca/$(basename $(git rev-parse --show-toplevel))
-mkdir -p $LOG_DIR
+# Works in both Claude Code (local) and sandbox agent (container)
+export LOG_DIR="${LOG_DIR:-${WORKSPACE_DIR:-/tmp}/kagenti-rca}"
+mkdir -p "$LOG_DIR"
```
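
The nested default in the new `LOG_DIR` line can be checked in isolation. The following is a minimal sketch, assuming a POSIX-style shell such as bash; `resolve_log_dir` is a hypothetical helper used only for illustration, and the `/workspace` and `/var/log/custom` paths are made-up examples rather than values taken from the skill.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): prints what the skill's
# expansion resolves to under the current environment.
resolve_log_dir() {
  # Keep LOG_DIR if it is set and non-empty; otherwise fall back to
  # $WORKSPACE_DIR/kagenti-rca, and to /tmp/kagenti-rca if that is unset too.
  echo "${LOG_DIR:-${WORKSPACE_DIR:-/tmp}/kagenti-rca}"
}

# Case 1: neither variable is set -> /tmp/kagenti-rca
( unset LOG_DIR WORKSPACE_DIR; resolve_log_dir )

# Case 2: only WORKSPACE_DIR is set (example path) -> /workspace/kagenti-rca
( unset LOG_DIR; WORKSPACE_DIR=/workspace; resolve_log_dir )

# Case 3: a preset LOG_DIR wins and is used verbatim -> /var/log/custom
( LOG_DIR=/var/log/custom; WORKSPACE_DIR=/workspace; resolve_log_dir )
```

Note that when `LOG_DIR` is already set, the `kagenti-rca` suffix is not appended, because the suffix is part of the default word only.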

**Rules:**
@@ -175,6 +176,28 @@ grep -i "oom\|memory\|evict\|limit" logs/*.txt
[How to prevent recurrence]
```

### CVE Check Before Publishing Findings

**Before posting RCA findings to any public destination** (issue comment, PR comment, etc.):

If the root cause involves a dependency bug, unexpected behavior, or version issue:

1. Invoke `cve:scan` to check if this is a known CVE
2. If a CVE is found → invoke `cve:brainstorm` **BEFORE** documenting findings publicly
3. Rewrite RCA documentation to use neutral language (no CVE IDs, no vulnerability descriptions)
4. Report the CVE through proper channels (see `cve:brainstorm`)

Example neutral RCA wording:
```
Root Cause: Incompatibility with <package> <version>.
Fix: Bump to <version> which resolves the behavior.
```

NOT:
```
Root Cause: CVE-2026-XXXXX in <package> allows remote code execution.
```
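
As a last-line safeguard for the neutral-wording rule, a draft can be checked for CVE-style identifiers before it is posted. This is only a sketch under stated assumptions: `contains_cve_id` is a hypothetical helper, it matches the usual `CVE-<year>-<digits>` shape, and it does not replace `cve:scan` or `cve:brainstorm`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the file mentions a CVE-style identifier
# ("CVE-", a 4-digit year, "-", then four or more digits).
contains_cve_id() {
  grep -Eq 'CVE-[0-9]{4}-[0-9]{4,}' "$1"
}

# Example draft using the neutral wording shown above.
draft="$(mktemp)"
printf 'Root Cause: Incompatibility with <package> <version>.\n' > "$draft"

if contains_cve_id "$draft"; then
  echo "BLOCK: draft mentions a CVE ID; rewrite in neutral language"
else
  echo "OK: no CVE ID found in draft"
fi
rm -f "$draft"
```

A pattern check like this only catches explicit IDs; vulnerability descriptions written in prose still need a human read-through.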

## Escalation to rca:hypershift

Escalate when:
@@ -187,11 +210,32 @@ Escalate when:
rca:ci inconclusive? → Create cluster → rca:hypershift
```

## gh CLI Flag Reference (use ONLY these — do NOT invent flags)

### `gh run list`
Valid: `--branch <name>`, `--status <state>`, `--event <type>`, `--limit <n>`,
`--workflow <name>`, `--json <fields>`, `--commit <sha>`
INVALID (do NOT use): `--head`, `--head-ref`, `--pr`, `--pull-request`
To filter by PR: use `gh pr checks <pr-number>` or `--branch <pr-branch-name>`

### `gh run view <run_id>`
Valid: `--log`, `--log-failed`, `--job <id>`, `--web`
Always redirect large output: `gh run view <id> --log-failed > $LOG_DIR/ci.log`

### `gh pr`
- `gh pr checks <number>` — CI status for a specific PR
- `gh pr view <number> --json statusCheckRollup` — JSON CI check data
- `gh pr list --state open|closed|merged`

### If a flag fails
Run `gh <command> --help` to see valid flags. Do NOT guess.
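
The allowlist for `gh run list` above can also be enforced mechanically before a command is run. The sketch below is an assumption-laden illustration: `check_gh_run_list_flags` and `GH_RUN_LIST_FLAGS` are hypothetical names, and the list is copied from this reference, so it is narrower than what `gh run list --help` may report.

```bash
#!/usr/bin/env bash
# Flags copied from the `gh run list` section of this reference.
GH_RUN_LIST_FLAGS="--branch --status --event --limit --workflow --json --commit"

# Hypothetical helper: returns non-zero if any --flag is not in the list.
check_gh_run_list_flags() {
  local arg flag
  for arg in "$@"; do
    case "$arg" in
      --*)
        flag="${arg%%=*}"              # accept both --limit 5 and --limit=5
        case " $GH_RUN_LIST_FLAGS " in
          *" $flag "*) ;;              # listed: keep going
          *) echo "unlisted flag: $flag" >&2; return 1 ;;
        esac
        ;;
    esac
  done
}

check_gh_run_list_flags --branch my-branch --status failure --limit 5 && echo "flags OK"
check_gh_run_list_flags --pr 123 || echo "rejected"
```

The helper validates flag names only; it does not check values, and it never invokes `gh` itself.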

## Quick Reference

| Task | Command |
|------|---------|
-| List failed runs | `gh run list --status failure` |
+| List failed runs | `gh run list --status failure --limit 5` |
+| CI for specific PR | `gh pr checks <pr-number>` |
| View failed logs | `gh run view <id> --log-failed` |
| Download artifacts | `gh run download <id>` |
| Open in browser | `gh run view <id> --web` |
@@ -201,3 +245,5 @@ rca:ci inconclusive? → Create cluster → rca:hypershift
- `rca:hypershift` - RCA with live cluster access
- `tdd:ci` - Fix iteration after RCA
- `superpowers:systematic-debugging` - General debugging approach
- `cve:scan` - CVE scanning (check if root cause is a known CVE)
- `cve:brainstorm` - Disclosure planning (if CVE found during RCA)
16 changes: 14 additions & 2 deletions .claude/skills/rca:kind/SKILL.md
@@ -12,8 +12,8 @@ Root cause analysis workflow for failures on local Kind clusters.
**All diagnostic commands MUST redirect output to files.**

```bash
-export LOG_DIR=/tmp/kagenti/rca/$(basename $(git rev-parse --show-toplevel))
-mkdir -p $LOG_DIR
+export LOG_DIR="${LOG_DIR:-${WORKSPACE_DIR:-/tmp}/kagenti-rca}"
+mkdir -p "$LOG_DIR"
```

**Rules:**
@@ -112,6 +112,16 @@ After fixing, re-run the specific failing test:
uv run pytest kagenti/tests/e2e/ -v -k "test_name" > $LOG_DIR/retest.log 2>&1; echo "EXIT:$?"
```
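
The redirect-then-report pattern used above (`> $LOG_DIR/<name>.log 2>&1; echo "EXIT:$?"`) can be wrapped in a small function so the quoting and exit code handling stay consistent. This is a sketch only: `run_logged` is a hypothetical helper that is not defined by any of these skills, and the demo command is a stand-in for a real test run.

```bash
#!/usr/bin/env bash
export LOG_DIR="${LOG_DIR:-${WORKSPACE_DIR:-/tmp}/kagenti-rca}"
mkdir -p "$LOG_DIR"

# Hypothetical helper: run_logged NAME CMD [ARGS...]
# Sends stdout and stderr to $LOG_DIR/NAME.log and prints only the exit code.
run_logged() {
  local name="$1"; shift
  "$@" > "$LOG_DIR/$name.log" 2>&1
  local rc=$?
  echo "EXIT:$rc"
  return "$rc"
}

# Stand-in command that writes to both streams and fails with status 3.
run_logged demo sh -c 'echo out; echo err >&2; exit 3' || true   # prints EXIT:3
```

Only the one-line `EXIT:` marker reaches the conversation; the full output stays in `$LOG_DIR/demo.log` for later inspection.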

### CVE Check Before Publishing Findings

**Before posting RCA findings to any public destination:**

If the root cause involves a dependency bug or version issue:

1. Invoke `cve:scan` to check if this is a known CVE
2. If a CVE is found → invoke `cve:brainstorm` BEFORE documenting publicly
3. Use neutral language in all public documentation

## Kind-Specific Issues

| Issue | Cause | Fix |
@@ -135,3 +145,5 @@ If the issue can't be reproduced locally, escalate:
- `kind:cluster` - Create/destroy Kind clusters
- `k8s:pods` - Debug pod issues
- `kagenti:ui-debug` - Debug UI issues (502, API, proxy)
- `cve:scan` - CVE scanning (check if root cause is a known CVE)
- `cve:brainstorm` - Disclosure planning (if CVE found during RCA)
16 changes: 9 additions & 7 deletions .claude/skills/tdd/SKILL.md
@@ -320,8 +320,8 @@ and being re-read on every subsequent turn.

```bash
# Session-scoped log directory — ALWAYS set before running commands
export LOG_DIR=/tmp/kagenti/tdd/$WORKTREE # or $(basename $(git rev-parse --show-toplevel))
mkdir -p $LOG_DIR
export LOG_DIR="${LOG_DIR:-${WORKSPACE_DIR:-/tmp}/kagenti-tdd}"
mkdir -p "$LOG_DIR"
```

**Rules:**
@@ -342,10 +342,11 @@ All three flows eventually enter this loop:
3. test:review — verify test quality (no silent skips, assertive)
4. test:run-kind or test:run-hypershift — execute tests (output to $LOG_DIR)
5. Track progress — compare test results with previous run
6. git:commit — commit with proper format
7. git:rebase — rebase onto upstream/main
8. Push → ci:monitoring — wait for CI results
9. CI passes? → Handle reviews (Flow 2 Step 4). CI fails? → Back to step 1.
6. cve:scan — scan for CVEs before pushing (BLOCKS if found)
7. git:commit — commit with proper format
8. git:rebase — rebase onto upstream/main
9. Push → ci:monitoring — wait for CI results
10. CI passes? → Handle reviews (Flow 2 Step 4). CI fails? → Back to step 1.
```
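
Step 5 (track progress) can be approximated by comparing the pass counts of two saved logs. This is a rough sketch under stated assumptions: the logs come from pytest and contain its usual summary line with `N passed`, and the helper names `passed_count` and `compare_runs` are hypothetical.

```bash
# Hypothetical helpers: extract the last "N passed" figure from a pytest log
# and report the change between a previous and a current run.
passed_count() {
  n=$(grep -Eo '[0-9]+ passed' "$1" | tail -n 1 | cut -d' ' -f1)
  echo "${n:-0}"   # 0 when the log has no "N passed" summary
}

compare_runs() {
  prev=$(passed_count "$1")
  curr=$(passed_count "$2")
  echo "passed: $prev -> $curr (delta $((curr - prev)))"
}
```

A negative delta would indicate a regression relative to the previous run; the log file names passed to `compare_runs` are whatever the session saved under `$LOG_DIR`.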

## Commit Policy
@@ -394,5 +395,6 @@ Commit 3: 11 pass, 2 fail ← good, +1 passing
- `git:commit` - Commit with proper format
- `git:rebase` - Rebase before pushing
- `git:worktree` - Create isolated worktrees
- `git:commit` - Commit format and conventions
- `repo:pr` - PR creation conventions
- `cve:scan` - CVE scanning gate
- `cve:brainstorm` - CVE disclosure planning
28 changes: 24 additions & 4 deletions .claude/skills/tdd:ci/SKILL.md
@@ -15,6 +15,7 @@ description: CI-driven TDD workflow - commit, local checks, push, wait for CI, i
- [Phase 1: Brainstorm](#phase-1-brainstorm-new-features)
- [Phase 2: Commit](#phase-2-commit)
- [Phase 3: Local Checks](#phase-3-local-checks)
- [Phase 3.5: CVE Gate](#phase-35-cve-gate)
- [Phase 4: Push to PR](#phase-4-push-to-pr)
- [Phase 5: Wait for CI](#phase-5-wait-for-ci)
- [Phase 6: Analyze Failures](#phase-6-analyze-failures)
@@ -33,8 +34,8 @@ Iterative development workflow using CI as the test environment. Commit changes,

```bash
# Session-scoped log directory — use worktree name to avoid collisions
export LOG_DIR=/tmp/kagenti/tdd/$(basename $(git rev-parse --show-toplevel))
mkdir -p $LOG_DIR
export LOG_DIR="${LOG_DIR:-${WORKSPACE_DIR:-/tmp}/kagenti-tdd}"
mkdir -p "$LOG_DIR"
```

### Key Patterns
@@ -94,7 +95,10 @@ flowchart TD
P1 --> P2["Phase 2: Commit"]:::git
P2 --> P3["Phase 3: Local Checks"]:::test
P3 -->|Checks fail| P2
P3 -->|Checks pass| P4["Phase 4: Push to PR"]:::git
P3 -->|Checks pass| P3B["Phase 3.5: CVE Gate"]:::cve
P3B -->|Clean| P4["Phase 4: Push to PR"]:::git
P3B -->|CVE found| CVE_HOLD["cve:brainstorm (BLOCKS push)"]:::cve
CVE_HOLD -->|Resolved| P4
P4 --> P5["Phase 5: Wait for CI"]:::ci
P5 --> RESULT{"CI Result?"}

@@ -119,6 +123,7 @@ flowchart TD
classDef hypershift fill:#3F51B5,stroke:#333,color:white
classDef ci fill:#2196F3,stroke:#333,color:white
classDef test fill:#9C27B0,stroke:#333,color:white
classDef cve fill:#D32F2F,stroke:#333,color:white
```

> Follow this diagram as the workflow.
@@ -328,6 +333,19 @@ uv run pytest kagenti/tests/ -v --ignore=kagenti/tests/e2e > $LOG_DIR/unit-tests

**Fix any failures before pushing.** On failure, use `Task(subagent_type='Explore')` to read the log file.

## Phase 3.5: CVE Gate

**MANDATORY before pushing.** Scan dependencies for known CVEs before any public output.

Invoke `cve:scan` on the working tree:

1. If `cve:scan` returns clean → proceed to Phase 4
2. If `cve:scan` finds HIGH/CRITICAL CVEs → `cve:brainstorm` activates a CVE hold
   - **Do NOT proceed to Phase 4** until the hold is resolved
   - The hold blocks `gh pr create`, `gh pr comment`, `gh issue create/comment`
   - Silent fixes (dependency bumps with neutral commit messages) are allowed
   - See `cve:brainstorm` for resolution steps
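
The gate logic can be pictured as a wrapper that refuses to continue when a scan reports findings. This is only a sketch: `cve:scan` is a skill rather than a CLI, so `cve_gate` is a hypothetical helper that accepts any scanner command as its arguments, and it assumes that command exits non-zero when HIGH/CRITICAL CVEs are found.

```bash
# Hypothetical gate: run the scanner command passed as arguments, keep its
# output in a log file, and block (return 1) on a non-zero exit.
# Assumption: the scanner exits non-zero when HIGH/CRITICAL CVEs are found.
cve_gate() {
  if "$@" > "${LOG_DIR:-/tmp}/cve-scan.log" 2>&1; then
    echo "CVE gate: clean, proceed to Phase 4"
    return 0
  fi
  echo "CVE gate: HOLD, do not push or post publicly (see cve:brainstorm)"
  return 1
}
```

Chaining the push behind the gate, as in `cve_gate <scanner command> && git push`, would keep Phase 4 from running while a hold is active.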

## Phase 4: Push to PR

```bash
@@ -472,7 +490,7 @@ After **3+ failed CI iterations**, consider switching to `tdd:hypershift` for re

```bash
# Check if cluster exists for current worktree
WORKTREE=$(basename $(git rev-parse --show-toplevel))
WORKTREE=$(basename "${WORKSPACE_DIR:-$(pwd)}")
ls ~/clusters/hcp/kagenti-hypershift-custom-*/auth/kubeconfig 2>/dev/null
```

@@ -598,3 +616,5 @@ This is optional but recommended for tracking development effort.
- `git:commit` - Commit format and conventions
- `git:rebase` - Rebase onto upstream main
- `session:post` - Post session analytics to PR
- `cve:scan` - CVE scanning gate (Phase 3.5)
- `cve:brainstorm` - CVE disclosure planning (if CVEs found)
27 changes: 23 additions & 4 deletions .claude/skills/tdd:kind/SKILL.md
@@ -40,8 +40,11 @@ flowchart TD
GUARD -->|Cluster exists, mine| REUSE["Reuse existing cluster"]:::k8s
GUARD -->|Cluster exists, not mine| STOP([Stop - another session owns it])

CREATE --> ITER
REUSE --> ITER
CREATE --> CVEGATE["CVE Gate: cve:scan"]:::cve
REUSE --> CVEGATE
CVEGATE -->|Clean| ITER
CVEGATE -->|CVE found| CVE_HOLD["cve:brainstorm"]:::cve
CVE_HOLD -->|Resolved| ITER

ITER{"Iteration level?"}
ITER -->|Level 1| L1["Test only (fastest)"]:::test
@@ -70,10 +73,24 @@ flowchart TD
classDef hypershift fill:#3F51B5,stroke:#333,color:white
classDef ci fill:#2196F3,stroke:#333,color:white
classDef test fill:#9C27B0,stroke:#333,color:white
classDef cve fill:#D32F2F,stroke:#333,color:white
```

> Follow this diagram as the workflow.

## CVE Gate (Pre-Deploy)

**MANDATORY before deploying to Kind cluster.**

Invoke `cve:scan` on the working tree before the first deployment:

1. If `cve:scan` returns clean → proceed to iteration selection
2. If `cve:scan` finds HIGH/CRITICAL CVEs → `cve:brainstorm` activates a CVE hold
   - Silent fixes (dependency bumps) are allowed
   - Deployment proceeds only after hold is resolved

This gate runs once per session, not on every iteration.
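
A once-per-session check can be tracked with a marker file. The sketch below is one possible approach, not something the skill prescribes: `run_once` and the marker path are hypothetical, and it assumes `LOG_DIR` is session-scoped so the marker disappears with the session.

```bash
# Hypothetical helper: run a command only if the marker file is absent, and
# create the marker only when the command succeeds.
# Usage: run_once MARKER_FILE COMMAND [ARGS...]
run_once() {
  stamp=$1; shift
  if [ -e "$stamp" ]; then
    return 0   # already done in this session
  fi
  "$@" && : > "$stamp"
}
```

With this, `run_once "$LOG_DIR/.cve-gate-done" <scanner command>` would scan on the first deployment and be a no-op on later iterations, while a failed scan leaves no marker and is retried.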

## Key Principle

**Match CI exactly**: Kind tests must use the same packages as CI to avoid version mismatches. CI uses `pip install` (which gets the latest versions), whereas local runs use `uv` (locked versions). Always verify that package versions match.
@@ -84,8 +101,8 @@ flowchart TD

```bash
# Session-scoped log directory — use worktree name to avoid collisions
export LOG_DIR=/tmp/kagenti/tdd/$(basename $(git rev-parse --show-toplevel))
mkdir -p $LOG_DIR
export LOG_DIR="${LOG_DIR:-${WORKSPACE_DIR:-/tmp}/kagenti-tdd}"
mkdir -p "$LOG_DIR"
```

### Log Analysis Rule
@@ -255,3 +272,5 @@ This is optional but recommended for tracking development effort.
- `test:review` - Review test quality
- `git:commit` - Commit format
- `session:post` - Post session analytics to PR
- `cve:scan` - CVE scanning gate (pre-deploy)
- `cve:brainstorm` - CVE disclosure planning (if CVEs found)