
feat(kilo-chat): rip out stream chat#2907

Open
iscekic wants to merge 279 commits into main from feat/kilo-chat-migration-pr1


@iscekic iscekic commented Apr 29, 2026

Summary

This PR completes the Stream Chat cutover to first-party Kilo Chat across web, mobile, shared packages, and the Worker services that back KiloClaw chat. It introduces the Kilo Chat service as the canonical chat API, Event Service as the realtime/presence transport, Notifications as the badge/push owner, and KiloClaw runtime/plugin changes needed to run the new chat path.

High-level backwards-incompatible changes

  • Stream Chat support is removed from web, mobile, KiloClaw, and notifications. Legacy Stream Chat credential/send routes, web tRPC methods, webhooks, package dependencies, patches, and runtime config must be migrated to Kilo Chat.
  • Event Service WebSocket auth changes from JWT-in-subprotocol to POST /connect-ticket followed by /connect?ticket=... with the kilo.events.v1 subprotocol.
  • Kilo Chat auth now requires environment-bound Kilo JWTs that match the current user pepper.
  • Kilo Chat request/response/event contracts are richer and stricter: mark-read requires lastSeenMessageId, delete-style mutations return JSON instead of 204, and strict clients must handle canonical message, conversation, cursor, reply, action, and reaction-operation fields.
  • Notification routing moves from instance/channel Stream Chat payloads to sandboxId/conversationId Kilo Chat payloads. The old SQL channel_badge_counts table is dropped and unread badge state moves to per-user Notification DO buckets without a backfill.
  • Services should be deployed together because new env vars, DO bindings, RPC bindings, and shared package contracts are coupled across Kilo Chat, Event Service, Notifications, KiloClaw, web, and mobile.

Web app

  • Replaces the Stream Chat Claw chat page with the Kilo Chat conversation UI for personal and organization chat.
  • Adds /claw/chat/[conversationId] and /organizations/[id]/claw/chat/[conversationId] routes, with guards for missing instances and mismatched sandbox conversations.
  • Moves chat/event wiring into a shared authenticated EventServiceProvider, reusing one Event Service and Kilo Chat client across the app.
  • Adds platform, instance, and conversation presence subscriptions gated by page visibility where appropriate.
  • Improves chat UX with conversation creation loading/errors, status retry UI, safer message action availability, edit/send length handling, read-marker retries, and scroll anchoring.
  • Removes Stream Chat dependencies, Stream Chat credential routes, and legacy KiloClaw chat send/credential tRPC methods in favor of Kilo Chat token issuance.

Compatibility notes:

  • /claw/chat and /organizations/[id]/claw/chat now render Kilo Chat instead of the old Stream Chat channel UI.
  • Legacy Kilo Chat routes redirect from /claw/kilo-chat and /claw/kilo-chat/:conversationId to /claw/chat and /claw/chat/:conversationId, with equivalent organization redirects.
  • Removed web contracts: GET /api/kiloclaw/chat-credentials, kiloclaw.getStreamChatCredentials, kiloclaw.sendChatMessage, organizations.kiloclaw.getStreamChatCredentials, and organizations.kiloclaw.sendChatMessage.
  • /api/kilo-chat/token and kiloChat.getToken now return { token, expiresAt, userId }.

Mobile app

  • Replaces the Stream Chat mobile chat UI with the Kilo Chat client, hooks, and event-service provider stack.
  • Adds sandbox-scoped chat navigation with a conversation list and per-conversation detail screens.
  • Adds send, reply, edit, delete, reactions, action approval buttons, typing indicators, pending/delivery-failure states, and mark-read handling.
  • Wires realtime app, instance, and conversation presence plus event subscriptions for bot status, conversation lists, and message cache updates.
  • Moves unread counts to the notifications badge-bucket model, including per-instance Home badges, badge hydration, and foreground invalidation.
  • Updates notification handling to deep-link chat message pushes to the exact conversation and suppress foreground alerts only for the active conversation.

Compatibility notes:

  • Chat routing changes from /(app)/chat/[instance-id] to /(app)/chat/[sandbox-id] and /(app)/chat/[sandbox-id]/[conversation-id]. Existing one-segment chat links now land on the conversation list.
  • Push payloads change from legacy instanceId-based chat/lifecycle shapes to shared notifications payloads: chat messages require type: "chat.message", sandboxId, conversationId, and messageId; lifecycle notifications use sandboxId.
  • Mobile config now requires KILO_CHAT_URL, EVENT_SERVICE_URL, and NOTIFICATIONS_URL.
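
A minimal sketch of how a client might validate the new chat push payload and derive the two-segment chat route from it. Only the field names, the "chat.message" type value, and the route shape come from the notes above; the function and interface names are hypothetical, and the real app may build its links differently.

```typescript
// Hypothetical names; only the payload fields and route segments are from the PR notes.
interface ChatMessagePush {
  type: "chat.message";
  sandboxId: string;
  conversationId: string;
  messageId: string;
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

// Type guard: accepts only payloads carrying every required chat.message field.
function isChatMessagePush(data: unknown): data is ChatMessagePush {
  if (typeof data !== "object" || data === null) return false;
  const record = data as Record<string, unknown>;
  return (
    record["type"] === "chat.message" &&
    isNonEmptyString(record["sandboxId"]) &&
    isNonEmptyString(record["conversationId"]) &&
    isNonEmptyString(record["messageId"])
  );
}

// Returns the per-conversation path, or null for anything that is not a valid chat push
// (for example a legacy instanceId-based payload).
function chatDeepLink(data: unknown): string | null {
  if (!isChatMessagePush(data)) return null;
  const sandbox = encodeURIComponent(data.sandboxId);
  const conversation = encodeURIComponent(data.conversationId);
  return `/(app)/chat/${sandbox}/${conversation}`;
}
```

Returning null for legacy payloads lets the caller fall back to a safe default screen instead of navigating to a malformed route.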

Shared packages and data model

  • Adds @kilocode/kilo-chat-hooks, a shared React Query hook layer for conversations, messages, bot status, optimistic mutations, event-driven cache updates, mark-read retry state, and pending action state.
  • Expands @kilocode/kilo-chat schemas/types/client contracts for reply snapshots, action resolution state, mark-read results, reaction removal results, execute-action results, canonical create responses, and paginated message pages.
  • Tightens Kilo Chat event payload validation with ULIDs/non-negative timestamps and richer payloads, including replyTo, reaction operationId, and full conversation.created data.
  • Updates @kilocode/event-service for connection tickets, ref-counted context subscriptions, KiloClaw event/presence context builders, and retry/unauthorized recovery.
  • Adds @kilocode/notifications with shared badge bucket helpers, push notification data schemas, and notification RPC schemas.
  • Adds verifyKiloBearerAgainstCurrentPepper in @kilocode/worker-utils/kilo-token-auth to validate token environment and current user pepper.
  • Removes channel_badge_counts and generated Drizzle metadata, plus remaining Stream Chat package/patch lockfile wiring.
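
The ref-counted context subscriptions mentioned above can be illustrated roughly as follows. This is a sketch under assumptions, not the @kilocode/event-service implementation: the class name, callback names, and context strings are hypothetical. The idea is that only the first acquire sends a subscribe and only the last release sends an unsubscribe.

```typescript
// Hypothetical sketch of ref-counted subscriptions; not the real package API.
class RefCountedContexts {
  private counts = new Map<string, number>();
  private sendSubscribe: (context: string) => void;
  private sendUnsubscribe: (context: string) => void;

  constructor(
    sendSubscribe: (context: string) => void,
    sendUnsubscribe: (context: string) => void,
  ) {
    this.sendSubscribe = sendSubscribe;
    this.sendUnsubscribe = sendUnsubscribe;
  }

  // Registers interest in a context and returns an idempotent release function.
  acquire(context: string): () => void {
    const next = (this.counts.get(context) ?? 0) + 1;
    this.counts.set(context, next);
    if (next === 1) this.sendSubscribe(context);

    let released = false;
    return () => {
      if (released) return; // releasing twice must not double-decrement
      released = true;
      const remaining = (this.counts.get(context) ?? 1) - 1;
      if (remaining <= 0) {
        this.counts.delete(context);
        this.sendUnsubscribe(context);
      } else {
        this.counts.set(context, remaining);
      }
    };
  }
}
```

With this shape, two components that watch the same conversation share one wire subscription, and unmounting either one leaves the other's events flowing.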

Compatibility notes:

  • channel_badge_counts and ChannelBadgeCount are removed; unread/badge state moves to badge bucket contracts such as kiloclaw:<sandboxId> and kiloclaw:<sandboxId>:<conversationId>.
  • markConversationRead(conversationId) now requires { lastSeenMessageId } and returns { ok, applied, lastReadAt, badgeClear }; removeReaction returns { removed, id }; executeAction returns updated message content/resolution.
  • Create-message responses include message, create-conversation responses include conversation, message events include replyTo, conversation-created events include conversation, and reaction events include operationId.
  • Write schemas distinguish input action blocks from stored action blocks; create/edit payloads must not send resolved action state. Bot message list responses now include hasMore and nextCursor.
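
For illustration, the two badge bucket key shapes named above could be built and parsed like this. The helper names are hypothetical (the real helpers live in @kilocode/notifications and may differ), and the parser assumes that neither ID contains a colon.

```typescript
// Hypothetical helpers; only the key formats come from the compatibility notes.
function instanceBadgeBucket(sandboxId: string): string {
  return `kiloclaw:${sandboxId}`;
}

function conversationBadgeBucket(sandboxId: string, conversationId: string): string {
  return `kiloclaw:${sandboxId}:${conversationId}`;
}

interface BadgeBucketRef {
  sandboxId: string;
  conversationId?: string;
}

// Parses either key shape; returns null for anything else.
// Assumption: sandbox and conversation IDs never contain ":".
function parseBadgeBucket(key: string): BadgeBucketRef | null {
  const [prefix, sandboxId, conversationId, ...rest] = key.split(":");
  if (prefix !== "kiloclaw" || !sandboxId || rest.length > 0) return null;
  if (conversationId === undefined) return { sandboxId };
  if (conversationId.length === 0) return null;
  return { sandboxId, conversationId };
}
```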

Kilo Chat service

  • Tightens human HTTP auth to environment-bound JWTs and validates the token pepper against the current user record.
  • Enriches create/list responses with full conversation/message objects, message pagination cursors, and reply snapshots.
  • Reworks read state around explicit mark-read calls, monotonic lastReadAt, and notification badge clearing.
  • Adds push notification fanout for non-sender human members through the NOTIFICATIONS service binding.
  • Expands realtime events with typed Event Service payloads, reaction operation IDs, targeted conversation.read, and replyTo data on message.created.
  • Serializes bot webhook delivery through ConversationDO, skips textless message webhooks, and reports/reverts failed message/action delivery.
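
The monotonic lastReadAt rule can be sketched as a pure function. This assumes the service has already resolved lastSeenMessageId to a message timestamp; the function and type names are hypothetical, and only the applied and lastReadAt result fields come from the PR description.

```typescript
// Hypothetical sketch: read markers only ever move forward.
interface ReadState {
  lastReadAt: number | null;
}

interface MarkReadOutcome {
  applied: boolean;
  lastReadAt: number | null;
}

function applyMarkRead(state: ReadState, seenMessageTimestamp: number): MarkReadOutcome {
  // A stale or duplicate mark-read must not move the marker backwards.
  if (state.lastReadAt !== null && seenMessageTimestamp <= state.lastReadAt) {
    return { applied: false, lastReadAt: state.lastReadAt };
  }
  return { applied: true, lastReadAt: seenMessageTimestamp };
}
```

Reporting applied: false rather than failing lets a retrying client treat an out-of-order mark-read as a harmless no-op.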

Backwards compatibility:

  • Kilo JWTs must match WORKER_ENV and carry the current user pepper; they do not need a kilo-chat-specific token source.
  • POST /v1/conversations/:id/mark-read now requires { lastSeenMessageId } and returns { ok, applied, lastReadAt, badgeClear }; it can return 503 if badge clearing fails.
  • Leave, message delete, and reaction delete now return 200 JSON instead of 204; reaction delete returns { removed, id }.
  • Bot-status dedupe responses no longer include the previous dedupe: "fresh" field.
  • Create conversation/message, list messages, execute-action, and reaction events add fields; strict response/event schema clients need updates.
  • Inactive members can no longer keep editing/deleting/reacting/typing, and the last human leaving also deactivates bot membership.
  • Recipients are no longer marked read just because WebSocket delivery succeeded, human sends no longer auto-emit typing.stop, and conversation.read is targeted to the relevant user.
  • Deployment now requires WORKER_ENV, NOTIFICATIONS, and working Hyperdrive access for auth, ownership, and sandbox labels.
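
As a rough illustration of the 204-to-200 change, a client that must tolerate both old and new deployments during rollout might parse a reaction-delete response like this. The function name is hypothetical, and the field types (boolean removed, string id) are assumptions; the PR only names the fields.

```typescript
// Hypothetical client-side handling; field types are assumed, not confirmed by the PR.
interface RemoveReactionResult {
  removed: boolean;
  id: string;
}

function parseRemoveReactionResponse(status: number, bodyText: string): RemoveReactionResult | null {
  if (status === 204) return null; // legacy behaviour: success with no body
  if (status !== 200) throw new Error(`Unexpected status ${status}`);

  const parsed: unknown = JSON.parse(bodyText);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object body");
  }
  const record = parsed as Record<string, unknown>;
  const removed = record["removed"];
  const id = record["id"];
  if (typeof removed !== "boolean" || typeof id !== "string") {
    throw new Error("Response is missing removed/id");
  }
  return { removed, id };
}
```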

Event service

  • Replaces JWT-in-WebSocket-subprotocol auth with a two-step ticket flow: POST /connect-ticket accepts a bearer Kilo JWT, then GET /connect?ticket=... consumes the ticket for the WebSocket upgrade.
  • Adds ConnectionTicketDO with 30-second TTL, consume-once semantics, and alarm cleanup.
  • Tightens token validation to Kilo JWTs for the worker environment and checks the user pepper through Hyperdrive before minting tickets.
  • Switches the accepted WebSocket subprotocol to kilo.events.v1.
  • Adds browser CORS handling for /connect-ticket, including Authorization, and allows http://localhost:3000 for local development.
  • Adds presence introspection via isUserInContext(userId, context), backed by UserSessionDO.hasContext, so notification fanout can suppress pushes for live subscribers.
  • Adds config/bindings for CONNECTION_TICKET_DO, HYPERDRIVE, and WORKER_ENV.
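
The ticket semantics described above (30-second TTL, consume-once, periodic cleanup) can be modelled in memory as follows. This is only a sketch: the real ConnectionTicketDO is a Durable Object with alarm-based cleanup, and the class and method names here are hypothetical.

```typescript
// In-memory model of consume-once tickets; not the ConnectionTicketDO implementation.
const TICKET_TTL_MS = 30 * 1000;

interface TicketEntry {
  userId: string;
  expiresAt: number;
}

class TicketStore {
  private tickets = new Map<string, TicketEntry>();

  // Records a freshly minted ticket for a user who has already been authenticated.
  mint(ticket: string, userId: string, now: number): void {
    this.tickets.set(ticket, { userId, expiresAt: now + TICKET_TTL_MS });
  }

  // Returns the user ID exactly once per ticket, and only before expiry.
  consume(ticket: string, now: number): string | null {
    const entry = this.tickets.get(ticket);
    if (!entry) return null;
    this.tickets.delete(ticket); // delete even if expired so it can never be replayed
    return now < entry.expiresAt ? entry.userId : null;
  }

  // Stand-in for alarm cleanup: drops expired, never-consumed tickets.
  sweep(now: number): number {
    let removed = 0;
    this.tickets.forEach((entry, ticket) => {
      if (now >= entry.expiresAt) {
        this.tickets.delete(ticket);
        removed += 1;
      }
    });
    return removed;
  }
}
```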

Backwards compatibility:

  • Clients using Sec-WebSocket-Protocol: kilo.jwt.<base64url-jwt> must migrate to /connect-ticket, /connect?ticket=..., and kilo.events.v1.
  • The wire envelope and subscribe/unsubscribe messages remain compatible, but Kilo Chat payload schemas are stricter and some event payloads gained fields.
  • Deploys must include CONNECTION_TICKET_DO, HYPERDRIVE, WORKER_ENV, and the new Durable Object migration before the ticket flow works.
  • /health remains public. /connect still requires a WebSocket upgrade but now rejects missing/invalid tickets instead of missing/invalid JWT subprotocols. /connect-ticket is publicly reachable but requires a bearer Kilo JWT.
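
A migrating client could assemble the two steps of the ticket flow roughly like this. The helper names and the http-to-ws scheme rewrite are assumptions; the endpoint paths, the bearer header, and the kilo.events.v1 subprotocol come from the notes above. The response body of /connect-ticket is not specified in this PR description, so the sketch does not parse it.

```typescript
// Hypothetical helpers for the ticket flow; they only build request data.
const EVENTS_SUBPROTOCOL = "kilo.events.v1";

interface TicketRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
}

function trimTrailingSlash(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "");
}

// Step 1: POST /connect-ticket with the Kilo JWT as a bearer token.
function buildTicketRequest(baseUrl: string, kiloJwt: string): TicketRequest {
  return {
    url: `${trimTrailingSlash(baseUrl)}/connect-ticket`,
    method: "POST",
    headers: { Authorization: `Bearer ${kiloJwt}` },
  };
}

// Step 2: open /connect?ticket=... as a WebSocket.
// Assumption: the WebSocket endpoint shares the HTTP host, with http(s) mapped to ws(s).
function buildConnectUrl(baseUrl: string, ticket: string): string {
  const wsBase = trimTrailingSlash(baseUrl).replace(/^http/, "ws");
  return `${wsBase}/connect?ticket=${encodeURIComponent(ticket)}`;
}

// In a browser, the socket would then be opened along the lines of:
//   new WebSocket(buildConnectUrl(base, ticket), EVENTS_SUBPROTOCOL)
```

Because tickets are short-lived and single-use, a reconnecting client would request a fresh ticket for every connection attempt rather than caching one.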

Notifications service

  • Replaces Stream Chat webhook ingestion with Worker RPC push dispatch for Kilo Chat conversations.
  • Adds JWT-authenticated /v1/badges and /v1/badges/mark-read routes for mobile badge hydration and read clearing.
  • Moves unread badge storage from Postgres channel_badge_counts into per-user NotificationChannelDO bucket storage.
  • Adds per-recipient push dispatch with sender exclusion, recipient dedupe, idempotency keys, and Event Service presence suppression.
  • Updates Expo push handling to retry transient sends, classify ticket errors, delete stale tokens, and enqueue receipt checks only for accepted tickets.
  • Extends lifecycle push RPC results with ticket-error counts and changes lifecycle push data to carry sandboxId.
  • Changes service bindings/secrets by removing Stream Chat secret usage and adding NEXTAUTH_SECRET, WORKER_ENV, and EVENT_SERVICE.
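
The recipient selection rules listed above (sender exclusion, recipient dedupe, presence suppression) can be sketched as one pure function. The function name and the synchronous isLive callback are assumptions; in the real service the presence check goes through the Event Service binding and is presumably asynchronous.

```typescript
// Hypothetical sketch of push recipient selection.
function selectPushRecipients(
  memberIds: readonly string[],
  senderId: string,
  isLive: (userId: string) => boolean,
): string[] {
  const seen = new Set<string>();
  const recipients: string[] = [];
  for (const userId of memberIds) {
    if (userId === senderId) continue; // never push to the sender
    if (seen.has(userId)) continue; // dedupe repeated members
    seen.add(userId);
    if (isLive(userId)) continue; // suppress pushes for users already subscribed live
    recipients.push(userId);
  }
  return recipients;
}
```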

Backwards compatibility:

  • /webhooks/stream-chat is removed; existing Stream Chat webhook delivery must be disabled or migrated to the new RPC producers.
  • /v1/* badge routes require Authorization: Bearer <Kilo JWT> and return { buckets } or { badgeCount }; no unauthenticated badge access is supported.
  • sendInstanceLifecycleNotification no longer uses instanceId in request/push data and now returns ticketErrors.
  • Chat push data changes to type: "chat.message" with sandboxId, conversationId, and messageId; lifecycle data uses sandboxId.
  • Old DO keys for webhook dedup/pending messages are ignored, and DB-backed badge counts are not backfilled into DO badge buckets.
  • Receipt queue messages remain compatible as { ticketTokenPairs }.
  • Deployments must provide NEXTAUTH_SECRET, WORKER_ENV, EVENT_SERVICE, EXPO_ACCESS_TOKEN, HYPERDRIVE, RECEIPTS_QUEUE, and NOTIFICATION_CHANNEL_DO; STREAM_CHAT_API_SECRET is no longer required.

KiloClaw service and runtime

  • Removes legacy Stream Chat runtime support: Docker install, worker env/types, persisted state, DO provisioning/backfill/deactivation, credential RPC, and /stream-chat-* HTTP routes.
  • Sanitizes existing openclaw.json before openclaw doctor, deleting channels.streamchat, the legacy plugin path, allow entry, and plugin entry.
  • Enables Kilo Chat as the controller channel with channels["kilo-chat"], /usr/local/lib/node_modules/@kiloclaw/kilo-chat, and kilo-chat in an existing plugin allowlist.
  • Tightens the Kilo Chat plugin client to shared schemas, removes additionalMembers from bot-created conversations, returns pagination from read actions, and avoids sending resolved approval action blocks through edit payloads.
  • Routes Kilo Chat webhook RPC by targetBotId=bot:kiloclaw:<sandboxId>, strips targetBotId, and forwards the webhook body to /plugins/kilo-chat/webhook.
  • Adds image content-mode tracking for dev/prod controller images, including image-content-hash.sh, FLY_IMAGE_CONTENT_MODE, local OpenClaw tarball mode, and CI/dev handoff checks.
  • Lifecycle push notifications now use shared @kilocode/notifications types, send sandboxId without instanceId, and log ticket-error counts as warnings.

Backwards compatibility:

  • Stream Chat clients and config are removed. Callers of /api/kiloclaw/chat-credentials, /api/platform/stream-chat-credentials, /api/platform/send-chat-message, or stored Stream Chat credentials must migrate to Kilo Chat.
  • Stale Stream Chat config is removed automatically before doctor; Kilo Chat is added without creating plugins.allow when it was absent, preserving permissive plugin loading.
  • Existing plugins.allow configs must include kilo-chat; this code appends it when the allowlist already exists.
  • Missing FLY_IMAGE_CONTENT_MODE defaults to production. Local-image dev flows need exactly one openclaw-build/openclaw-*.tgz or an explicit production/local mode.
  • Notification RPC is incompatible with an old notifications binding that still requires instanceId; it is compatible with the updated shared binding because sandboxId is now the lifecycle identifier. Missing NOTIFICATIONS still no-ops.
  • Runtime config shifts from Stream Chat secrets to Kilo Chat. Controller Kilo Chat routes require KILOCLAW_SANDBOX_ID and KILOCHAT_BASE_URL; worker KILO_CHAT cleanup and NOTIFICATIONS remain optional/failure-tolerant.
  • Kilo Chat webhook delivery remains compatible for routing: targetBotId is used only for routing, then stripped; plugin-side validation still accepts historical message.created payloads without a type field.
  • Kilo Chat create-message/create-conversation/list-message paths now expect canonical responses (message, conversation, hasMore, nextCursor) and no longer allow bot-created additionalMembers.
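
The allowlist rule described above (append kilo-chat only when plugins.allow already exists, never create it) might look like the following. The config shape is an assumption inferred from the key names in this description, the function name is hypothetical, and the empty channels["kilo-chat"] value is a placeholder. The sketch also skips the legacy plugin path, allow entry, and plugin entry cleanup, whose exact keys are not given here.

```typescript
// Hypothetical sketch; the real openclaw.json schema may differ.
interface OpenclawConfig {
  channels?: Record<string, unknown>;
  plugins?: { allow?: string[] };
}

function migrateChatConfig(config: OpenclawConfig): OpenclawConfig {
  // Drop the stale Stream Chat channel and make sure the Kilo Chat channel key exists.
  const channels: Record<string, unknown> = { ...(config.channels ?? {}) };
  delete channels["streamchat"];
  if (channels["kilo-chat"] === undefined) {
    channels["kilo-chat"] = {}; // placeholder: real channel settings are not shown in the PR text
  }

  const result: OpenclawConfig = { ...config, channels };

  // Only touch plugins.allow when it already exists, so configs without an
  // allowlist keep their permissive plugin loading.
  const allow = config.plugins?.allow;
  if (config.plugins && allow) {
    const nextAllow = allow.indexOf("kilo-chat") === -1 ? [...allow, "kilo-chat"] : [...allow];
    result.plugins = { ...config.plugins, allow: nextAllow };
  }
  return result;
}
```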

Dev, deployment, and local tooling

  • Updates KiloClaw deployment workflow and image-hash scripts for local-vs-production OpenClaw image handoff.
  • Updates local env sync to resolve suffixed secret sources and skip exec during secret discovery.
  • Removes Stream Chat dependency patches and root package references.

Verification

  • Not run as part of this PR-description-only update.

Visual Changes

  • UI changes are included in the web and mobile Kilo Chat cutover, but screenshots were not added in this PR-description-only update.

Reviewer Notes

  • Reviewers should focus on coordinated rollout and backwards compatibility across service contracts. Old web/mobile/Stream Chat clients are not expected to remain compatible with the new service stack.
  • Deploy Kilo Chat, Event Service, Notifications, KiloClaw, web, and mobile changes as a coordinated cutover because auth, event, notification, and badge contracts are coupled.

@iscekic iscekic self-assigned this Apr 29, 2026
Comment thread services/event-service/src/__tests__/has-context.test.ts
Comment thread services/event-service/src/__tests__/is-user-in-context.test.ts

kilo-code-bot Bot commented Apr 29, 2026

Code Review Summary

Status: 7 Issues Found | Recommendation: Address before merge

Overview

Severity | Count
CRITICAL | 0
WARNING | 7
SUGGESTION | 0
Issue Details

WARNING

File | Line | Issue
apps/mobile/src/components/kilo-chat/conversation-row.tsx | 118 | Conversation rows wrap millisecond backend timestamps in new Date(...), depending on JS date parsing in mobile UI.
apps/mobile/src/components/kilo-chat/hooks/use-mark-read.ts | 72 | markRead depends on the unstable mutation object and can repeatedly fire focus-side effects.
services/kiloclaw/plugins/kilo-chat/src/channel.ts | 241 | Kilo Chat message tool hints still advertise target, but the schema no longer exposes it.
services/kiloclaw/src/index.ts | 1082 | deliverChatWebhook still does not validate runtime RPC payloads; satisfies is compile-time only.
services/notifications/src/index.ts | 97 | sendPushForConversationCore still does not validate runtime RPC input before deriving recipients and dispatching to DOs.
services/notifications/src/index.ts | 174 | clearBadgeBucketForUser still does not validate runtime input before selecting the user DO and clearing a bucket.
Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

File | Line | Issue
services/notifications/src/lib/instance-lifecycle-push.ts | 83 | Instance lifecycle push dispatch still lacks runtime schema validation, but this line was not in the incremental diff.
Files Reviewed (20 files)
  • apps/mobile/src/app/(app)/(tabs)/(1_kiloclaw)/chat/[sandbox-id]/[conversation-id].tsx - 0 new issues
  • apps/mobile/src/components/agents/markdown-palette.test.ts - 0 new issues
  • apps/mobile/src/components/agents/markdown-palette.ts - 0 new issues
  • apps/mobile/src/components/agents/markdown-text.tsx - 0 new issues
  • apps/mobile/src/components/kilo-chat/conversation-screen.tsx - 0 new issues
  • apps/mobile/src/components/kilo-chat/message-actions.test.ts - 0 new issues
  • apps/mobile/src/components/kilo-chat/message-actions.ts - 0 new issues
  • apps/mobile/src/components/kilo-chat/message-bubble.tsx - 0 new issues
  • apps/mobile/src/components/kilo-chat/message-gesture-state.test.ts - 0 new issues
  • apps/mobile/src/components/kilo-chat/message-gesture-state.ts - 0 new issues
  • apps/mobile/src/components/kilo-chat/message-input.tsx - 0 new issues
  • apps/mobile/src/components/kilo-chat/message-list.tsx - 0 new issues
  • apps/mobile/src/components/kilo-chat/message-markdown.tsx - 0 new issues
  • apps/mobile/src/components/kilo-chat/message-presentation.test.ts - 0 new issues
  • apps/mobile/src/components/kilo-chat/message-presentation.ts - 0 new issues
  • apps/mobile/src/components/kilo-chat/message-reaction-pills.tsx - 0 new issues
  • packages/kilo-chat/src/schemas.ts - 0 new issues
  • services/kilo-chat/src/__tests__/conversations-routes.test.ts - 0 new issues
  • services/kilo-chat/src/routes/conversations.ts - 0 new issues
  • services/kiloclaw/plugins/kilo-chat/src/synced/schemas.ts - 0 new issues


Reviewed by gpt-5.5-20260423 · 6,103,680 tokens

@iscekic iscekic changed the title from "feat(notifications): PR 1 — shared schemas + presence query" to "feat(kilo-chat): rip out stream chat" Apr 30, 2026
Comment thread apps/mobile/src/components/kilo-chat/conversation-screen.tsx
Comment thread apps/mobile/src/components/kilo-chat/conversation-screen.tsx Outdated
Comment thread apps/mobile/src/components/kilo-chat/hooks/use-current-user-id.ts Outdated
Comment thread apps/mobile/src/components/kilo-chat/hooks/use-mark-read.ts
Comment thread apps/web/src/app/(app)/claw/components/ChatTab.tsx Outdated
Comment thread packages/db/src/migrations/0108_drop_badge_counts.sql Outdated
Comment thread services/notifications/src/dos/NotificationChannelDO.ts Outdated
Comment thread apps/mobile/src/components/kilo-chat/hooks/use-current-user-id.ts
Comment thread apps/mobile/src/components/kilo-chat/conversation-row.tsx
- // member could approve exec actions running on the conversation-owner's Fly
- // machine. Gate on session ownership before enabling multi-party.
+ // boundary, and KiloClaw currently keeps bot-created approval conversations
+ // owner-only by not forwarding additionalMembers.
authorizeActorAction: () => ({ authorized: true }),

The current invariant is accurate today, but this plugin's authorizeActorAction is permissive precisely because the safety property is enforced elsewhere: the human createConversationRequestSchema has no additionalMembers field, createBotConversationFor rejects any additionalMembers, and this plugin no longer forwards them either. If any of those three relax later, any conversation member could approve exec commands running on the conversation owner's Fly machine.

The previous comment captured that hazard for future readers. Suggest adding a sentence alongside the current text along these lines:

If kilo-chat ever supports more than the owner plus the bot in a conversation, gate this on session ownership before relaxing the createConversationRequestSchema or createBotConversationFor constraints.

@@ -193,6 +187,8 @@ const nativeRuntime: ChannelApprovalNativeRuntimeAdapter<
},

updateEntry: async ({ entry, payload }) => {
if (hasResolvedActionsBlock(payload)) return;

Worth adding a comment here explaining why this returns early: inputActionsBlockSchema declares resolved: z.never().optional(), so any edit or create with a resolved block would be rejected with a 400 at the kilo-chat HTTP boundary. Skipping the call has been mandatory since that schema landed; the transition into the resolved state is owned by /v1/conversations/:id/execute-action.

Without that comment, the skip looks like a defensive guard. It is actually a hard requirement coupled to a sibling schema, and hasResolvedActionsBlock is doing shape inspection on a payload built two functions away. If buildResolvedBlocks ever changes to omit the actions block (the way buildExpiredBlocks already does), the skip would silently stop firing and this code would start producing 400s.

Cleaner alternative if you want to drop the implicit coupling: have buildResolvedResult return an action that the SDK treats as no-op so transport never sees a resolved payload to inspect. If that surface does not exist on the SDK, the inline comment is enough.
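
A sketch of what the commented guard could look like. The payload and block shapes, the "actions" type tag, and the synchronous sendEdit callback are all assumptions made for illustration; only hasResolvedActionsBlock, updateEntry, and the schema behaviour are named in this thread.

```typescript
// Hypothetical shapes; the real payload types live in the plugin and shared schemas.
interface Block {
  type: string;
  resolved?: unknown;
}

interface EditPayload {
  content: Block[];
}

// Shape inspection: true when any actions block carries resolved state.
function hasResolvedActionsBlock(payload: EditPayload): boolean {
  return payload.content.some(
    (block) => block.type === "actions" && block.resolved !== undefined,
  );
}

function updateEntry(payload: EditPayload, sendEdit: (payload: EditPayload) => void): "skipped" | "sent" {
  // Hard requirement, not a defensive guard: the input actions schema forbids
  // `resolved`, so sending this edit would be rejected with a 400. The resolved
  // transition is owned by the execute-action endpoint instead.
  if (hasResolvedActionsBlock(payload)) return "skipped";
  sendEdit(payload);
  return "sent";
}
```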

}
- const sandboxId = parsed.targetBotId.slice(botPrefix.length);
+ const sandboxId = targetBotId.slice(botPrefix.length);

With chatWebhookRpcSchema.parse(payload) removed, the one downstream identifier that the prior schema validated is no longer checked. sandboxId derived from targetBotId.slice('bot:kiloclaw:'.length) then flows into idFromName(...), the registry lookup, the Postgres fallback, and gateway token derivation. The shared sandboxIdSchema already encodes the rule (/^[A-Za-z0-9_-]{1,64}$/).

Suggest one guard immediately after the prefix strip, so the property the parse used to provide for free is preserved without bringing back full payload parsing:

if (!sandboxIdSchema.safeParse(sandboxId).success) {
  throw new Error(`Invalid sandboxId derived from targetBotId: ${targetBotId}`);
}

Practical impact today is small (Cloudflare DOs accept any string, downstream reads would mostly fail safely), but an empty sandboxId would still take the legacy path and resolve to whatever userIdFromSandboxId('') returns. That state produces confusing labels in registry lookups during incidents.
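
A self-contained version of the suggested guard, using the same pattern quoted above in place of the shared sandboxIdSchema so the sketch has no dependencies. The function and constant names are hypothetical.

```typescript
// Standalone sketch of the prefix strip plus validation.
const BOT_PREFIX = "bot:kiloclaw:";
const SANDBOX_ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;

function sandboxIdFromTargetBotId(targetBotId: string): string {
  if (!targetBotId.startsWith(BOT_PREFIX)) {
    throw new Error(`Unexpected targetBotId prefix: ${targetBotId}`);
  }
  const sandboxId = targetBotId.slice(BOT_PREFIX.length);
  // Rejects empty, over-long, or oddly-charactered IDs before they reach
  // DO naming, registry lookups, or token derivation.
  if (!SANDBOX_ID_PATTERN.test(sandboxId)) {
    throw new Error(`Invalid sandboxId derived from targetBotId: ${targetBotId}`);
  }
  return sandboxId;
}
```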


3 participants