
Claude (SDK) chat + tool-calling, budgeting, and docs/test updates #1

Merged: akhilsinghcodes merged 7 commits into main from feature/claude_enhancement on Jun 1, 2026


@akhilsinghcodes
Owner

Summary
This PR expands Agents Fleet beyond PTY/CLI session monitoring with a full Claude (SDK) chat experience built on the Anthropic SDK. It adds agentic tool-calling that runs repo commands behind an Approve/Reject gate, tightens budgeting (including enforcement inside tool loops), and updates documentation and tests to cover the new capabilities.


Key Features Added

Claude (SDK) Chat Sessions

  • Added a chat-first Claude SDK UI (React) with:
    • per-session transcript rendering
    • session id display for copy/reference within Agents Fleet
    • “New chat” flow that resets the draft + creates a fresh session
  • Persisted Claude SDK artifacts to SQLite:
    • config (claude_sdk_config_v1)
    • user messages (claude_sdk_user_message_v1)
    • assistant messages (claude_sdk_assistant_message_v1)
    • usage snapshots (claude_sdk_usage_v1)
    • tool approvals (claude_sdk_tool_approval_v1)
    • tool results (claude_sdk_tool_result_v1)
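The six artifact kinds above can be captured as a closed string union, so the code that writes the SQLite rows and the code that reads them agree on the names. This is a minimal sketch: the ArtifactRow fields (sessionId, kind, payloadJson, createdAt) and the helpers makeArtifactRow and latestOfKind are hypothetical and do not reflect the actual Agents Fleet schema.

```typescript
// Artifact kinds named in this PR; everything else here is illustrative.
const CLAUDE_SDK_ARTIFACT_KINDS = [
  "claude_sdk_config_v1",
  "claude_sdk_user_message_v1",
  "claude_sdk_assistant_message_v1",
  "claude_sdk_usage_v1",
  "claude_sdk_tool_approval_v1",
  "claude_sdk_tool_result_v1",
] as const;

type ClaudeSdkArtifactKind = (typeof CLAUDE_SDK_ARTIFACT_KINDS)[number];

// Hypothetical row shape. The versioned kind (_v1) lets a payload schema
// evolve later (e.g. a _v2) without rewriting old rows.
interface ArtifactRow {
  sessionId: string;
  kind: ClaudeSdkArtifactKind;
  payloadJson: string;
  createdAt: string; // ISO-8601, so string order matches time order
}

function isClaudeSdkArtifactKind(kind: string): kind is ClaudeSdkArtifactKind {
  return (CLAUDE_SDK_ARTIFACT_KINDS as readonly string[]).indexOf(kind) !== -1;
}

function makeArtifactRow(
  sessionId: string,
  kind: ClaudeSdkArtifactKind,
  payload: unknown,
  now: Date = new Date(),
): ArtifactRow {
  return { sessionId, kind, payloadJson: JSON.stringify(payload), createdAt: now.toISOString() };
}

// Most recent artifact of one kind, e.g. the latest usage snapshot.
function latestOfKind(rows: ArtifactRow[], kind: ClaudeSdkArtifactKind): ArtifactRow | undefined {
  let latest: ArtifactRow | undefined;
  for (const row of rows) {
    if (row.kind === kind && (!latest || row.createdAt >= latest.createdAt)) latest = row;
  }
  return latest;
}
```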

WebSocket Streaming (Chat + Tools)

  • Extended /ws beyond PTY streaming to support Claude SDK:
    • assistant streaming events (claude_sdk_chunk, claude_sdk_done)
    • tool request + output events (claude_sdk_tool_request, claude_sdk_tool_output)
    • client tool decisions (claude_sdk_tool_decision) to approve/reject commands
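A discriminated union over the message `type` keeps the server and client handlers exhaustive as event kinds are added. In this sketch only the `type` strings come from the PR; the payload fields (sessionId, text, toolUseId, command, output, decision) and the names ChatState, applyServerEvent, and decide are assumptions.

```typescript
// Server -> client events on /ws for Claude SDK sessions (payload fields assumed).
type ServerEvent =
  | { type: "claude_sdk_chunk"; sessionId: string; text: string }
  | { type: "claude_sdk_done"; sessionId: string }
  | { type: "claude_sdk_tool_request"; sessionId: string; toolUseId: string; command: string }
  | { type: "claude_sdk_tool_output"; sessionId: string; toolUseId: string; output: string };

// Client -> server decision for a pending tool request.
type ClientEvent = {
  type: "claude_sdk_tool_decision";
  sessionId: string;
  toolUseId: string;
  decision: "approve" | "reject";
};

// Minimal per-session client state (hypothetical).
interface ChatState {
  assistantDraft: string; // text streamed so far for the current turn
  transcript: string[];
  pendingTools: Map<string, string>; // toolUseId -> command awaiting Approve/Reject
}

function applyServerEvent(state: ChatState, ev: ServerEvent): void {
  switch (ev.type) {
    case "claude_sdk_chunk":
      state.assistantDraft += ev.text;
      break;
    case "claude_sdk_done":
      // Commit the streamed text as one transcript entry.
      if (state.assistantDraft) state.transcript.push(state.assistantDraft);
      state.assistantDraft = "";
      break;
    case "claude_sdk_tool_request":
      state.pendingTools.set(ev.toolUseId, ev.command);
      break;
    case "claude_sdk_tool_output":
      state.transcript.push(ev.output);
      break;
  }
}

// Clears the pending request and builds the JSON the UI would pass to ws.send(...).
function decide(state: ChatState, sessionId: string, toolUseId: string, approve: boolean): string {
  state.pendingTools.delete(toolUseId);
  const msg: ClientEvent = {
    type: "claude_sdk_tool_decision",
    sessionId,
    toolUseId,
    decision: approve ? "approve" : "reject",
  };
  return JSON.stringify(msg);
}
```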

Tool-Calling: run_command (Any Shell Command)

  • Implemented Anthropic tool-calling in the Claude SDK turn runner:
    • tool: run_command({ command })
    • executes commands in the session repo working directory
  • Added explicit user gating:
    • UI shows each tool request inline with Approve / Reject
    • server blocks execution until a decision is received
  • Added command output limits:
    • tool output capped to 100KB to protect context and budgets
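The gating and output cap described above can be sketched as follows. The tool definition follows the Anthropic Messages API `tools` format (name, description, input_schema). ApprovalGate, handleRunCommand, and the injected `exec` callback are hypothetical stand-ins for the real turn runner, and the sketch reads "100KB" as 100 * 1024 UTF-8 bytes, which is an assumption.

```typescript
// Tool definition in the Anthropic Messages API `tools` format.
const runCommandTool = {
  name: "run_command",
  description: "Run a shell command in the session repository's working directory.",
  input_schema: {
    type: "object" as const,
    properties: { command: { type: "string", description: "The shell command to run" } },
    required: ["command"],
  },
};

type Decision = "approve" | "reject";

// Parks each tool request until a claude_sdk_tool_decision arrives (hypothetical helper).
class ApprovalGate {
  private pending = new Map<string, (d: Decision) => void>();

  waitFor(toolUseId: string): Promise<Decision> {
    return new Promise<Decision>((resolve) => {
      this.pending.set(toolUseId, resolve);
    });
  }

  // Returns false if no request with this id is waiting (stale or duplicate decision).
  resolve(toolUseId: string, decision: Decision): boolean {
    const settle = this.pending.get(toolUseId);
    if (!settle) return false;
    this.pending.delete(toolUseId);
    settle(decision);
    return true;
  }
}

const MAX_TOOL_OUTPUT_BYTES = 100 * 1024;

// Truncates to at most maxBytes of UTF-8 without splitting a multi-byte character.
function capOutput(output: string, maxBytes: number = MAX_TOOL_OUTPUT_BYTES): string {
  const bytes = new TextEncoder().encode(output);
  if (bytes.length <= maxBytes) return output;
  let cut = maxBytes;
  // Step back over continuation bytes (10xxxxxx) to the start of a character.
  while (cut > 0 && (bytes[cut] & 0xc0) === 0x80) cut--;
  return new TextDecoder().decode(bytes.subarray(0, cut));
}

// The command runs only after approval; `exec` stands in for spawning a shell in the repo dir.
async function handleRunCommand(
  gate: ApprovalGate,
  toolUseId: string,
  command: string,
  exec: (command: string) => Promise<string>,
): Promise<string> {
  const decision = await gate.waitFor(toolUseId);
  if (decision === "reject") return "The user rejected this command.";
  return capOutput(await exec(command));
}
```

Keeping the resolvers in a map keyed by tool-use id decouples the WebSocket handler from the turn runner: the handler only calls `resolve()`, and a decision for an unknown or already-settled id is ignored.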

Budgeting Improvements

  • Budget enforcement for Claude SDK sessions:
    • preflight enforcement on send
    • enforcement during tool loops, not just at turn start
  • Model-aware USD estimation:
    • introduced computeModelCostUsd with a best-effort model pricing table
    • Claude SDK cost calculations use model pricing + SDK usage when available
  • Fixed a critical bug:
    • the HTTP fallback route previously called processManager.stopSession(...) for Claude SDK sessions, which is a no-op for them, so the session was never actually stopped
    • the route now updates the sessions row directly, setting stop_reason='budget_exceeded' along with the timestamps and status
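A model-aware cost function combined with a per-iteration budget check is one way to get enforcement inside the tool loop rather than only at turn start. This is a sketch under stated assumptions: the PR names computeModelCostUsd but not its signature, the pricing rates below are placeholders rather than real prices, and runTurnWithBudget and the `step` callback are hypothetical. The Usage field names mirror the `usage` object returned by the Anthropic Messages API.

```typescript
interface ModelPricing {
  inputPerMTok: number; // USD per million tokens
  outputPerMTok: number;
  cacheWritePerMTok: number;
  cacheReadPerMTok: number;
}

// Placeholder rates for illustration only; not authoritative pricing.
const PRICING: Record<string, ModelPricing> = {
  "example-model": { inputPerMTok: 3, outputPerMTok: 15, cacheWritePerMTok: 3.75, cacheReadPerMTok: 0.3 },
};
const FALLBACK_MODEL = "example-model"; // best-effort default for unknown model ids

// Field names follow the Anthropic Messages API `usage` object.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

// Hypothetical signature for computeModelCostUsd.
function computeModelCostUsd(model: string, usage: Usage): number {
  const p = PRICING[model] ?? PRICING[FALLBACK_MODEL];
  return (
    (usage.input_tokens * p.inputPerMTok +
      usage.output_tokens * p.outputPerMTok +
      (usage.cache_creation_input_tokens ?? 0) * p.cacheWritePerMTok +
      (usage.cache_read_input_tokens ?? 0) * p.cacheReadPerMTok) /
    1_000_000
  );
}

type StopReason = "end_turn" | "budget_exceeded" | "max_iterations";

// One model call; wantsTool is true when the response requested a tool
// (e.g. stop_reason "tool_use") and the loop should call the model again.
type Step = () => Promise<{ usage: Usage; wantsTool: boolean }>;

async function runTurnWithBudget(
  model: string,
  budgetUsd: number,
  spentUsd: number,
  step: Step,
  maxIterations = 20,
): Promise<{ spentUsd: number; stopReason: StopReason }> {
  let spent = spentUsd;
  for (let i = 0; i < maxIterations; i++) {
    // Checked before every model call: acts as the preflight on send and again mid-loop.
    if (spent >= budgetUsd) return { spentUsd: spent, stopReason: "budget_exceeded" };
    const { usage, wantsTool } = await step();
    spent += computeModelCostUsd(model, usage);
    if (!wantsTool) return { spentUsd: spent, stopReason: "end_turn" };
  }
  return { spentUsd: spent, stopReason: "max_iterations" };
}
```

Checking at the top of each iteration cuts off a turn that keeps requesting tools as soon as the accumulated cost reaches the budget; the trade-off is that the final call can overshoot the budget by at most one response.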

Usage Visibility

  • UI fetches latest claude_sdk_usage_v1 artifact after a turn and displays:
    • input tokens
    • output tokens
    • thinking tokens (if present)
    • cache read/write tokens (if present)
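Because the thinking and cache counters appear only when present, the display code has to treat them as optional and validate whatever it fetches. A minimal sketch, assuming a camelCase payload for the claude_sdk_usage_v1 artifact (the real field names are not given in this PR); UsageSnapshot, parseUsageSnapshot, and formatUsage are hypothetical.

```typescript
// Hypothetical payload shape for a claude_sdk_usage_v1 artifact.
interface UsageSnapshot {
  inputTokens: number;
  outputTokens: number;
  thinkingTokens?: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
}

// Validates fetched JSON; returns null when required counters are missing.
function parseUsageSnapshot(json: string): UsageSnapshot | null {
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return null; // not JSON at all
  }
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, unknown>;
  const input = r["inputTokens"];
  const output = r["outputTokens"];
  if (typeof input !== "number" || typeof output !== "number") return null;
  const optional = (key: string): number | undefined => {
    const v = r[key];
    return typeof v === "number" ? v : undefined;
  };
  return {
    inputTokens: input,
    outputTokens: output,
    thinkingTokens: optional("thinkingTokens"),
    cacheReadTokens: optional("cacheReadTokens"),
    cacheWriteTokens: optional("cacheWriteTokens"),
  };
}

// One display line per counter; optional counters are shown only when present.
function formatUsage(u: UsageSnapshot): string[] {
  const lines = [`Input: ${u.inputTokens}`, `Output: ${u.outputTokens}`];
  if (u.thinkingTokens !== undefined) lines.push(`Thinking: ${u.thinkingTokens}`);
  if (u.cacheReadTokens !== undefined) lines.push(`Cache read: ${u.cacheReadTokens}`);
  if (u.cacheWriteTokens !== undefined) lines.push(`Cache write: ${u.cacheWriteTokens}`);
  return lines;
}
```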

Tests Added

  • Added server test verifying Claude SDK USD budget enforcement:
    • apps/server/test/claude_sdk_budget.test.ts
    • uses a mocked Anthropic SDK (no network)
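One way to write such a test without network access is to inject a client object that implements only the slice of the SDK the code calls. The sketch below assumes the server code accepts the client as a parameter; ClientLike, makeFakeClient, sendWithBudget, and the flat usdPerToken rate are hypothetical and are not taken from claude_sdk_budget.test.ts. The request fields (model, max_tokens, messages) and the usage fields mirror the Anthropic Messages API.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };
type TextBlock = { type: "text"; text: string };
type SdkUsage = { input_tokens: number; output_tokens: number };

// The subset of the SDK client that the send path uses (assumption for this sketch).
interface ClientLike {
  messages: {
    create(params: { model: string; max_tokens: number; messages: ChatMessage[] }): Promise<{
      content: TextBlock[];
      usage: SdkUsage;
    }>;
  };
}

// Fake client: returns canned usage and records every request; no network.
function makeFakeClient(usage: SdkUsage): { client: ClientLike; calls: ChatMessage[][] } {
  const calls: ChatMessage[][] = [];
  const client: ClientLike = {
    messages: {
      async create(params) {
        calls.push(params.messages);
        return { content: [{ type: "text" as const, text: "ok" }], usage };
      },
    },
  };
  return { client, calls };
}

interface SessionBudget {
  spentUsd: number;
  budgetUsd: number;
  stopReason?: "budget_exceeded";
}

// Hypothetical send path with a preflight budget check; usdPerToken is a flat placeholder rate.
async function sendWithBudget(
  client: ClientLike,
  session: SessionBudget,
  text: string,
  usdPerToken: number,
): Promise<string | null> {
  if (session.spentUsd >= session.budgetUsd) {
    session.stopReason = "budget_exceeded";
    return null; // refused before any SDK call
  }
  const res = await client.messages.create({
    model: "example-model",
    max_tokens: 1024,
    messages: [{ role: "user", content: text }],
  });
  session.spentUsd += (res.usage.input_tokens + res.usage.output_tokens) * usdPerToken;
  return res.content.map((b) => b.text).join("");
}
```

Because a blocked send never reaches `create()`, the fake's call log doubles as evidence that enforcement happened before any billable request.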

Docs Updated

  • ROADMAP.md
    • marked Claude SDK chat/tooling/budgeting items as done
    • added next steps: pricing configurability + further budget accuracy/testing
  • README.md
    • updated WebSocket description to include SDK events
    • added Claude SDK section + screenshots:
      • budget stop
      • tool call + output
      • tool permission gate
  • ARCHITECTURE.md
    • updated system diagram and narrative to include Claude SDK path
    • documented artifact kinds for Claude SDK sessions

Notes / Follow-ups

  • Model pricing table is best-effort and should be made configurable (env/JSON) for accuracy across accounts/contracts.
  • WS tool-loop budget enforcement relies on estimated/model-based cost; accuracy improves when SDK usage is present.
  • Additional WS-level integration tests (tool approvals + mid-loop budget cutoff) can be added next.
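For the pricing-configurability follow-up, one possible shape is to merge a JSON override (read, for example, from an environment variable or a file) over the built-in table and reject malformed entries. The function name, the override format, and the per-million-token fields are assumptions for illustration, not a committed design.

```typescript
interface ModelPricing {
  inputPerMTok: number; // USD per million tokens
  outputPerMTok: number;
}
type PricingTable = Record<string, ModelPricing>;

// Merges {"<model id>": {"inputPerMTok": n, "outputPerMTok": n}, ...} over the defaults.
function mergePricingOverrides(defaults: PricingTable, overrideJson?: string): PricingTable {
  const table: PricingTable = { ...defaults }; // shallow copy; defaults stay untouched
  if (!overrideJson) return table;
  let parsed: unknown;
  try {
    parsed = JSON.parse(overrideJson);
  } catch {
    throw new Error("pricing override is not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("pricing override must be an object keyed by model id");
  }
  const entries = parsed as Record<string, unknown>;
  for (const model of Object.keys(entries)) {
    const entry = entries[model];
    if (typeof entry !== "object" || entry === null) throw new Error(`invalid pricing for model "${model}"`);
    const input = (entry as Record<string, unknown>)["inputPerMTok"];
    const output = (entry as Record<string, unknown>)["outputPerMTok"];
    if (typeof input !== "number" || typeof output !== "number" || input < 0 || output < 0) {
      throw new Error(`invalid pricing for model "${model}"`);
    }
    table[model] = { inputPerMTok: input, outputPerMTok: output };
  }
  return table;
}
```

Throwing on a malformed entry, rather than skipping it, is a deliberate choice in this sketch: a silently ignored price would quietly skew every budget decision.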

Validation

  • pnpm -r typecheck
  • pnpm -r build
  • pnpm -r -F @agents_fleet/server test

akhilsinghcodes merged commit c82b42e into main on Jun 1, 2026
1 check passed
akhilsinghcodes deleted the feature/claude_enhancement branch on June 1, 2026, 10:42