Skip to content

Feature/load testing#32

Open
yodem wants to merge 16 commits intomainfrom
feature/load-testing
Open

Feature/load testing#32
yodem wants to merge 16 commits intomainfrom
feature/load-testing

Conversation

@yodem
Copy link
Contributor

@yodem yodem commented Feb 15, 2026

Summary

Adds load testing for the chatbot server without using the real Anthropic API. A mock Anthropic server replaces the real API so you can measure capacity (threads, memory, DB, streaming) at no cost. Includes Docker Compose setup and Braintrust tracing changes for load-test runs.

What’s new

Load testing stack

  • Mock Anthropic API (server/loadtest/mock_anthropic.py) — FastAPI server that mimics the Anthropic Messages API (SSE streaming, tool-calling). Uses the same event types and payloads as the real API so the Claude Agent SDK runs end-to-end.
  • Load test script (server/loadtest/load_test.py) — Async httpx script that sends concurrent requests to /api/v2/chat/stream and reports TTFB, total response time, error rate, and throughput.
  • Tests (server/loadtest/test_mock_anthropic.py) — Pytest tests for the mock’s SSE format and tool-calling behavior.

Docker Compose

  • mock-anthropic — Runs the mock Anthropic API on port 8002.
  • loadtest — Runs the load test against the app (default: 50 requests, 10 concurrent).
  • app — Uses IS_LOAD_TESTING=true and MOCK_ANTHROPIC_URL to route to the mock instead of the real API.

Agent service factory

  • get_agent_service(is_load_testing) — Chooses mock vs real Anthropic based on IS_LOAD_TESTING.
  • ClaudeAgentService — Accepts optional base_url for mock routing.
  • IS_LOAD_TESTING + MOCK_ANTHROPIC_URL — Replaces ANTHROPIC_BASE_URL for load-test configuration.

Braintrust tracing

  • BRAINTRUST_LOGGING_ENABLED — Can be set to false to disable tracing during load tests.
  • No-op behavior — When disabled, Braintrust calls are no-ops instead of guarded with if bt_span:.
  • flush_braintrust() — Always calls braintrust.flush() (no-op when logging is disabled).

How to run

docker compose up -d --build
docker compose exec app python manage.py migrate
docker compose run --rm loadtest

Files changed

Area Files
Load test server/loadtest/ — mock, load script, tests, Dockerfiles, README
Docker docker-compose.yml — mock-anthropic, loadtest, app env vars
Agent claude_service.py, views.py, anthropic_views.py, utils.py
Braintrust claude_service.py, utils.py, BRAINTRUST_TRACING.md
Docs docs/ARCHITECTURE.md, docs/plans/braintrust-factory-refactor.md

yodem and others added 10 commits February 15, 2026 14:03
Adds a BRAINTRUST_ENABLED environment variable (default: true) that
gates all Braintrust tracing, logging, and prompt fetching. When set
to false, the system skips SDK setup, span logging, TracedThreadPool,
flush calls, and remote prompt fetching — reducing noise and overhead
during load tests.

Fixes SC-41751

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
- Rename 'inner' to 'message_task' for clarity in send_message()
- Move braintrust import to top of prompt_service.py

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
- Introduced a new mock Anthropic API server for load testing, allowing simulation of API calls without incurring costs.
- Updated `docker-compose.yml` to include the mock server and its health checks.
- Added load testing scripts and configuration files to facilitate performance testing of the chatbot server.
- Created documentation for usage and configuration of the load testing setup.
- Implemented tests for the mock server to ensure correct behavior and response formats.

This enhances the testing framework and allows for more efficient performance evaluations.
…re/sc-41751/implement-environment-variable-to-skip
…iles

- Renamed environment variable from `BRAINTRUST_ENABLED` to `BRAINTRUST_LOGGING_ENABLED` for clarity in tracing control.
- Updated related documentation and code references to reflect the new variable name.
- Added Kubernetes deployment files for the application, mock Anthropic API, and PostgreSQL, including necessary configurations and health checks.
- Introduced a secrets example file for managing sensitive information.

This enhances the deployment process and improves the clarity of Braintrust logging settings.
@yodem yodem changed the base branch from chore/sc-41751/implement-environment-variable-to-skip to main February 16, 2026 10:14
- Removed obsolete files from the .claude directory, including configuration and context files.
- Updated .gitignore to exclude new directories and files related to Claude's configuration and secrets management.

This cleanup improves project organization and ensures sensitive files are not tracked.
@yodem yodem requested a review from akiva10b February 16, 2026 10:19
- Introduced a new settings.json file in the .claude directory to manage plugin permissions and configurations.
- Updated .gitignore to ensure settings.json is not ignored, allowing it to be tracked.

This addition enhances the configuration management for Claude plugins.
Copy link
Contributor

@akiva10b akiva10b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arc change: you can use the function get_agent_service which does a new init on every claud agent sdk to determine which api endpoint we want to use. In this way, you dont need to run another server but rather can just stipulate in the request how you want the agent to act

yodem and others added 4 commits February 17, 2026 15:37
…tern

- Remove scattered Braintrust if-else guards; rely on SDK no-op semantics
  when BRAINTRUST_LOGGING_ENABLED=false (current_span() returns noop span)
- Add get_agent_service(is_load_testing) factory: routes to mock Anthropic
  server when true, real API when false
- Add _IS_LOAD_TESTING module-level constant in views.py and anthropic_views.py
- Replace ANTHROPIC_BASE_URL env var with IS_LOAD_TESTING + MOCK_ANTHROPIC_URL
  in docker-compose.yml
- Delete k8s/ directory; update README and docs to remove k8s references
- Update BRAINTRUST_TRACING.md to reflect actual no-op behavior

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Renamed `braintrust_enabled` to `braintrust_logging_enabled` across multiple files to improve clarity in tracing control.
- Removed conditional guards related to `braintrust_logging_enabled` in the `send_message` and `_send_message_inner` methods, simplifying the logic.
- Updated documentation to reflect the changes in variable naming and behavior.

This refactor enhances the maintainability of the code and aligns with the updated logging configuration.
…actory pattern

- Removed scattered if-else guards related to Braintrust logging, simplifying the logic by relying on SDK no-op semantics.
- Introduced a factory pattern for `get_agent_service(is_load_testing)` to switch between mock and real Anthropic endpoints without environment variable changes.
- Updated `ClaudeAgentService` to accept a `base_url` parameter, allowing for flexible endpoint configuration.
- Simplified logging and tracing logic across multiple files, enhancing maintainability and clarity.

This refactor improves the overall structure and usability of the logging and agent service components.
Replace IS_LOAD_TESTING process-level env var with a per-request boolean.
Load test script passes isLoadTest:true in the request body; views.py uses
that flag to call get_agent_service(is_load_testing=...). Removes IS_LOAD_TESTING
from docker-compose.yml and all docs.

Also:
- Fix AES-GCM nonce to use os.urandom(12) instead of fixed all-zero bytes
- Add tests for isLoadTest serializer field, get_agent_service factory,
  BraintrustConfig.enabled, and _setup_braintrust_tracing early-return

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@yodem yodem requested a review from akiva10b February 18, 2026 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants