Feature/load testing by yodem · Pull Request #32 · Sefaria/ai-chatbot

yodem · 2026-02-15T15:06:09Z

Summary

Adds load testing for the chatbot server without using the real Anthropic API. A mock Anthropic server replaces the real API so you can measure capacity (threads, memory, DB, streaming) at no cost. Includes Docker Compose setup and Braintrust tracing changes for load-test runs.

What’s new

Load testing stack

Mock Anthropic API (server/loadtest/mock_anthropic.py) — FastAPI server that mimics the Anthropic Messages API (SSE streaming, tool-calling). Uses the same event types and payloads as the real API so the Claude Agent SDK runs end-to-end.
Load test script (server/loadtest/load_test.py) — Async httpx script that sends concurrent requests to /api/v2/chat/stream and reports TTFB, total response time, error rate, and throughput.
Tests (server/loadtest/test_mock_anthropic.py) — Pytest tests for the mock’s SSE format and tool-calling behavior.

Docker Compose

mock-anthropic — Runs the mock Anthropic API on port 8002.
loadtest — Runs the load test against the app (default: 50 requests, 10 concurrent).
app — Uses IS_LOAD_TESTING=true and MOCK_ANTHROPIC_URL to route to the mock instead of the real API.

Agent service factory

get_agent_service(is_load_testing) — Chooses mock vs real Anthropic based on IS_LOAD_TESTING.
ClaudeAgentService — Accepts optional base_url for mock routing.
IS_LOAD_TESTING + MOCK_ANTHROPIC_URL — Replaces ANTHROPIC_BASE_URL for load-test configuration.

Braintrust tracing

BRAINTRUST_LOGGING_ENABLED — Can be set to false to disable tracing during load tests.
No-op behavior — When disabled, Braintrust calls are no-ops instead of guarded with if bt_span:.
flush_braintrust() — Always calls braintrust.flush() (no-op when logging is disabled).

How to run

docker compose up -d --build
docker compose exec app python manage.py migrate
docker compose run --rm loadtest

Files changed

Area	Files
Load test	`server/loadtest/` — mock, load script, tests, Dockerfiles, README
Docker	`docker-compose.yml` — mock-anthropic, loadtest, app env vars
Agent	`claude_service.py`, `views.py`, `anthropic_views.py`, `utils.py`
Braintrust	`claude_service.py`, `utils.py`, `BRAINTRUST_TRACING.md`
Docs	`docs/ARCHITECTURE.md`, `docs/plans/braintrust-factory-refactor.md`

Adds a BRAINTRUST_ENABLED environment variable (default: true) that gates all Braintrust tracing, logging, and prompt fetching. When set to false, the system skips SDK setup, span logging, TracedThreadPool, flush calls, and remote prompt fetching — reducing noise and overhead during load tests. Fixes SC-41751 Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

- Rename 'inner' to 'message_task' for clarity in send_message() - Move braintrust import to top of prompt_service.py Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

… Braintrust

- Introduced a new mock Anthropic API server for load testing, allowing simulation of API calls without incurring costs. - Updated `docker-compose.yml` to include the mock server and its health checks. - Added load testing scripts and configuration files to facilitate performance testing of the chatbot server. - Created documentation for usage and configuration of the load testing setup. - Implemented tests for the mock server to ensure correct behavior and response formats. This enhances the testing framework and allows for more efficient performance evaluations.

…re/sc-41751/implement-environment-variable-to-skip

https://github.com/Sefaria/ai-chatbot into feature/load-testing

…iles - Renamed environment variable from `BRAINTRUST_ENABLED` to `BRAINTRUST_LOGGING_ENABLED` for clarity in tracing control. - Updated related documentation and code references to reflect the new variable name. - Added Kubernetes deployment files for the application, mock Anthropic API, and PostgreSQL, including necessary configurations and health checks. - Introduced a secrets example file for managing sensitive information. This enhances the deployment process and improves the clarity of Braintrust logging settings.

- Removed obsolete files from the .claude directory, including configuration and context files. - Updated .gitignore to exclude new directories and files related to Claude's configuration and secrets management. This cleanup improves project organization and ensures sensitive files are not tracked.

- Introduced a new settings.json file in the .claude directory to manage plugin permissions and configurations. - Updated .gitignore to ensure settings.json is not ignored, allowing it to be tracked. This addition enhances the configuration management for Claude plugins.

server/chat/V2/agent/claude_service.py

akiva10b

Arc change: you can use the function get_agent_service which does a new init on every claud agent sdk to determine which api endpoint we want to use. In this way, you dont need to run another server but rather can just stipulate in the request how you want the agent to act

…tern - Remove scattered Braintrust if-else guards; rely on SDK no-op semantics when BRAINTRUST_LOGGING_ENABLED=false (current_span() returns noop span) - Add get_agent_service(is_load_testing) factory: routes to mock Anthropic server when true, real API when false - Add _IS_LOAD_TESTING module-level constant in views.py and anthropic_views.py - Replace ANTHROPIC_BASE_URL env var with IS_LOAD_TESTING + MOCK_ANTHROPIC_URL in docker-compose.yml - Delete k8s/ directory; update README and docs to remove k8s references - Update BRAINTRUST_TRACING.md to reflect actual no-op behavior Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Renamed `braintrust_enabled` to `braintrust_logging_enabled` across multiple files to improve clarity in tracing control. - Removed conditional guards related to `braintrust_logging_enabled` in the `send_message` and `_send_message_inner` methods, simplifying the logic. - Updated documentation to reflect the changes in variable naming and behavior. This refactor enhances the maintainability of the code and aligns with the updated logging configuration.

…actory pattern - Removed scattered if-else guards related to Braintrust logging, simplifying the logic by relying on SDK no-op semantics. - Introduced a factory pattern for `get_agent_service(is_load_testing)` to switch between mock and real Anthropic endpoints without environment variable changes. - Updated `ClaudeAgentService` to accept a `base_url` parameter, allowing for flexible endpoint configuration. - Simplified logging and tracing logic across multiple files, enhancing maintainability and clarity. This refactor improves the overall structure and usability of the logging and agent service components.

Replace IS_LOAD_TESTING process-level env var with a per-request boolean. Load test script passes isLoadTest:true in the request body; views.py uses that flag to call get_agent_service(is_load_testing=...). Removes IS_LOAD_TESTING from docker-compose.yml and all docs. Also: - Fix AES-GCM nonce to use os.urandom(12) instead of fixed all-zero bytes - Add tests for isLoadTest serializer field, get_agent_service factory, BraintrustConfig.enabled, and _setup_braintrust_tracing early-return Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

yodem and others added 10 commits February 15, 2026 14:03

refactor: add type hint for bt_span in guardrail helper

371d1b0

Co-authored-by: Cursor <cursoragent@cursor.com>

refactor: improve naming and import organization

baff8f5

- Rename 'inner' to 'message_task' for clarity in send_message() - Move braintrust import to top of prompt_service.py Co-authored-by: Cursor <cursoragent@cursor.com>

chore: remove unused braintrust import from prompt_service.py

74fe53c

refactor: move braintrust import back to conditional import

97893d3

Co-authored-by: Cursor <cursoragent@cursor.com>

Merge chore/sc-41751/implement-environment-variable-to-skip to ignore…

1bcde73

… Braintrust

Merge branch 'main' of https://github.com/Sefaria/ai-chatbot into cho…

c6d4baa

…re/sc-41751/implement-environment-variable-to-skip

Merge branch 'chore/sc-41751/implement-environment-variable-to-skip' of

45122be

https://github.com/Sefaria/ai-chatbot into feature/load-testing

yodem changed the base branch from chore/sc-41751/implement-environment-variable-to-skip to main February 16, 2026 10:14

yodem requested a review from akiva10b February 16, 2026 10:19

akiva10b reviewed Feb 17, 2026

View reviewed changes

server/chat/V2/agent/claude_service.py Show resolved Hide resolved

akiva10b reviewed Feb 17, 2026

View reviewed changes

yodem and others added 4 commits February 17, 2026 15:37

yodem requested a review from akiva10b February 18, 2026 10:46

akiva10b approved these changes Feb 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/load testing#32

Feature/load testing#32
yodem wants to merge 16 commits intomainfrom
feature/load-testing

yodem commented Feb 15, 2026 •

edited

Loading

Uh oh!

Uh oh!

akiva10b left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yodem commented Feb 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What’s new

Load testing stack

Docker Compose

Agent service factory

Braintrust tracing

How to run

Files changed

Uh oh!

Uh oh!

akiva10b left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

yodem commented Feb 15, 2026 •

edited

Loading