97 changes: 42 additions & 55 deletions BUILD_REPORT.md
@@ -1,82 +1,69 @@
# BUILD_REPORT

## sprint objective
Implement Phase 11 Sprint 2 (`P11-S2`) local-provider support by shipping Ollama and llama.cpp adapters behind the existing provider abstraction, including registration, model enumeration + health posture snapshots, and normalized runtime invoke through existing `v1` seams.
Implement `P11-S3` by adding a vLLM adapter and self-hosted runtime path through the existing provider abstraction, with bounded provider-specific passthrough options, normalized latency/usage telemetry (persisted and exposed via the API), and self-hosted docs with runnable examples.

## completed work
- Added local provider transport helpers in `apps/api/src/alicebot_api/local_provider_helpers.py`:
  - auth header handling (`bearer`/`none`)
  - deterministic JSON request helper
  - Ollama/llama.cpp model enumeration parsers
  - Ollama/llama.cpp invoke response normalization
- Extended provider runtime adapters in `apps/api/src/alicebot_api/provider_runtime.py`:
  - added `ollama` and `llamacpp` adapter keys and implementations
  - registered both adapters in the existing provider registry
  - added deterministic capability snapshot fields for local health/model posture
  - preserved normalized runtime provider seam (`openai_responses`)
- Added additive model provider config fields in persistence:
  - migration `apps/api/alembic/versions/20260411_0053_phase11_local_provider_config_fields.py`
  - store/runtime wiring for `auth_mode`, `model_list_path`, `healthcheck_path`, `invoke_path`
- Updated API contract and serialization surfaces:
  - `apps/api/src/alicebot_api/contracts.py`
  - `apps/api/src/alicebot_api/store.py`
  - `apps/api/src/alicebot_api/main.py`
- Added new registration APIs in `apps/api/src/alicebot_api/main.py`:
  - `POST /v1/providers/ollama/register`
  - `POST /v1/providers/llamacpp/register`
- Kept existing in-scope APIs working with local adapters:
  - `POST /v1/providers/test`
  - `POST /v1/runtime/invoke`
  - `GET /v1/providers`
  - `GET /v1/providers/{provider_id}`
- Added failure-safe capability behavior:
  - registration stores failed discovery posture when the local provider is unreachable
  - provider test stores failed discovery posture when capability discovery fails
- Added sprint verification tests:
  - `tests/unit/test_provider_runtime.py`
  - `tests/unit/test_20260411_0053_phase11_local_provider_config_fields.py`
  - `tests/integration/test_phase11_provider_runtime_api.py`
- Added local setup docs and runnable example paths:
  - `docs/integrations/phase11-local-provider-adapters.md`
  - `scripts/run_phase11_local_provider_e2e.py`
- Updated control-doc truth checker markers for current sprint state:
  - `scripts/check_control_doc_truth.py`
  - linked new integration doc from `README.md`
- Added vLLM adapter support in provider runtime:
  - new adapter key `vllm`
  - capability discovery via `/health` + `/v1/models`
  - invoke via `/v1/chat/completions`
  - capability snapshot telemetry posture fields (`supports_normalized_latency_telemetry`, `supports_normalized_usage_telemetry`, `telemetry_flow_scope`)
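As a hedged illustration of the `/v1/models` discovery step, a parser like the following could normalize an OpenAI-compatible model listing into sorted model ids. The response shape is assumed from the OpenAI-compatible API (a `data` list of `{"id": ...}` objects); the function name is illustrative, not the shipped helper.

```python
# Hypothetical parser for an OpenAI-compatible /v1/models response, as a vLLM
# capability-discovery step might use. Skips malformed entries defensively.
def parse_model_ids(payload: dict) -> list[str]:
    models = payload.get("data", [])
    return sorted(
        m["id"]
        for m in models
        if isinstance(m, dict) and isinstance(m.get("id"), str)
    )


sample = {
    "object": "list",
    "data": [{"id": "meta-llama/Llama-3.1-8B-Instruct", "object": "model"}],
}
print(parse_model_ids(sample))  # ['meta-llama/Llama-3.1-8B-Instruct']
```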
- Added bounded provider-specific passthrough:
  - explicit `adapter_options.invoke_passthrough` schema for vLLM registration
  - bounded allowlist extraction helper for vLLM passthrough options
  - passthrough applied only in vLLM adapter invoke payload
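The bounded allowlist extraction can be sketched as below. The allowlisted key names (`temperature`, `top_p`, `max_tokens`) are assumptions for illustration, not the shipped allowlist, and the helper name is hypothetical.

```python
# Sketch of a bounded allowlist extraction for vLLM passthrough options.
# Only allowlisted keys survive; unknown keys and malformed shapes are dropped.
from typing import Any

_VLLM_PASSTHROUGH_ALLOWLIST = frozenset({"temperature", "top_p", "max_tokens"})


def extract_invoke_passthrough(adapter_options: dict[str, Any]) -> dict[str, Any]:
    raw = adapter_options.get("invoke_passthrough", {})
    if not isinstance(raw, dict):
        return {}
    return {k: v for k, v in raw.items() if k in _VLLM_PASSTHROUGH_ALLOWLIST}


print(extract_invoke_passthrough(
    {"invoke_passthrough": {"temperature": 0.2, "stop": ["\n"], "seed": 7}}
))  # {'temperature': 0.2}
```

Dropping non-allowlisted keys silently (rather than erroring) keeps the passthrough bounded without making registration brittle; the shipped behavior may differ.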
- Added vLLM provider registration endpoint:
  - `POST /v1/providers/vllm/register`
- Added provider telemetry persistence + API:
  - new telemetry storage table and store methods
  - telemetry recording for `/v1/providers/test` and `/v1/runtime/invoke`
  - new endpoint `GET /v1/providers/{provider_id}/telemetry`
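A minimal sketch of how per-provider telemetry rows might be rolled up into the summary counters and usage totals the endpoint exposes. Column names follow the migration in this PR; the summary keys mirror the response contract, but this is not the shipped store code.

```python
# Illustrative aggregation over telemetry rows into a summary shape like the
# one GET /v1/providers/{provider_id}/telemetry returns.
def summarize(rows: list[dict]) -> dict:
    completed = [r for r in rows if r["status"] == "completed"]
    failed = [r for r in rows if r["status"] == "failed"]
    latencies = [r["latency_ms"] for r in rows]
    return {
        "total_count": len(rows),
        "completed_count": len(completed),
        "failed_count": len(failed),
        "average_latency_ms": (sum(latencies) / len(latencies)) if rows else 0.0,
        "usage_totals": {
            # Nullable token columns contribute 0 when NULL.
            "input_tokens": sum(r.get("input_tokens") or 0 for r in rows),
            "output_tokens": sum(r.get("output_tokens") or 0 for r in rows),
        },
    }


rows = [
    {"status": "completed", "latency_ms": 120, "input_tokens": 15, "output_tokens": 40},
    {"status": "failed", "latency_ms": 80, "input_tokens": 15, "output_tokens": None},
]
print(summarize(rows)["average_latency_ms"])  # 100.0
```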
- Added additive provider config field support:
  - `model_providers.adapter_options` persisted and serialized
- Added migration:
  - `20260411_0054_phase11_vllm_telemetry`
- Added/updated tests for runtime, integration, and migration coverage
- Added self-hosted docs and runnable script for vLLM end-to-end flow
- Updated control-doc truth check markers to `P11-S3`

## incomplete work
- None for `P11-S2` acceptance criteria and required verification commands.
- None identified within sprint scope.

## files changed
- `apps/api/src/alicebot_api/local_provider_helpers.py`
Sprint-owned files changed:
- `apps/api/src/alicebot_api/provider_runtime.py`
- `apps/api/src/alicebot_api/main.py`
- `apps/api/src/alicebot_api/store.py`
- `apps/api/src/alicebot_api/contracts.py`
- `apps/api/alembic/versions/20260411_0053_phase11_local_provider_config_fields.py`
- `apps/api/src/alicebot_api/vllm_provider_helpers.py` (new)
- `apps/api/alembic/versions/20260411_0054_phase11_vllm_telemetry.py` (new)
- `tests/unit/test_provider_runtime.py`
- `tests/unit/test_20260411_0053_phase11_local_provider_config_fields.py`
- `tests/integration/test_phase11_provider_runtime_api.py`
- `docs/integrations/phase11-local-provider-adapters.md`
- `scripts/run_phase11_local_provider_e2e.py`
- `tests/unit/test_20260411_0054_phase11_vllm_telemetry.py` (new)
- `docs/integrations/phase11-vllm-self-hosted.md` (new)
- `scripts/run_phase11_vllm_e2e.py` (new)
- `scripts/check_control_doc_truth.py`
- `README.md`
- `BUILD_REPORT.md`
- `REVIEW_REPORT.md`

Pre-existing dirty files excluded from sprint merge scope:
- `README.md`
- `ARCHITECTURE.md`
- `PRODUCT_BRIEF.md`

## tests run
Required verification commands and exact results:
- `python3 scripts/check_control_doc_truth.py`
  - Result: `PASS`
  - Verified: `README.md`, `ROADMAP.md`, `.ai/active/SPRINT_PACKET.md`, `RULES.md`, `.ai/handoff/CURRENT_STATE.md`, `docs/archive/planning/2026-04-08-context-compaction/README.md`
- `./.venv/bin/python -m pytest tests/unit tests/integration -q`
  - Result: `PASS` (`1118 passed in 183.14s (0:03:03)`)
  - Result: `1122 passed in 170.62s (0:02:50)`
- `pnpm --dir apps/web test`
  - Result: `PASS` (`62 files`, `199 tests`, duration `4.82s`)
- Sprint-targeted subset:
  - `./.venv/bin/python -m pytest tests/unit/test_provider_runtime.py tests/unit/test_20260411_0053_phase11_local_provider_config_fields.py tests/integration/test_phase11_provider_runtime_api.py -q`
  - Result: `PASS` (`12 passed in 2.50s`)
  - Result: `PASS` (`62` test files, `199` tests, duration `4.86s`)

## blockers/issues
- No active implementation blockers.
- No blockers during implementation.

## recommended next step
1. Open a sprint PR from `codex/phase11-sprint-2-ollama-llamacpp-adapters` with this report and required test evidence.
2. Keep pre-existing dirty local docs (`ARCHITECTURE.md`, `PRODUCT_BRIEF.md`) excluded from sprint merge scope.
1. Open the sprint PR from branch `codex/phase11-sprint-3-vllm-adapter-selfhosted` and request review focused on vLLM telemetry schema and endpoint response shape stability.
63 changes: 32 additions & 31 deletions REVIEW_REPORT.md
@@ -4,50 +4,51 @@
PASS

## criteria met
- `P11-S2` local provider registration APIs are implemented and functioning:
  - `POST /v1/providers/ollama/register`
  - `POST /v1/providers/llamacpp/register`
- Existing in-scope APIs are functioning with local adapters:
- `P11-S3` acceptance criteria are met for the vLLM self-hosted path.
- vLLM registration is implemented through the shipped provider registry:
  - `POST /v1/providers/vllm/register`
- Provider tests and capability snapshots expose deterministic self-hosted posture through the existing abstraction:
  - `POST /v1/providers/test`
  - capability snapshot fields include normalized telemetry posture (`supports_normalized_usage_telemetry`, `supports_normalized_latency_telemetry`, `telemetry_flow_scope`)
- Runtime invoke works through the shipped normalized provider contract for vLLM:
  - `POST /v1/runtime/invoke`
- `GET /v1/providers`
- `GET /v1/providers/{provider_id}`
- Ollama and llama.cpp adapters are integrated through the shipped provider abstraction and registry.
- Capability snapshots include deterministic local model enumeration and health posture fields.
- Additive provider config fields are migrated and wired (`auth_mode`, `model_list_path`, `healthcheck_path`, `invoke_path`).
- Local setup documentation and runnable e2e example path are present.
- Regression fix validated: legacy `/v1/providers` path now correctly passes `store` into shared registration helper (`apps/api/src/alicebot_api/main.py:6174-6177`).
- Credential handling tightened: `auth_mode="none"` now rejects non-empty `api_key`, preventing plaintext persistence (`apps/api/src/alicebot_api/main.py:1562-1568`).
- New regression coverage added:
  - OpenAI-compatible registration still works and stores secret ref, not plaintext (`tests/integration/test_phase11_provider_runtime_api.py:470-491`).
  - `auth_mode="none"` rejects provided `api_key` (`tests/integration/test_phase11_provider_runtime_api.py:494-514`).
- Required verification commands pass on the current branch head:
  - `python3 scripts/check_control_doc_truth.py` -> PASS
  - `./.venv/bin/python -m pytest tests/unit tests/integration -q` -> PASS (`1118 passed in 183.14s`)
  - `pnpm --dir apps/web test` -> PASS (`62 files`, `199 tests`, duration `4.82s`)
- Normalized latency and usage telemetry are persisted and exposed:
  - migration adds `provider_invocation_telemetry`
  - telemetry writes for `provider_test` and `runtime_invoke`
  - `GET /v1/providers/{provider_id}/telemetry`
- Bounded provider-specific passthrough is implemented behind explicit vLLM adapter options (`adapter_options.invoke_passthrough` allowlist).
- Self-hosted docs and runnable examples are now internally consistent for local split endpoints (API `:8000`, vLLM provider `:8001`):
  - `docs/integrations/phase11-vllm-self-hosted.md`
  - `scripts/run_phase11_vllm_e2e.py`
- Existing `P11-S1` / `P11-S2` seams remain intact (verified by full unit+integration pass and existing integration coverage).
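For reference, a registration request body for the self-hosted flow might look like the following. All field names and values are illustrative assumptions, not the shipped schema; the split-endpoint layout follows the docs (API on `:8000`, vLLM provider on `:8001`).

```python
# Hypothetical request body for POST /v1/providers/vllm/register.
# base_url points at the vLLM server, NOT at the AliceBot API itself --
# the endpoint-default bug this review flagged and fixed.
registration = {
    "base_url": "http://localhost:8001",
    "auth_mode": "none",
    "adapter_options": {
        # Bounded passthrough: only allowlisted keys are forwarded on invoke.
        "invoke_passthrough": {"temperature": 0.1},
    },
}

assert registration["base_url"] != "http://localhost:8000"
```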

## criteria missed
- None identified for `P11-S2` acceptance criteria.
- None.

## quality issues
- No blocking quality issues remain in sprint-owned scope after fixes.
- No blocking quality issues found in sprint-owned changes after the endpoint-default fix.
- Out-of-scope dirty local docs remain present and should stay excluded from sprint merge scope:
  - `ARCHITECTURE.md`
  - `PRODUCT_BRIEF.md`
  - `README.md` (pre-existing dirty context in branch)

## regression risks
- Low. Full required verification is passing, including new regression tests for the previously broken path.
- The main residual operational risk is external local-provider availability (Ollama/llama.cpp process reachability), which is surfaced via explicit discovery/test failure posture.
- Low.
- Required verification suite passes on current workspace state:
  - `python3 scripts/check_control_doc_truth.py` -> PASS
  - `./.venv/bin/python -m pytest tests/unit tests/integration -q` -> `1122 passed in 170.62s`
  - `pnpm --dir apps/web test` -> PASS (`62` test files, `199` tests, duration `4.86s`)

## docs issues
- No local identifiers (local computer paths, names) were found in sprint-owned changed code/docs reviewed here.
- Out-of-scope dirty local docs remain and should stay excluded from sprint merge scope:
  - `ARCHITECTURE.md`
  - `PRODUCT_BRIEF.md`
- Fixed: vLLM self-hosted docs/script no longer default provider URL to the API URL.
- No local identifiers (local machine paths, personal names, local-only identifiers) were found in reviewed sprint-owned files.

## should anything be added to RULES.md?
- Optional improvement: require backward-compat regression tests for already-shipped endpoints whenever shared registration/runtime helpers are refactored.
- Optional: add a guardrail that runnable docs/scripts must use non-conflicting default endpoints in multi-service flows and be smoke-validated before merge.

## should anything update ARCHITECTURE.md?
- Optional improvement: add a concise note clarifying auth-mode credential invariants (`bearer` uses secret refs; `none` must not persist API keys).
- No required architecture update for `P11-S3` merge.

## recommended next action
1. Ready for Control Tower merge approval with the updated build and review evidence on this branch head.
2. Keep `ARCHITECTURE.md` and `PRODUCT_BRIEF.md` excluded from the sprint PR.
1. Proceed with sprint PR review/merge for `P11-S3`.
2. Keep non-sprint control-doc rewrites excluded from this PR unless explicitly approved as separate scope.
90 changes: 90 additions & 0 deletions apps/api/alembic/versions/20260411_0054_phase11_vllm_telemetry.py
@@ -0,0 +1,90 @@
"""Add vLLM adapter options and provider invocation telemetry."""

from __future__ import annotations

from alembic import op


revision = "20260411_0054"
down_revision = "20260411_0053"
branch_labels = None
depends_on = None

_UPGRADE_STATEMENTS = (
    "ALTER TABLE model_providers ADD COLUMN adapter_options jsonb NOT NULL DEFAULT '{}'::jsonb",
    (
        "ALTER TABLE model_providers "
        "ADD CONSTRAINT model_providers_adapter_options_object_check "
        "CHECK (jsonb_typeof(adapter_options) = 'object')"
    ),
    """
    CREATE TABLE provider_invocation_telemetry (
        id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
        workspace_id uuid NOT NULL REFERENCES workspaces(id) ON DELETE CASCADE,
        provider_id uuid NOT NULL REFERENCES model_providers(id) ON DELETE CASCADE,
        invoked_by_user_account_id uuid NOT NULL REFERENCES user_accounts(id) ON DELETE RESTRICT,
        flow_kind text NOT NULL,
        adapter_key text NOT NULL,
        runtime_provider text NOT NULL,
        provider_model text NOT NULL,
        status text NOT NULL,
        error_message text NULL,
        latency_ms integer NOT NULL,
        input_tokens integer NULL,
        output_tokens integer NULL,
        total_tokens integer NULL,
        metadata jsonb NOT NULL DEFAULT '{}'::jsonb,
        created_at timestamptz NOT NULL DEFAULT now(),
        CONSTRAINT provider_invocation_telemetry_flow_kind_check
            CHECK (flow_kind IN ('provider_test', 'runtime_invoke')),
        CONSTRAINT provider_invocation_telemetry_adapter_key_length_check
            CHECK (char_length(adapter_key) >= 1 AND char_length(adapter_key) <= 80),
        CONSTRAINT provider_invocation_telemetry_runtime_provider_length_check
            CHECK (char_length(runtime_provider) >= 1 AND char_length(runtime_provider) <= 100),
        CONSTRAINT provider_invocation_telemetry_provider_model_length_check
            CHECK (char_length(provider_model) >= 1 AND char_length(provider_model) <= 200),
        CONSTRAINT provider_invocation_telemetry_status_check
            CHECK (status IN ('completed', 'failed')),
        CONSTRAINT provider_invocation_telemetry_latency_non_negative_check
            CHECK (latency_ms >= 0),
        CONSTRAINT provider_invocation_telemetry_input_tokens_non_negative_check
            CHECK (input_tokens IS NULL OR input_tokens >= 0),
        CONSTRAINT provider_invocation_telemetry_output_tokens_non_negative_check
            CHECK (output_tokens IS NULL OR output_tokens >= 0),
        CONSTRAINT provider_invocation_telemetry_total_tokens_non_negative_check
            CHECK (total_tokens IS NULL OR total_tokens >= 0)
    )
    """,
    (
        "CREATE INDEX provider_invocation_telemetry_provider_created_idx "
        "ON provider_invocation_telemetry (provider_id, created_at DESC, id DESC)"
    ),
    (
        "CREATE INDEX provider_invocation_telemetry_workspace_created_idx "
        "ON provider_invocation_telemetry (workspace_id, created_at DESC, id DESC)"
    ),
)

_UPGRADE_GRANT_STATEMENTS = (
    "GRANT SELECT, INSERT, UPDATE, DELETE ON provider_invocation_telemetry TO alicebot_app",
)

_DOWNGRADE_STATEMENTS = (
    "DROP TABLE IF EXISTS provider_invocation_telemetry",
    "ALTER TABLE model_providers DROP CONSTRAINT IF EXISTS model_providers_adapter_options_object_check",
    "ALTER TABLE model_providers DROP COLUMN IF EXISTS adapter_options",
)


def _execute_statements(statements: tuple[str, ...]) -> None:
    for statement in statements:
        op.execute(statement)


def upgrade() -> None:
    _execute_statements(_UPGRADE_STATEMENTS)
    _execute_statements(_UPGRADE_GRANT_STATEMENTS)


def downgrade() -> None:
    _execute_statements(_DOWNGRADE_STATEMENTS)
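An application-side mirror of a few of the migration's CHECK constraints, as a hedged sketch of how a store method might validate a telemetry row before INSERT. The helper name is hypothetical; the shipped store methods are not reproduced here.

```python
# Mirrors the flow_kind, status, latency, and nullable-token CHECK constraints
# from the provider_invocation_telemetry table, raising before hitting the DB.
_FLOW_KINDS = {"provider_test", "runtime_invoke"}
_STATUSES = {"completed", "failed"}


def validate_telemetry_row(row: dict) -> None:
    if row["flow_kind"] not in _FLOW_KINDS:
        raise ValueError("flow_kind must be provider_test or runtime_invoke")
    if row["status"] not in _STATUSES:
        raise ValueError("status must be completed or failed")
    if row["latency_ms"] < 0:
        raise ValueError("latency_ms must be non-negative")
    for key in ("input_tokens", "output_tokens", "total_tokens"):
        value = row.get(key)
        if value is not None and value < 0:
            raise ValueError(f"{key} must be NULL or non-negative")


# Passes silently; invalid rows raise ValueError instead of a DB error.
validate_telemetry_row({"flow_kind": "runtime_invoke", "status": "completed", "latency_ms": 42})
```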
38 changes: 37 additions & 1 deletion apps/api/src/alicebot_api/contracts.py
@@ -189,9 +189,11 @@
ToolRoutingDecision = Literal["ready", "denied", "approval_required"]
PromptSectionName = Literal["system", "developer", "context", "conversation"]
ModelProvider = Literal["openai_responses"]
ProviderAdapterKey = Literal["openai_compatible", "ollama", "llamacpp"]
ProviderAdapterKey = Literal["openai_compatible", "ollama", "llamacpp", "vllm"]
ModelProviderStatus = Literal["active"]
ProviderCapabilityDiscoveryStatus = Literal["ready", "failed"]
ProviderInvocationFlowKind = Literal["provider_test", "runtime_invoke"]
ProviderInvocationStatus = Literal["completed", "failed"]
ModelFinishReason = Literal["completed", "incomplete"]
ExplicitPreferencePattern = Literal[
"i_like",
Expand Down Expand Up @@ -1553,6 +1555,7 @@ class ModelProviderRecord(TypedDict):
model_list_path: str
healthcheck_path: str
invoke_path: str
adapter_options: JsonObject
metadata: JsonObject
created_at: str
updated_at: str
Expand Down Expand Up @@ -1611,6 +1614,39 @@ class RuntimeInvokeResponse(TypedDict):
trace: ResponseTraceSummary


class ProviderInvocationTelemetryRecord(TypedDict):
    id: str
    workspace_id: str
    provider_id: str
    invoked_by_user_account_id: str
    flow_kind: ProviderInvocationFlowKind
    adapter_key: ProviderAdapterKey
    runtime_provider: ModelProvider
    provider_model: str
    status: ProviderInvocationStatus
    error_message: str | None
    latency_ms: int
    usage: ModelUsagePayload
    metadata: JsonObject
    created_at: str


class ProviderTelemetrySummary(TypedDict):
    total_count: int
    completed_count: int
    failed_count: int
    average_latency_ms: float
    latest_created_at: str | None
    usage_totals: ModelUsagePayload


class ProviderTelemetryResponse(TypedDict):
    provider_id: str
    summary: ProviderTelemetrySummary
    items: list[ProviderInvocationTelemetryRecord]
    order: list[str]


@dataclass(frozen=True, slots=True)
class OpenLoopCandidateInput:
    title: str