97 changes: 42 additions & 55 deletions BUILD_REPORT.md
@@ -1,82 +1,69 @@
# BUILD_REPORT

## sprint objective
Implement Phase 11 Sprint 2 (`P11-S2`) local-provider support by shipping Ollama and llama.cpp adapters behind the existing provider abstraction, including registration, model enumeration + health posture snapshots, and normalized runtime invoke through existing `v1` seams.
Implement `P11-S3` by adding a vLLM adapter and self-hosted runtime path through the existing provider abstraction, with bounded provider-specific passthrough options, normalized latency/usage telemetry (persisted and exposed via the API), and self-hosted docs with runnable examples.

## completed work
- Added local provider transport helpers in `apps/api/src/alicebot_api/local_provider_helpers.py`:
  - auth header handling (`bearer`/`none`)
  - deterministic JSON request helper
  - Ollama/llama.cpp model enumeration parsers
  - Ollama/llama.cpp invoke response normalization
- Extended provider runtime adapters in `apps/api/src/alicebot_api/provider_runtime.py`:
  - added `ollama` and `llamacpp` adapter keys and implementations
  - registered both adapters in the existing provider registry
  - added deterministic capability snapshot fields for local health/model posture
  - preserved normalized runtime provider seam (`openai_responses`)
- Added additive model provider config fields in persistence:
  - migration `apps/api/alembic/versions/20260411_0053_phase11_local_provider_config_fields.py`
  - store/runtime wiring for `auth_mode`, `model_list_path`, `healthcheck_path`, `invoke_path`
- Updated API contract and serialization surfaces:
  - `apps/api/src/alicebot_api/contracts.py`
  - `apps/api/src/alicebot_api/store.py`
  - `apps/api/src/alicebot_api/main.py`
- Added new registration APIs in `apps/api/src/alicebot_api/main.py`:
  - `POST /v1/providers/ollama/register`
  - `POST /v1/providers/llamacpp/register`
- Kept existing in-scope APIs working with local adapters:
  - `POST /v1/providers/test`
  - `POST /v1/runtime/invoke`
  - `GET /v1/providers`
  - `GET /v1/providers/{provider_id}`
- Added failure-safe capability behavior:
  - registration stores failed discovery posture when the local provider is unreachable
  - provider test stores failed discovery posture when capability discovery fails
- Added sprint verification tests:
  - `tests/unit/test_provider_runtime.py`
  - `tests/unit/test_20260411_0053_phase11_local_provider_config_fields.py`
  - `tests/integration/test_phase11_provider_runtime_api.py`
- Added local setup docs and runnable example paths:
  - `docs/integrations/phase11-local-provider-adapters.md`
  - `scripts/run_phase11_local_provider_e2e.py`
- Updated control-doc truth checker markers for current sprint state:
  - `scripts/check_control_doc_truth.py`
  - linked new integration doc from `README.md`
- Added vLLM adapter support in provider runtime:
  - new adapter key `vllm`
  - capability discovery via `/health` + `/v1/models`
  - invoke via `/v1/chat/completions`
  - capability snapshot telemetry posture fields (`supports_normalized_latency_telemetry`, `supports_normalized_usage_telemetry`, `telemetry_flow_scope`)
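As a hedged illustration of the `/v1/models` discovery step, a parser like the following could normalize an OpenAI-compatible model listing into sorted model ids. The response shape is assumed from the OpenAI-compatible API (a `data` list of `{"id": ...}` objects); the function name is illustrative, not the shipped helper.

```python
# Hypothetical parser for an OpenAI-compatible /v1/models response, as a vLLM
# capability-discovery step might use. Skips malformed entries defensively.
def parse_model_ids(payload: dict) -> list[str]:
    models = payload.get("data", [])
    return sorted(
        m["id"]
        for m in models
        if isinstance(m, dict) and isinstance(m.get("id"), str)
    )


sample = {
    "object": "list",
    "data": [{"id": "meta-llama/Llama-3.1-8B-Instruct", "object": "model"}],
}
print(parse_model_ids(sample))  # ['meta-llama/Llama-3.1-8B-Instruct']
```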
- Added bounded provider-specific passthrough:
  - explicit `adapter_options.invoke_passthrough` schema for vLLM registration
  - bounded allowlist extraction helper for vLLM passthrough options
  - passthrough applied only in vLLM adapter invoke payload
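The bounded allowlist extraction can be sketched as below. The allowlisted key names (`temperature`, `top_p`, `max_tokens`) are assumptions for illustration, not the shipped allowlist, and the helper name is hypothetical.

```python
# Sketch of a bounded allowlist extraction for vLLM passthrough options.
# Only allowlisted keys survive; unknown keys and malformed shapes are dropped.
from typing import Any

_VLLM_PASSTHROUGH_ALLOWLIST = frozenset({"temperature", "top_p", "max_tokens"})


def extract_invoke_passthrough(adapter_options: dict[str, Any]) -> dict[str, Any]:
    raw = adapter_options.get("invoke_passthrough", {})
    if not isinstance(raw, dict):
        return {}
    return {k: v for k, v in raw.items() if k in _VLLM_PASSTHROUGH_ALLOWLIST}


print(extract_invoke_passthrough(
    {"invoke_passthrough": {"temperature": 0.2, "stop": ["\n"], "seed": 7}}
))  # {'temperature': 0.2}
```

Dropping non-allowlisted keys silently (rather than erroring) keeps the passthrough bounded without making registration brittle; the shipped behavior may differ.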
- Added vLLM provider registration endpoint:
  - `POST /v1/providers/vllm/register`
- Added provider telemetry persistence + API:
  - new telemetry storage table and store methods
  - telemetry recording for `/v1/providers/test` and `/v1/runtime/invoke`
  - new endpoint `GET /v1/providers/{provider_id}/telemetry`
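A minimal sketch of how per-provider telemetry rows might be rolled up into the summary counters and usage totals the endpoint exposes. Column names follow the migration in this PR; the summary keys mirror the response contract, but this is not the shipped store code.

```python
# Illustrative aggregation over telemetry rows into a summary shape like the
# one GET /v1/providers/{provider_id}/telemetry returns.
def summarize(rows: list[dict]) -> dict:
    completed = [r for r in rows if r["status"] == "completed"]
    failed = [r for r in rows if r["status"] == "failed"]
    latencies = [r["latency_ms"] for r in rows]
    return {
        "total_count": len(rows),
        "completed_count": len(completed),
        "failed_count": len(failed),
        "average_latency_ms": (sum(latencies) / len(latencies)) if rows else 0.0,
        "usage_totals": {
            # Nullable token columns contribute 0 when NULL.
            "input_tokens": sum(r.get("input_tokens") or 0 for r in rows),
            "output_tokens": sum(r.get("output_tokens") or 0 for r in rows),
        },
    }


rows = [
    {"status": "completed", "latency_ms": 120, "input_tokens": 15, "output_tokens": 40},
    {"status": "failed", "latency_ms": 80, "input_tokens": 15, "output_tokens": None},
]
print(summarize(rows)["average_latency_ms"])  # 100.0
```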
- Added additive provider config field support:
  - `model_providers.adapter_options` persisted and serialized
- Added migration:
  - `20260411_0054_phase11_vllm_telemetry`
- Added/updated tests for runtime, integration, and migration coverage
- Added self-hosted docs and runnable script for vLLM end-to-end flow
- Updated control-doc truth check markers to `P11-S3`

## incomplete work
- None for `P11-S2` acceptance criteria and required verification commands.
- None identified within sprint scope.

## files changed
- `apps/api/src/alicebot_api/local_provider_helpers.py`
Sprint-owned files changed:
- `apps/api/src/alicebot_api/provider_runtime.py`
- `apps/api/src/alicebot_api/main.py`
- `apps/api/src/alicebot_api/store.py`
- `apps/api/src/alicebot_api/contracts.py`
- `apps/api/alembic/versions/20260411_0053_phase11_local_provider_config_fields.py`
- `apps/api/src/alicebot_api/vllm_provider_helpers.py` (new)
- `apps/api/alembic/versions/20260411_0054_phase11_vllm_telemetry.py` (new)
- `tests/unit/test_provider_runtime.py`
- `tests/unit/test_20260411_0053_phase11_local_provider_config_fields.py`
- `tests/integration/test_phase11_provider_runtime_api.py`
- `docs/integrations/phase11-local-provider-adapters.md`
- `scripts/run_phase11_local_provider_e2e.py`
- `tests/unit/test_20260411_0054_phase11_vllm_telemetry.py` (new)
- `docs/integrations/phase11-vllm-self-hosted.md` (new)
- `scripts/run_phase11_vllm_e2e.py` (new)
- `scripts/check_control_doc_truth.py`
- `README.md`
- `BUILD_REPORT.md`
- `REVIEW_REPORT.md`

Pre-existing dirty files excluded from sprint merge scope:
- `README.md`
- `ARCHITECTURE.md`
- `PRODUCT_BRIEF.md`

## tests run
Required verification commands and exact results:
- `python3 scripts/check_control_doc_truth.py`
  - Result: `PASS`
  - Verified: `README.md`, `ROADMAP.md`, `.ai/active/SPRINT_PACKET.md`, `RULES.md`, `.ai/handoff/CURRENT_STATE.md`, `docs/archive/planning/2026-04-08-context-compaction/README.md`
- `./.venv/bin/python -m pytest tests/unit tests/integration -q`
  - Result: `PASS` (`1118 passed in 183.14s (0:03:03)`)
  - Result: `1122 passed in 170.62s (0:02:50)`
- `pnpm --dir apps/web test`
  - Result: `PASS` (`62 files`, `199 tests`, duration `4.82s`)
- Sprint-targeted subset:
  - `./.venv/bin/python -m pytest tests/unit/test_provider_runtime.py tests/unit/test_20260411_0053_phase11_local_provider_config_fields.py tests/integration/test_phase11_provider_runtime_api.py -q`
  - Result: `PASS` (`12 passed in 2.50s`)
  - Result: `PASS` (`62` test files, `199` tests, duration `4.86s`)

## blockers/issues
- No active implementation blockers.
- No blockers during implementation.

## recommended next step
1. Open a sprint PR from `codex/phase11-sprint-2-ollama-llamacpp-adapters` with this report and required test evidence.
2. Keep pre-existing dirty local docs (`ARCHITECTURE.md`, `PRODUCT_BRIEF.md`) excluded from sprint merge scope.
1. Open the sprint PR from branch `codex/phase11-sprint-3-vllm-adapter-selfhosted` and request review focused on vLLM telemetry schema and endpoint response shape stability.
63 changes: 32 additions & 31 deletions REVIEW_REPORT.md
@@ -4,50 +4,51 @@
PASS

## criteria met
- `P11-S2` local provider registration APIs are implemented and functioning:
  - `POST /v1/providers/ollama/register`
  - `POST /v1/providers/llamacpp/register`
- Existing in-scope APIs are functioning with local adapters:
- `P11-S3` acceptance criteria are met for the vLLM self-hosted path.
- vLLM registration is implemented through the shipped provider registry:
  - `POST /v1/providers/vllm/register`
- Provider tests and capability snapshots expose deterministic self-hosted posture through the existing abstraction:
  - `POST /v1/providers/test`
  - capability snapshot fields include normalized telemetry posture (`supports_normalized_usage_telemetry`, `supports_normalized_latency_telemetry`, `telemetry_flow_scope`)
- Runtime invoke works through the shipped normalized provider contract for vLLM:
  - `POST /v1/runtime/invoke`
- `GET /v1/providers`
- `GET /v1/providers/{provider_id}`
- Ollama and llama.cpp adapters are integrated through the shipped provider abstraction and registry.
- Capability snapshots include deterministic local model enumeration and health posture fields.
- Additive provider config fields are migrated and wired (`auth_mode`, `model_list_path`, `healthcheck_path`, `invoke_path`).
- Local setup documentation and runnable e2e example path are present.
- Regression fix validated: legacy `/v1/providers` path now correctly passes `store` into shared registration helper (`apps/api/src/alicebot_api/main.py:6174-6177`).
- Credential handling tightened: `auth_mode="none"` now rejects non-empty `api_key`, preventing plaintext persistence (`apps/api/src/alicebot_api/main.py:1562-1568`).
- New regression coverage added:
  - OpenAI-compatible registration still works and stores secret ref, not plaintext (`tests/integration/test_phase11_provider_runtime_api.py:470-491`).
  - `auth_mode="none"` rejects provided `api_key` (`tests/integration/test_phase11_provider_runtime_api.py:494-514`).
- Required verification commands pass on the current branch head:
  - `python3 scripts/check_control_doc_truth.py` -> PASS
  - `./.venv/bin/python -m pytest tests/unit tests/integration -q` -> PASS (`1118 passed in 183.14s`)
  - `pnpm --dir apps/web test` -> PASS (`62 files`, `199 tests`, duration `4.82s`)
- Normalized latency and usage telemetry are persisted and exposed:
  - migration adds `provider_invocation_telemetry`
  - telemetry writes for `provider_test` and `runtime_invoke`
  - `GET /v1/providers/{provider_id}/telemetry`
- Bounded provider-specific passthrough is implemented behind explicit vLLM adapter options (`adapter_options.invoke_passthrough` allowlist).
- Self-hosted docs and runnable examples are now internally consistent for local split endpoints (API `:8000`, vLLM provider `:8001`):
  - `docs/integrations/phase11-vllm-self-hosted.md`
  - `scripts/run_phase11_vllm_e2e.py`
- Existing `P11-S1` / `P11-S2` seams remain intact (verified by full unit+integration pass and existing integration coverage).
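For reference, a registration request body for the self-hosted flow might look like the following. All field names and values are illustrative assumptions, not the shipped schema; the split-endpoint layout follows the docs (API on `:8000`, vLLM provider on `:8001`).

```python
# Hypothetical request body for POST /v1/providers/vllm/register.
# base_url points at the vLLM server, NOT at the AliceBot API itself --
# the endpoint-default bug this review flagged and fixed.
registration = {
    "base_url": "http://localhost:8001",
    "auth_mode": "none",
    "adapter_options": {
        # Bounded passthrough: only allowlisted keys are forwarded on invoke.
        "invoke_passthrough": {"temperature": 0.1},
    },
}

assert registration["base_url"] != "http://localhost:8000"
```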

## criteria missed
- None identified for `P11-S2` acceptance criteria.
- None.

## quality issues
- No blocking quality issues remain in sprint-owned scope after fixes.
- No blocking quality issues found in sprint-owned changes after the endpoint-default fix.
- Out-of-scope dirty local docs remain present and should stay excluded from sprint merge scope:
  - `ARCHITECTURE.md`
  - `PRODUCT_BRIEF.md`
  - `README.md` (pre-existing dirty context in branch)

## regression risks
- Low. Full required verification is passing, including new regression tests for the previously broken path.
- The main residual operational risk is external local-provider availability (Ollama/llama.cpp process reachability), which is surfaced via explicit discovery/test failure posture.
- Low.
- Required verification suite passes on current workspace state:
  - `python3 scripts/check_control_doc_truth.py` -> PASS
  - `./.venv/bin/python -m pytest tests/unit tests/integration -q` -> `1122 passed in 170.62s`
  - `pnpm --dir apps/web test` -> PASS (`62` test files, `199` tests, duration `4.86s`)

## docs issues
- No local identifiers (local computer paths, names) were found in sprint-owned changed code/docs reviewed here.
- Out-of-scope dirty local docs remain and should stay excluded from sprint merge scope:
  - `ARCHITECTURE.md`
  - `PRODUCT_BRIEF.md`
- Fixed: vLLM self-hosted docs/script no longer default provider URL to the API URL.
- No local identifiers (local machine paths, personal names, local-only identifiers) were found in reviewed sprint-owned files.

## should anything be added to RULES.md?
- Optional improvement: require backward-compat regression tests for already-shipped endpoints whenever shared registration/runtime helpers are refactored.
- Optional: add a guardrail that runnable docs/scripts must use non-conflicting default endpoints in multi-service flows and be smoke-validated before merge.

## should anything update ARCHITECTURE.md?
- Optional improvement: add a concise note clarifying auth-mode credential invariants (`bearer` uses secret refs; `none` must not persist API keys).
- No required architecture update for `P11-S3` merge.

## recommended next action
1. Ready for Control Tower merge approval with the updated build and review evidence on this branch head.
2. Keep `ARCHITECTURE.md` and `PRODUCT_BRIEF.md` excluded from the sprint PR.
1. Proceed with sprint PR review/merge for `P11-S3`.
2. Keep non-sprint control-doc rewrites excluded from this PR unless explicitly approved as separate scope.
90 changes: 90 additions & 0 deletions apps/api/alembic/versions/20260411_0054_phase11_vllm_telemetry.py
@@ -0,0 +1,90 @@
"""Add vLLM adapter options and provider invocation telemetry."""

from __future__ import annotations

from alembic import op


revision = "20260411_0054"
down_revision = "20260411_0053"
branch_labels = None
depends_on = None

_UPGRADE_STATEMENTS = (
    "ALTER TABLE model_providers ADD COLUMN adapter_options jsonb NOT NULL DEFAULT '{}'::jsonb",
    (
        "ALTER TABLE model_providers "
        "ADD CONSTRAINT model_providers_adapter_options_object_check "
        "CHECK (jsonb_typeof(adapter_options) = 'object')"
    ),
    """
    CREATE TABLE provider_invocation_telemetry (
        id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
        workspace_id uuid NOT NULL REFERENCES workspaces(id) ON DELETE CASCADE,
        provider_id uuid NOT NULL REFERENCES model_providers(id) ON DELETE CASCADE,
        invoked_by_user_account_id uuid NOT NULL REFERENCES user_accounts(id) ON DELETE RESTRICT,
        flow_kind text NOT NULL,
        adapter_key text NOT NULL,
        runtime_provider text NOT NULL,
        provider_model text NOT NULL,
        status text NOT NULL,
        error_message text NULL,
        latency_ms integer NOT NULL,
        input_tokens integer NULL,
        output_tokens integer NULL,
        total_tokens integer NULL,
        metadata jsonb NOT NULL DEFAULT '{}'::jsonb,
        created_at timestamptz NOT NULL DEFAULT now(),
        CONSTRAINT provider_invocation_telemetry_flow_kind_check
            CHECK (flow_kind IN ('provider_test', 'runtime_invoke')),
        CONSTRAINT provider_invocation_telemetry_adapter_key_length_check
            CHECK (char_length(adapter_key) >= 1 AND char_length(adapter_key) <= 80),
        CONSTRAINT provider_invocation_telemetry_runtime_provider_length_check
            CHECK (char_length(runtime_provider) >= 1 AND char_length(runtime_provider) <= 100),
        CONSTRAINT provider_invocation_telemetry_provider_model_length_check
            CHECK (char_length(provider_model) >= 1 AND char_length(provider_model) <= 200),
        CONSTRAINT provider_invocation_telemetry_status_check
            CHECK (status IN ('completed', 'failed')),
        CONSTRAINT provider_invocation_telemetry_latency_non_negative_check
            CHECK (latency_ms >= 0),
        CONSTRAINT provider_invocation_telemetry_input_tokens_non_negative_check
            CHECK (input_tokens IS NULL OR input_tokens >= 0),
        CONSTRAINT provider_invocation_telemetry_output_tokens_non_negative_check
            CHECK (output_tokens IS NULL OR output_tokens >= 0),
        CONSTRAINT provider_invocation_telemetry_total_tokens_non_negative_check
            CHECK (total_tokens IS NULL OR total_tokens >= 0)
    )
    """,
    (
        "CREATE INDEX provider_invocation_telemetry_provider_created_idx "
        "ON provider_invocation_telemetry (provider_id, created_at DESC, id DESC)"
    ),
    (
        "CREATE INDEX provider_invocation_telemetry_workspace_created_idx "
        "ON provider_invocation_telemetry (workspace_id, created_at DESC, id DESC)"
    ),
)

_UPGRADE_GRANT_STATEMENTS = (
    "GRANT SELECT, INSERT, UPDATE, DELETE ON provider_invocation_telemetry TO alicebot_app",
)

_DOWNGRADE_STATEMENTS = (
    "DROP TABLE IF EXISTS provider_invocation_telemetry",
    "ALTER TABLE model_providers DROP CONSTRAINT IF EXISTS model_providers_adapter_options_object_check",
    "ALTER TABLE model_providers DROP COLUMN IF EXISTS adapter_options",
)


def _execute_statements(statements: tuple[str, ...]) -> None:
    for statement in statements:
        op.execute(statement)


def upgrade() -> None:
    _execute_statements(_UPGRADE_STATEMENTS)
    _execute_statements(_UPGRADE_GRANT_STATEMENTS)


def downgrade() -> None:
    _execute_statements(_DOWNGRADE_STATEMENTS)
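An application-side mirror of a few of the migration's CHECK constraints, as a hedged sketch of how a store method might validate a telemetry row before INSERT. The helper name is hypothetical; the shipped store methods are not reproduced here.

```python
# Mirrors the flow_kind, status, latency, and nullable-token CHECK constraints
# from the provider_invocation_telemetry table, raising before hitting the DB.
_FLOW_KINDS = {"provider_test", "runtime_invoke"}
_STATUSES = {"completed", "failed"}


def validate_telemetry_row(row: dict) -> None:
    if row["flow_kind"] not in _FLOW_KINDS:
        raise ValueError("flow_kind must be provider_test or runtime_invoke")
    if row["status"] not in _STATUSES:
        raise ValueError("status must be completed or failed")
    if row["latency_ms"] < 0:
        raise ValueError("latency_ms must be non-negative")
    for key in ("input_tokens", "output_tokens", "total_tokens"):
        value = row.get(key)
        if value is not None and value < 0:
            raise ValueError(f"{key} must be NULL or non-negative")


# Passes silently; invalid rows raise ValueError instead of a DB error.
validate_telemetry_row({"flow_kind": "runtime_invoke", "status": "completed", "latency_ms": 42})
```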
38 changes: 37 additions & 1 deletion apps/api/src/alicebot_api/contracts.py
@@ -189,9 +189,11 @@
ToolRoutingDecision = Literal["ready", "denied", "approval_required"]
PromptSectionName = Literal["system", "developer", "context", "conversation"]
ModelProvider = Literal["openai_responses"]
ProviderAdapterKey = Literal["openai_compatible", "ollama", "llamacpp"]
ProviderAdapterKey = Literal["openai_compatible", "ollama", "llamacpp", "vllm"]
ModelProviderStatus = Literal["active"]
ProviderCapabilityDiscoveryStatus = Literal["ready", "failed"]
ProviderInvocationFlowKind = Literal["provider_test", "runtime_invoke"]
ProviderInvocationStatus = Literal["completed", "failed"]
ModelFinishReason = Literal["completed", "incomplete"]
ExplicitPreferencePattern = Literal[
"i_like",
Expand Down Expand Up @@ -1553,6 +1555,7 @@ class ModelProviderRecord(TypedDict):
model_list_path: str
healthcheck_path: str
invoke_path: str
adapter_options: JsonObject
metadata: JsonObject
created_at: str
updated_at: str
Expand Down Expand Up @@ -1611,6 +1614,39 @@ class RuntimeInvokeResponse(TypedDict):
trace: ResponseTraceSummary


class ProviderInvocationTelemetryRecord(TypedDict):
    id: str
    workspace_id: str
    provider_id: str
    invoked_by_user_account_id: str
    flow_kind: ProviderInvocationFlowKind
    adapter_key: ProviderAdapterKey
    runtime_provider: ModelProvider
    provider_model: str
    status: ProviderInvocationStatus
    error_message: str | None
    latency_ms: int
    usage: ModelUsagePayload
    metadata: JsonObject
    created_at: str


class ProviderTelemetrySummary(TypedDict):
    total_count: int
    completed_count: int
    failed_count: int
    average_latency_ms: float
    latest_created_at: str | None
    usage_totals: ModelUsagePayload


class ProviderTelemetryResponse(TypedDict):
    provider_id: str
    summary: ProviderTelemetrySummary
    items: list[ProviderInvocationTelemetryRecord]
    order: list[str]


@dataclass(frozen=True, slots=True)
class OpenLoopCandidateInput:
    title: str