Merged
2 changes: 1 addition & 1 deletion Dockerfile
@@ -26,7 +26,7 @@ FROM ${DOCKER_HUB}/alpine:3.23
ENV UID=1337 \
GID=1337

RUN apk add --no-cache su-exec ca-certificates bash jq curl yq-go
RUN apk add --no-cache su-exec ca-certificates bash jq curl yq-go tzdata

COPY --from=builder /build/ai /usr/bin/ai
COPY ./docker-run.sh /docker-run.sh
41 changes: 19 additions & 22 deletions README.md
@@ -301,7 +301,7 @@ Content is the universal `ContentBlock` (`text` / `thinking` / `toolCall` / `ima

**Same protocol, new vendor** — usually *no code*:

1. Add a model entry (in the generated catalog or a hand-built `ai.Model`) with the right `API`, `Provider`, `BaseURL`.
1. Add a model entry in ai-services, or configure a custom provider model with the right `API`, `Provider`, `BaseURL`.
2. Map the provider to its API-key env var(s) in `pkg/ai/env_api_keys.go`.
3. Add any `Compat` overrides (most base-URL patterns are auto-detected by `detectOpenAICompletionsCompat`).

@@ -317,19 +317,15 @@ Provider-specific behaviors worth knowing live in `pkg/ai/providers`: OpenAI *Co

## The model catalog

The runtime catalog is `Models` (`map[Provider]map[string]Model`) loaded from a large JSON literal in `pkg/ai/models_generated.go`. Accessors: `GetModel`, `GetProviders`, `GetModels`.
The model catalog is owned by ai-services. The bridge loads `/models?feature=bridge:ai`, applies each model's runtime metadata, and fails provider resolution if ai-services does not return a catalog. There is no bridge-generated fallback catalog.

It is **generated** by `cmd/generate-models-go`:
Custom providers use the same ai-services catalog for model metadata. The bridge filters that catalog to the supported provider runtime (`openai`, `openrouter`, `anthropic`, or `google-vertex`) and then uses the user's configured base URL and API key for execution. Arbitrary model IDs and generic OpenAI-compatible providers are not accepted.

```sh
go run ./cmd/generate-models-go [output-path] [--include-unregistered]
```

It fetches `models.dev` and `openrouter.ai`, keeps only **tool-capable** models, normalizes capabilities/pricing, and applies hand-maintained overrides (`pkg/ai/modelcatalog/`) — e.g. `ThinkingLevelMap` for gpt-5/Gemini-3, Anthropic-style cache-control for OpenRouter Anthropic models. Reasoning levels form a ladder `off < minimal < low < medium < high < xhigh`; `ClampThinkingLevel` snaps a request to the nearest supported level.
Reasoning levels form a ladder `off < minimal < low < medium < high < xhigh`; `ClampThinkingLevel` snaps a request to the nearest supported level from the model metadata the bridge was given.
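
As an illustration of the clamping described above, here is a minimal Go sketch. `clampLevel` is a hypothetical stand-in, not the bridge's `ClampThinkingLevel`; the tie-break toward the lower level and the `off` fallback for unknown input are assumptions of this sketch.

```go
package main

import "fmt"

// ladder lists the reasoning levels in ascending order, as stated in the README.
var ladder = []string{"off", "minimal", "low", "medium", "high", "xhigh"}

// clampLevel (hypothetical) returns the supported level closest to requested
// on the ladder. Ties go to the lower level, and unknown input falls back to
// "off"; both choices are assumptions, not confirmed bridge behavior.
func clampLevel(requested string, supported []string) string {
	rank := make(map[string]int, len(ladder))
	for i, level := range ladder {
		rank[level] = i
	}
	want, ok := rank[requested]
	if !ok {
		return "off"
	}
	best, bestDist := "off", len(ladder)
	for _, level := range supported {
		pos, known := rank[level]
		if !known {
			continue // ignore levels that are not on the ladder
		}
		dist := pos - want
		if dist < 0 {
			dist = -dist
		}
		if dist < bestDist || (dist == bestDist && pos < rank[best]) {
			best, bestDist = level, dist
		}
	}
	return best
}

func main() {
	supported := []string{"low", "high"}
	fmt.Println(clampLevel("xhigh", supported))  // high
	fmt.Println(clampLevel("medium", supported)) // low (tie resolved downward)
}
```

The real function reads the supported levels from the model metadata supplied by ai-services; the slice passed here only stands in for that metadata.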

## Image generation

Image generation is a **separate path** (`pkg/ai/images.go`, `images_*.go`): `ai.GenerateImages(ctx, ImagesModel, ImagesContext, ImagesOptions) AssistantImages` (synchronous, no streaming). It has its own model catalog (`image_models_generated.go` — FLUX.2, Seedream, Gemini "Nano Banana", GPT Image, Recraft, etc.) and its own registry. The built-in implementation routes through OpenRouter; blank-import `pkg/ai/providers/images` to enable it. Models can also expose **provider-native** `image_generation` as a built-in tool (see [chat tools](#built-in-chat-tools--adding-your-own)).
Image generation is a **separate path** (`pkg/ai/images.go`, `images_*.go`): `ai.GenerateImages(ctx, ImagesModel, ImagesContext, ImagesOptions) AssistantImages` (synchronous, no streaming). Image model metadata must come from ai-services or explicit provider configuration; the bridge does not keep a generated image catalog. The built-in implementation supports OpenRouter; blank-import `pkg/ai/providers/images` to enable it. Models can also expose **provider-native** `image_generation` as a built-in tool (see [chat tools](#built-in-chat-tools--adding-your-own)).

## The agent runtime

@@ -374,21 +370,23 @@ Errors are typed with codes (`pkg/agent/harness/public_errors.go`): `CompactionE

| Tool | Purpose | Notes |
|------|---------|-------|
| `get_session` | Live chat metadata (current time/timezone, model, reasoning, disabled tools, attachments) | read-only; recomputes time per call |
| `fetch` | Fetch an HTTP/HTTPS URL → readable text + metadata | direct fetch (≤2 MiB, ≤20 000 chars) or Exa-backed contents (≤10 000 chars) with fallback |
| `web_search` | Web search via Exa | only enabled for the Beeper provider with a proxy token; rich Exa options; results become source citations |
| `get_session` | Live chat metadata (current time/timezone, model, reasoning, search/fetch modes, attachments) | read-only; recomputes time per call |
| `fetch` | Fetch a full HTTP/HTTPS URL → readable text + metadata | Beeper mode: direct fetch (≤2 MiB, ≤20 000 chars) or AI-services `/tools/fetch` extraction with fallback; native mode: provider URL-context/fetch tool when available |
| `web_search` | Web search | Exa-backed Beeper search, enabled when room search mode is `beeper`; returns concise URL results for optional follow-up `fetch` calls |

Tools are gated per-room via the `com.beeper.ai.tools` state event. `search` may be `off`, `beeper`, or `native`; `fetch` may be `off`, `beeper`, or `native`. The legacy `disabled` array is still read for older room state. In `beeper` mode, web tools route through AI-services (`/tools/web_search`, `/tools/fetch`) using the appservice bearer token. In `native` mode, provider-native tools are injected where supported: OpenAI/OpenRouter search, OpenRouter fetch, Anthropic search/fetch, and Google search/URL context. If the selected provider API has no native equivalent, that native tool is unavailable. Search result URLs stay in the tool view; fetched pages, provider-native citation annotations, URL-context metadata, and final-answer URLs become canonical `com.beeper.source` artifacts for client source cards. Other provider-native built-ins, such as `image_generation`, are still injected from the model catalog (`pkg/connector/builtin_tools.go`).
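
The mode resolution described above could be sketched roughly as follows. The JSON field names mirror the README (`search`, `fetch`, `disabled`), but the struct, the `resolveMode` helper, the precedence of explicit modes over the legacy array, and the fallback value are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolsState is a hypothetical shape for the com.beeper.ai.tools content.
type toolsState struct {
	Search   string   `json:"search,omitempty"`
	Fetch    string   `json:"fetch,omitempty"`
	Disabled []string `json:"disabled,omitempty"` // legacy array from older room state
}

// resolveMode (hypothetical) prefers a valid explicit mode, then treats a
// legacy "disabled" entry as "off", and otherwise returns the caller's fallback.
func resolveMode(explicit, toolName string, legacyDisabled []string, fallback string) string {
	switch explicit {
	case "off", "beeper", "native":
		return explicit
	}
	for _, name := range legacyDisabled {
		if name == toolName {
			return "off"
		}
	}
	return fallback
}

func main() {
	raw := []byte(`{"search":"native","disabled":["fetch"]}`)
	var state toolsState
	if err := json.Unmarshal(raw, &state); err != nil {
		panic(err)
	}
	// The "beeper" fallback is an assumption of this sketch.
	fmt.Println(resolveMode(state.Search, "web_search", state.Disabled, "beeper")) // native
	fmt.Println(resolveMode(state.Fetch, "fetch", state.Disabled, "beeper"))       // off
}
```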

Tools are gated per-room via the `com.beeper.ai.tools` state event's `disabled` array. Exa-backed tools route through the AI-services proxy (`/proxy/exa/v1/...`) using the appservice bearer token. Some models additionally expose **provider-native** built-ins (`image_generation`, `web_search`) injected into the request payload (`pkg/connector/builtin_tools.go`).
`fetch` tries the URL directly first with `Accept` preferring Markdown, plain text, JSON, XML, and CSV. If the response is already agent-readable (Markdown/plain/JSON/XML/CSV/source-ish), it returns that result without backend extraction. If the response is HTML, it checks HTTP `Link` headers and HTML `<link rel="alternate" type="text/markdown|text/plain">` for a readable alternate and fetches that directly. Only when the direct representation is not agent-ready does it call AI-services `/tools/fetch`. Local/private hosts, GitHub raw/gist URLs, GitLab-style raw paths, and source/text file extensions are treated as direct-fetch candidates.
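
The decision order in that paragraph can be sketched as a small classifier. This is an illustrative approximation: `nextStep`, the `readable` set, and the returned labels are hypothetical names, and the real tool also considers hosts, raw paths, and file extensions that are not modeled here.

```go
package main

import (
	"fmt"
	"mime"
)

// readable holds media types treated as agent-ready in this sketch.
// The exact set used by the bridge is an assumption.
var readable = map[string]bool{
	"text/markdown":    true,
	"text/plain":       true,
	"application/json": true,
	"application/xml":  true,
	"text/xml":         true,
	"text/csv":         true,
}

// nextStep (hypothetical) decides what to do with a direct response:
// "direct" returns it as-is, "alternate" follows an advertised readable
// alternate, and "backend" falls through to AI-services /tools/fetch.
func nextStep(contentType string, hasReadableAlternate bool) string {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return "backend" // missing or malformed Content-Type
	}
	if readable[mediaType] {
		return "direct"
	}
	if mediaType == "text/html" && hasReadableAlternate {
		return "alternate"
	}
	return "backend"
}

func main() {
	fmt.Println(nextStep("text/markdown; charset=utf-8", false)) // direct
	fmt.Println(nextStep("text/html; charset=utf-8", true))      // alternate
	fmt.Println(nextStep("text/html", false))                    // backend
}
```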

**Adding a tool:**

1. Write a constructor returning `agent.AgentTool[any]`; build the schema with `objectSchema(props, required)` and pull args with the `helpers.go` coercers (args arrive as `map[string]any`; JSON numbers are `float64`).
2. Return `jsonResult(value)` for consistent text + `Details` output.
3. Register in `chattools.Tools` (unconditionally or behind a config gate).
4. Wire config in `pkg/connector/chat_tools.go` and honor `DisabledTools`.
5. If it produces citable sources, mirror `webSearchSourceParts` so URLs surface as message sources.
5. If it produces citable sources, add canonical source observations in `pkg/connector/sources.go` so URLs surface as message sources.
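
Step 1 notes that arguments arrive as `map[string]any` with JSON numbers decoded as `float64`. A minimal sketch of what a coercer for that case might look like is shown below; `intArg` is a hypothetical helper written for this example, not one of the functions in `helpers.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// intArg (hypothetical) reads an optional integer argument. encoding/json
// decodes every JSON number into float64 when the target is map[string]any,
// so the value is type-asserted as float64 before conversion.
func intArg(args map[string]any, key string, fallback int) (int, error) {
	raw, ok := args[key]
	if !ok || raw == nil {
		return fallback, nil
	}
	num, ok := raw.(float64)
	if !ok {
		return 0, fmt.Errorf("%s must be a number, got %T", key, raw)
	}
	if num != math.Trunc(num) {
		return 0, fmt.Errorf("%s must be a whole number", key)
	}
	return int(num), nil
}

func main() {
	var args map[string]any
	// "max_chars" is an illustrative argument name.
	if err := json.Unmarshal([]byte(`{"max_chars": 5000}`), &args); err != nil {
		panic(err)
	}
	limit, err := intArg(args, "max_chars", 20000)
	fmt.Println(limit, err) // 5000 <nil>
}
```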

> **Security note:** `fetch` has **no SSRF guard** — it can reach localhost/private/link-local addresses (it just bypasses Exa for them). Treat it accordingly in your threat model.
> **Security note:** the direct fetch path intentionally bypasses AI-services for localhost/private/link-local addresses, raw asset URLs, and source-like files, and has **no SSRF guard**. Deployments that cannot allow bridge-origin egress to private networks should enforce an upstream deny-list or network policy before enabling `fetch`.

## Sessions: the branching conversation tree

@@ -414,8 +412,8 @@ The bridge advertises five login flows (`pkg/connector/login.go`):

| Flow | What it does |
|------|--------------|
| `beeper` | The default **Beeper AI** login. Routes through an `ai-services.<domain>` proxy derived from the user's homeserver; uses an appservice bearer token, no stored key. Read-only/managed. |
| `openai-responses` / `openai-completions` / `openai-codex-responses` | **Custom provider**: enter base URL + API key, the bridge fetches `/models`, you pick a default model. |
| `beeper` | The default **Beeper AI** login. Loads its catalog and runtime proxy metadata from `ai-services.<domain>` derived from the user's homeserver; uses an appservice bearer token, no stored key. Read-only/managed. |
| `openai-responses` / `openai-completions` / `openai-codex-responses` / `anthropic-messages` / `google-vertex` | **Custom provider**: enter base URL + API key, the bridge loads matching model metadata from ai-services, then you pick a default model. |
| `chatgpt-device` | **ChatGPT** OAuth device-code flow (PKCE). Stores access + refresh tokens, auto-refreshes within 2 min of expiry. |

One Matrix user can hold multiple AI logins; there's a canonical "AI Chats" login per user. Provider configs (with secrets) live in `UserLoginMetadata.Providers`. API keys support `env:NAME` indirection. The `beeper` provider is special and **read-only** — it can't be added/updated/deleted.
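
The `env:NAME` indirection mentioned above might be resolved along these lines. `resolveAPIKey` and its injected lookup function are hypothetical, and treating an unset or empty variable as an error is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveAPIKey (hypothetical) returns literal keys unchanged and resolves
// "env:NAME" values through lookup. Passing the lookup in keeps the function
// easy to test; in production it would typically be os.LookupEnv.
func resolveAPIKey(value string, lookup func(string) (string, bool)) (string, error) {
	name, isEnv := strings.CutPrefix(value, "env:")
	if !isEnv {
		return value, nil
	}
	resolved, ok := lookup(name)
	if !ok || resolved == "" {
		return "", fmt.Errorf("environment variable %q is not set", name)
	}
	return resolved, nil
}

func main() {
	// EXAMPLE_API_KEY is an illustrative variable name.
	if _, err := resolveAPIKey("env:EXAMPLE_API_KEY", os.LookupEnv); err != nil {
		fmt.Println("lookup failed:", err)
	}
	key, _ := resolveAPIKey("literal-key", os.LookupEnv)
	fmt.Println(len(key) > 0) // avoid printing secrets; report presence only
}
```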
@@ -464,7 +462,7 @@ compaction:

Three scopes layer together: **bridge-wide** YAML → **per-login** provider configs (`UserLoginMetadata.Providers`) → **per-room** state (`com.beeper.ai.model` / `.additional_prompt` / `.tools`).

Relevant constants: default Beeper model `beeper/default`, title-generation model `gpt-4.1-mini` (fallback `gpt-5-mini`), default AI-services proxy path `/proxy/openai/v1`.
Relevant constants: default Beeper model `beeper/default`, title-generation model `gpt-4.1-mini` (fallback `gpt-5-mini`), default AI Services base URL derived from the user's homeserver domain.

---

@@ -520,7 +518,7 @@ It serves `/v1/models`, `/v1/responses`, `/v1/chat/completions`, and `/api/strea
- **Reasoning is double-validated and clamped** — setting a model can silently change the effective reasoning level.
- **Two parallel session-tree implementations** (`aidb` vs `session` SQLite files) with near-duplicate SQL and one subtle difference (`ON DELETE CASCADE`).
- **Token counts are estimates** (≈ chars/4) — compaction thresholds are approximate.
- **`fetch` has no SSRF protection.**
- **The direct fetch path intentionally allows private-network/raw-asset egress and has no SSRF protection.**
- **`ProviderConfig` holds secrets** (API keys, refresh tokens) in login metadata and serializes to JSON *and* YAML — don't log it.
- **AG-UI `Event` is a map, not a struct** — read typed fields via `Get`/`String`; unknown fields survive round-trips.
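
To make the "token counts are estimates" caveat in the list above concrete, here is a minimal sketch of a chars/4 heuristic. `estimateTokens` is a hypothetical name, and counting runes and rounding up are assumptions; the bridge may count bytes or round differently.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// estimateTokens (hypothetical) approximates a token count as characters / 4,
// rounded up. It is a heuristic, not a tokenizer, so thresholds derived from
// it are approximate.
func estimateTokens(text string) int {
	chars := utf8.RuneCountInString(text)
	return (chars + 3) / 4
}

func main() {
	fmt.Println(estimateTokens("abcd"))  // 1
	fmt.Println(estimateTokens("abcde")) // 2
}
```

Because the estimate ignores how a real tokenizer splits text, a compaction threshold expressed in tokens can trigger somewhat earlier or later than the configured value suggests.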

@@ -531,8 +529,7 @@ It serves `/v1/models`, `/v1/responses`, `/v1/chat/completions`, and `/api/strea
| Package | Responsibility |
|---------|----------------|
| `cmd/ai` | bridge entry point (registers connector + providers) |
| `cmd/generate-models-go` | regenerates the text-model catalog from upstream sources |
| `pkg/ai` | provider/API/model abstraction, streaming interface, model catalog, env keys |
| `pkg/ai` | provider/API/model abstraction, streaming interface, env keys |
| `pkg/ai/providers` | built-in provider implementations (OpenAI Completions/Responses/Codex, Anthropic, Google GenAI/Vertex) + image generation |
| `pkg/ai-stream` | the `Run` model: AG-UI event accumulation, anchor/stream/final projection, approvals, final-payload sizing |
| `pkg/ag-ui` | the AG-UI wire event protocol, typed events, schema, validation, capabilities |
@@ -541,7 +538,7 @@ It serves `/v1/models`, `/v1/responses`, `/v1/chat/completions`, and `/api/strea
| `pkg/agent/harness/session` | branching conversation tree (per-conversation SQLite) |
| `pkg/agent/autocompact` | compaction trigger policy |
| `pkg/chattools` | built-in tools: `get_session`, `fetch`, `web_search` |
| `pkg/connector` | the `bridgev2` connector: rooms↔sessions, slash/bridge commands, login, provider routes, capabilities, contacts, direct media, room state |
| `pkg/connector` | the `bridgev2` connector: rooms↔sessions, slash/bridge commands, login, provider catalog loading, capabilities, contacts, direct media, room state |
| `pkg/msgconv` | Matrix ⇄ AI message conversion |
| `pkg/aiid` | deterministic IDs + metadata types |
| `pkg/aidb` | bridge-DB persistence: session storage + active-stream resume |
73 changes: 0 additions & 73 deletions cmd/generate-models-go/main.go

This file was deleted.

21 changes: 0 additions & 21 deletions cmd/generate-models-go/main_test.go

This file was deleted.

9 changes: 8 additions & 1 deletion pkg/agent/agent_loop.go
@@ -595,7 +595,14 @@ func hasSequentialToolCall(currentContext *AgentContext, toolCalls []AgentToolCa
}

func createErrorToolResult(message string) AgentToolResult[any] {
	return AgentToolResult[any]{Content: []ai.ContentBlock{{Type: "text", Text: message}}, Details: map[string]any{}}
	return AgentToolResult[any]{
		Content: []ai.ContentBlock{{Type: "text", Text: message}},
		Details: map[string]any{
			"state":  "error",
			"status": "failed",
			"reason": message,
		},
	}
}

func shouldTerminateToolBatch(results []AgentToolResult[any]) bool {
4 changes: 4 additions & 0 deletions pkg/agent/agent_loop_test.go
@@ -328,6 +328,10 @@ func TestRunAgentLoopTurnsToolUpdateEmitErrorIntoToolResult(t *testing.T) {
	if content[0].Text != "update emit failed" {
		t.Fatalf("expected update emit error content, got %#v", content)
	}
	details, _ := messages[2].Details.(map[string]any)
	if details["state"] != "error" || details["status"] != "failed" || details["reason"] != "update emit failed" {
		t.Fatalf("expected structured tool error details, got %#v", messages[2].Details)
	}
}

func TestRunAgentLoopValidatesPreparedToolArguments(t *testing.T) {