Skip to content

providers: registry + glue.WithFailover (closes #90)#97

Merged
erain merged 5 commits into
mainfrom
issue/90-provider-registry
May 7, 2026
Merged

providers: registry + glue.WithFailover (closes #90)#97
erain merged 5 commits into
mainfrom
issue/90-provider-registry

Conversation

@erain
Copy link
Copy Markdown
Owner

@erain erain commented May 7, 2026

Summary

  • Driver-style registry at providers/: Register, New, Lookup, Known, KeyAvailable. Case-insensitive lookup; unknown name returns an error listing the registered names.
  • Each shipped provider exposes EnvKey + DefaultModel constants and self-registers from init() (gemini, nvidia, openrouter). Importing _ "github.com/erain/glue/providers/<name>" makes the name resolvable.
  • glue.WithFailover(provs ...Provider) Provider — tries each in order; falls through on Stream error, immediate ProviderEventError, or empty stream. Commits to a provider once any non-error event is observed. All-providers-failed surfaces as *glue.FailoverError.
  • README: new "Provider failover" subsection. docs/design.md: registry listed under package boundaries.

Stacked on #95, #96 (issues #88, #89). Migrating agents/glue-review off its hand-coded failover is filed as a follow-up so this PR stays single-purpose.

Closes #90

Test plan

  • go build ./...
  • go vet ./...
  • go test . ./loop ./providers/... ./tools/... — all green
  • Failover unit tests cover: first-succeeds, Stream-error fall-through, first-event-error fall-through, empty-stream fall-through, all-fail aggregate error, no-providers, commits-after-first-successful-event invariant
  • Registry unit tests cover: Register/New round-trip, unknown-name error, KeyAvailable env-probing, sorted Known(), case-insensitive lookup
  • go test ./agents/glue-review — TestLiveReviewSmoke / TestFixtureReplay rate-limited by NVIDIA (HTTP 429) during local verification; pre-existing flake unrelated to this change

🤖 Generated with Claude Code

Every Glue agent re-implemented the same per-tool boilerplate: a
json.Unmarshal of call.Arguments into a private struct, plus local
textResult / errorResult helpers. Promote the pattern into the framework
so tool definitions drop ~15 lines each and gain consistent malformed-
argument handling.

- glue.TextResult(string) and glue.ErrorResult(error) replace the
  duplicated helpers (the agent had its own copies; future agents would
  too).
- glue.NewTool[T any](spec, fn) decodes ToolCall.Arguments into T,
  treats empty arguments as the zero value, and returns an error
  ToolResult on JSON decode failure so the loop never crashes on a
  malformed call.

Migrate agents/glue-review/tools.go as the first internal consumer:
three tools converted, the local textResult/errorResult helpers removed.

Schema generation from T is intentionally out of scope; callers continue
to pass Parameters explicitly.

Verification:
  go build ./...      # ok
  go vet ./...        # ok
  go test ./...       # ok (all packages green)

Closes #88

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This branch implements three focused changes: (1) a driver-style provider registry plus glue.WithFailover for automatic multi-provider fallback, (2) shared tool/fs/git packages and helpers (safe path joining, blocklists, truncation, git shell-out) extracted from inlined agent code per ADR 0003, and (3) typed glue.NewTool[T], TextResult, and ErrorResult helpers to reduce per-tool boilerplate and normalize malformed-argument handling.

Issues

No bugs or regressions detected in the diff.

Suggestions

  • [minor] README.md:169 — The provider-failover example shows glue.NewAgent(glue.AgentOptions{Provider: glue.WithFailover(provs...), Model: ""}) with Model: "" and a comment that the provider’s default model will be used; it would be clearer to omit the Model field entirely so readers don’t think an empty string is special. Fix: In README.md, remove the Model: "", line from the NewAgent example so the zero value is not shown and the intent (“use each provider’s default”) is clearer.
  • [minor] agents/glue-review/tools.go:82 — The tool constructors still construct their own Parameter JSON inline and duplicate some option handling (e.g., defaults) that could eventually call shared factory helpers; consider migrating to tools/git.DiffBranchTool, LogBranchTool, and tools/fs.ReadFileTool in a follow-up to remove duplication (this is noted as a follow-up in the commit message, so just flagging for completeness). Fix: In agents/glue-review/tools.go, replace the hand-rolled tool factories with calls to git.DiffBranchTool, git.LogBranchTool, and fs.ReadFileTool (passing appropriate options) once the migration follow-up is planned/scheduled.

Looks good

  • Failover logic is clean, testable, and preserves the “no half-streamed transcripts on retry” invariant with clear commit semantics.
  • Provider registry supports case-insensitive lookup, sorted Known(), and cheap registration — good for extensibility.
  • Tool/fs/git packages are nicely decoupled from core, with safe defaults and clear error surface to models.
  • NewTool[T] reduces per-tool boilerplate while keeping explicit JSON schema control as intended.
  • Tests cover new failover cases, registry behavior, and filesystem/git tool semantics including scratch-repo end-to-end tests.

Open questions

  • Should WithFailover expose an option/functional to customize the commit predicate (e.g., treat certain non-error events as “non-committing”) or is the current rule (“first non-error event commits forever”) the intended long-term contract? If the latter, document it prominently in package-level docs.
  • Do the provider registry Factory values need a Configure(...Option) hook for future extensibility (e.g., per-provider timeouts, retry policies), or will options always be passed at New/Stream time via Options on the concrete provider? If the former, consider reserving a field now.

🤖 Posted by glue-review.

Comment thread README.md
)
```

### Provider failover
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] The provider-failover example shows glue.NewAgent(glue.AgentOptions{Provider: glue.WithFailover(provs...), Model: ""}) with Model: "" and a comment that the provider’s default model will be used; it would be clearer to omit the Model field entirely so readers don’t think an empty string is special

💡 AI prompt to fix
In README.md, remove the `Model: "",` line from the `NewAgent` example so the zero value is not shown and the intent (“use each provider’s default”) is clearer.

func gitDiffBranchTool(workDir string, pathspec []string) glue.Tool {
return glue.Tool{
ToolSpec: glue.ToolSpec{
return glue.NewTool[gitDiffArgs](
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] The tool constructors still construct their own Parameter JSON inline and duplicate some option handling (e.g., defaults) that could eventually call shared factory helpers; consider migrating to tools/git.DiffBranchTool, LogBranchTool, and tools/fs.ReadFileTool in a follow-up to remove duplication (this is noted as a follow-up in the commit message, so just flagging for completeness)

💡 AI prompt to fix
In agents/glue-review/tools.go, replace the hand-rolled tool factories with calls to `git.DiffBranchTool`, `git.LogBranchTool`, and `fs.ReadFileTool` (passing appropriate options) once the migration follow-up is planned/scheduled.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aee11ace9c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


func init() {
providers.Register(providerName, providers.Factory{
New: func() loop.Provider { return New(Options{}) },
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Register factory providers with default models

The registry factories are constructing providers with New(Options{}), which leaves DefaultModel empty even though each package now publishes a DefaultModel constant. With the new failover workflow (model left unset so each provider uses its own default), these registry-created providers fail immediately in Stream with a “model is required” error, so failover cannot actually use them. Initialize the factory with Options{DefaultModel: DefaultModel} (same issue appears in gemini/openrouter/nvidia registrations).

Useful? React with 👍 / 👎.

Comment thread tools/fs/fs.go
Comment on lines +37 to +38
candidate := filepath.Clean(filepath.Join(absBase, rel))
rel2, err := filepath.Rel(absBase, candidate)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Resolve symlinks before accepting SafeJoin path

SafeJoin only validates the lexical path (Join/Clean/Rel) and then returns it, so a path like link/secret is accepted when link is a symlink inside WorkDir pointing outside the repo. ReadFileTool then opens that resolved path and follows the symlink, allowing reads outside the intended root despite the traversal guard. This needs symlink resolution and post-resolution containment checks before returning success.

Useful? React with 👍 / 👎.

erain and others added 4 commits May 7, 2026 10:13
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Promote the path-safety / git-shell-out helpers that agents/glue-review
inlined into shared, audited extension packages. ADR 0003 was approved
months ago but no implementation landed; the agent's tools.go and
blocklist.go are empirical proof of the API shape, so port them.

tools/fs:
- SafeJoin(base, rel) — traversal + symlink defense
- Truncate(s, max) — output cap with marker
- Blocklist (Default/Merge/Match) — three-way pattern match
  (whole-path, basename, components, case-insensitive)
- ReadFileTool(opts) — ready-to-register read tool composed of the above

tools/git:
- RunGit(ctx, opts, args...) — PATH lookup, configurable timeout
  (default 15s), stderr captured into the error
- BuildPathspec(includes, excludes) — Git pathspec construction with
  the catch-all-include rule
- DiffBranchTool(opts) — git_diff_branch tool
- LogBranchTool(opts) — git_log_branch tool

Both packages live outside the core glue package per ADR 0003 so the
harness stays free of POSIX coupling. They depend on glue (for Tool /
ToolSpec / ToolResult and the new NewTool[T] helper from #88); the core
does not depend on them. ADR 0003 marked Implemented; design.md package
boundaries updated.

Migrating agents/glue-review to use these packages is filed as a
follow-up so this PR stays single-purpose and reviewable.

Verification:
  go build ./...                    # ok
  go vet ./...                      # ok
  go test ./...                     # ok
  go test ./tools/fs ./tools/git    # ok (table tests + scratch-repo end-to-end)

Closes #89

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stop forcing agents to hand-code provider failover. Today an agent that
wants nvidia → openrouter → gemini failover maintains a private map of
provider → env var, hand-codes per-attempt session ids to avoid
transcript poisoning, and buffers streamed text per attempt. Move the
boring parts into the framework.

New driver-style registry at providers/registry.go:
- providers.Register(name, Factory) — providers self-register in init()
- providers.New(name) → (Provider, defaultModel, envKey, error)
- providers.Lookup(name), providers.Known() (sorted), providers.KeyAvailable(name)
- Case-insensitive lookup; unknown name returns an error listing the
  registered names

Each shipped provider exposes EnvKey + DefaultModel constants and
self-registers from init() (gemini, nvidia, openrouter). Importing
`_ "github.com/erain/glue/providers/<name>"` makes the name resolvable.

WithFailover wrapper in package glue:
- glue.WithFailover(provs ...Provider) Provider
- Tries each provider in order; falls through on Stream error,
  immediate ProviderEventError, or empty stream
- Commits to a provider once any non-error event is observed —
  preserves the loop's "no half-streamed transcripts on retry"
  invariant, no mid-stream recovery
- All-providers-failed surfaces as *FailoverError with per-provider
  attempts (errors.As compatible)

README: new "Provider failover" subsection. design.md: provider
registry listed under package boundaries.

Migrating agents/glue-review off its hand-coded failover is filed as a
follow-up so this PR stays single-purpose.

Verification:
  go build ./...                                       # ok
  go vet ./...                                         # ok
  go test . ./loop ./providers/... ./tools/...         # ok (all green)
  go test ./agents/glue-review                          # one live test
                                                       # rate-limited
                                                       # by NVIDIA (HTTP 429),
                                                       # not a regression

Closes #90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erain erain force-pushed the issue/90-provider-registry branch from aee11ac to aa47515 Compare May 7, 2026 14:15
@erain erain merged commit f1046e5 into main May 7, 2026
3 of 4 checks passed
@erain erain deleted the issue/90-provider-registry branch May 7, 2026 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Provider registry + WithFailover wrapper

1 participant