providers: registry + glue.WithFailover (closes #90)#97
Conversation
Every Glue agent re-implemented the same per-tool boilerplate: a json.Unmarshal of call.Arguments into a private struct, plus local textResult / errorResult helpers. Promote the pattern into the framework so tool definitions drop ~15 lines each and gain consistent malformed- argument handling. - glue.TextResult(string) and glue.ErrorResult(error) replace the duplicated helpers (the agent had its own copies; future agents would too). - glue.NewTool[T any](spec, fn) decodes ToolCall.Arguments into T, treats empty arguments as the zero value, and returns an error ToolResult on JSON decode failure so the loop never crashes on a malformed call. Migrate agents/glue-review/tools.go as the first internal consumer: three tools converted, the local textResult/errorResult helpers removed. Schema generation from T is intentionally out of scope; callers continue to pass Parameters explicitly. Verification: go build ./... # ok go vet ./... # ok go test ./... # ok (all packages green) Closes #88 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Summary
This branch implements three focused changes: (1) a driver-style provider registry plus glue.WithFailover for automatic multi-provider fallback, (2) shared tool/fs/git packages and helpers (safe path joining, blocklists, truncation, git shell-out) extracted from inlined agent code per ADR 0003, and (3) typed glue.NewTool[T], TextResult, and ErrorResult helpers to reduce per-tool boilerplate and normalize malformed-argument handling.
Issues
No bugs or regressions detected in the diff.
Suggestions
- [minor] README.md:169 — The provider-failover example shows
glue.NewAgent(glue.AgentOptions{Provider: glue.WithFailover(provs...), Model: ""})withModel: ""and a comment that the provider’s default model will be used; it would be clearer to omit theModelfield entirely so readers don’t think an empty string is special. Fix: In README.md, remove theModel: "",line from theNewAgentexample so the zero value is not shown and the intent (“use each provider’s default”) is clearer. - [minor] agents/glue-review/tools.go:82 — The tool constructors still construct their own
ParameterJSON inline and duplicate some option handling (e.g., defaults) that could eventually call shared factory helpers; consider migrating totools/git.DiffBranchTool,LogBranchTool, andtools/fs.ReadFileToolin a follow-up to remove duplication (this is noted as a follow-up in the commit message, so just flagging for completeness). Fix: In agents/glue-review/tools.go, replace the hand-rolled tool factories with calls togit.DiffBranchTool,git.LogBranchTool, andfs.ReadFileTool(passing appropriate options) once the migration follow-up is planned/scheduled.
Looks good
- Failover logic is clean, testable, and preserves the “no half-streamed transcripts on retry” invariant with clear commit semantics.
- Provider registry supports case-insensitive lookup, sorted
Known(), and cheap registration — good for extensibility. - Tool/fs/git packages are nicely decoupled from core, with safe defaults and clear error surface to models.
NewTool[T]reduces per-tool boilerplate while keeping explicit JSON schema control as intended.- Tests cover new failover cases, registry behavior, and filesystem/git tool semantics including scratch-repo end-to-end tests.
Open questions
- Should
WithFailoverexpose an option/functional to customize the commit predicate (e.g., treat certain non-error events as “non-committing”) or is the current rule (“first non-error event commits forever”) the intended long-term contract? If the latter, document it prominently in package-level docs. - Do the provider registry
Factoryvalues need aConfigure(...Option)hook for future extensibility (e.g., per-provider timeouts, retry policies), or will options always be passed atNew/Streamtime viaOptionson the concrete provider? If the former, consider reserving a field now.
🤖 Posted by glue-review.
| ) | ||
| ``` | ||
|
|
||
| ### Provider failover |
There was a problem hiding this comment.
[minor] The provider-failover example shows glue.NewAgent(glue.AgentOptions{Provider: glue.WithFailover(provs...), Model: ""}) with Model: "" and a comment that the provider’s default model will be used; it would be clearer to omit the Model field entirely so readers don’t think an empty string is special
💡 AI prompt to fix
In README.md, remove the `Model: "",` line from the `NewAgent` example so the zero value is not shown and the intent (“use each provider’s default”) is clearer.
| func gitDiffBranchTool(workDir string, pathspec []string) glue.Tool { | ||
| return glue.Tool{ | ||
| ToolSpec: glue.ToolSpec{ | ||
| return glue.NewTool[gitDiffArgs]( |
There was a problem hiding this comment.
[minor] The tool constructors still construct their own Parameter JSON inline and duplicate some option handling (e.g., defaults) that could eventually call shared factory helpers; consider migrating to tools/git.DiffBranchTool, LogBranchTool, and tools/fs.ReadFileTool in a follow-up to remove duplication (this is noted as a follow-up in the commit message, so just flagging for completeness)
💡 AI prompt to fix
In agents/glue-review/tools.go, replace the hand-rolled tool factories with calls to `git.DiffBranchTool`, `git.LogBranchTool`, and `fs.ReadFileTool` (passing appropriate options) once the migration follow-up is planned/scheduled.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aee11ace9c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| func init() { | ||
| providers.Register(providerName, providers.Factory{ | ||
| New: func() loop.Provider { return New(Options{}) }, |
There was a problem hiding this comment.
Register factory providers with default models
The registry factories are constructing providers with New(Options{}), which leaves DefaultModel empty even though each package now publishes a DefaultModel constant. With the new failover workflow (model left unset so each provider uses its own default), these registry-created providers fail immediately in Stream with a “model is required” error, so failover cannot actually use them. Initialize the factory with Options{DefaultModel: DefaultModel} (same issue appears in gemini/openrouter/nvidia registrations).
Useful? React with 👍 / 👎.
| candidate := filepath.Clean(filepath.Join(absBase, rel)) | ||
| rel2, err := filepath.Rel(absBase, candidate) |
There was a problem hiding this comment.
Resolve symlinks before accepting SafeJoin path
SafeJoin only validates the lexical path (Join/Clean/Rel) and then returns it, so a path like link/secret is accepted when link is a symlink inside WorkDir pointing outside the repo. ReadFileTool then opens that resolved path and follows the symlink, allowing reads outside the intended root despite the traversal guard. This needs symlink resolution and post-resolution containment checks before returning success.
Useful? React with 👍 / 👎.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Promote the path-safety / git-shell-out helpers that agents/glue-review inlined into shared, audited extension packages. ADR 0003 was approved months ago but no implementation landed; the agent's tools.go and blocklist.go are empirical proof of the API shape, so port them. tools/fs: - SafeJoin(base, rel) — traversal + symlink defense - Truncate(s, max) — output cap with marker - Blocklist (Default/Merge/Match) — three-way pattern match (whole-path, basename, components, case-insensitive) - ReadFileTool(opts) — ready-to-register read tool composed of the above tools/git: - RunGit(ctx, opts, args...) — PATH lookup, configurable timeout (default 15s), stderr captured into the error - BuildPathspec(includes, excludes) — Git pathspec construction with the catch-all-include rule - DiffBranchTool(opts) — git_diff_branch tool - LogBranchTool(opts) — git_log_branch tool Both packages live outside the core glue package per ADR 0003 so the harness stays free of POSIX coupling. They depend on glue (for Tool / ToolSpec / ToolResult and the new NewTool[T] helper from #88); the core does not depend on them. ADR 0003 marked Implemented; design.md package boundaries updated. Migrating agents/glue-review to use these packages is filed as a follow-up so this PR stays single-purpose and reviewable. Verification: go build ./... # ok go vet ./... # ok go test ./... # ok go test ./tools/fs ./tools/git # ok (table tests + scratch-repo end-to-end) Closes #89 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stop forcing agents to hand-code provider failover. Today an agent that
wants nvidia → openrouter → gemini failover maintains a private map of
provider → env var, hand-codes per-attempt session ids to avoid
transcript poisoning, and buffers streamed text per attempt. Move the
boring parts into the framework.
New driver-style registry at providers/registry.go:
- providers.Register(name, Factory) — providers self-register in init()
- providers.New(name) → (Provider, defaultModel, envKey, error)
- providers.Lookup(name), providers.Known() (sorted), providers.KeyAvailable(name)
- Case-insensitive lookup; unknown name returns an error listing the
registered names
Each shipped provider exposes EnvKey + DefaultModel constants and
self-registers from init() (gemini, nvidia, openrouter). Importing
`_ "github.com/erain/glue/providers/<name>"` makes the name resolvable.
WithFailover wrapper in package glue:
- glue.WithFailover(provs ...Provider) Provider
- Tries each provider in order; falls through on Stream error,
immediate ProviderEventError, or empty stream
- Commits to a provider once any non-error event is observed —
preserves the loop's "no half-streamed transcripts on retry"
invariant, no mid-stream recovery
- All-providers-failed surfaces as *FailoverError with per-provider
attempts (errors.As compatible)
README: new "Provider failover" subsection. design.md: provider
registry listed under package boundaries.
Migrating agents/glue-review off its hand-coded failover is filed as a
follow-up so this PR stays single-purpose.
Verification:
go build ./... # ok
go vet ./... # ok
go test . ./loop ./providers/... ./tools/... # ok (all green)
go test ./agents/glue-review # one live test
# rate-limited
# by NVIDIA (HTTP 429),
# not a regression
Closes #90
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
aee11ac to
aa47515
Compare
Summary
providers/:Register,New,Lookup,Known,KeyAvailable. Case-insensitive lookup; unknown name returns an error listing the registered names.EnvKey+DefaultModelconstants and self-registers frominit()(gemini, nvidia, openrouter). Importing_ "github.com/erain/glue/providers/<name>"makes the name resolvable.glue.WithFailover(provs ...Provider) Provider— tries each in order; falls through on Stream error, immediateProviderEventError, or empty stream. Commits to a provider once any non-error event is observed. All-providers-failed surfaces as*glue.FailoverError.docs/design.md: registry listed under package boundaries.Stacked on #95, #96 (issues #88, #89). Migrating
agents/glue-reviewoff its hand-coded failover is filed as a follow-up so this PR stays single-purpose.Closes #90
Test plan
go build ./...go vet ./...go test . ./loop ./providers/... ./tools/...— all greengo test ./agents/glue-review— TestLiveReviewSmoke / TestFixtureReplay rate-limited by NVIDIA (HTTP 429) during local verification; pre-existing flake unrelated to this change🤖 Generated with Claude Code