feat: add agent catalog/auth API and safer orchestrator switching #276

Open
nikhilachale wants to merge 14 commits into aoagents:main from nikhilachale:agent-switcher

@nikhilachale nikhilachale commented Jun 17, 2026

Summary

This PR adds a daemon-backed agent catalog, exposes installed/authorized agent state to the frontend, and uses that data in project settings so users can choose worker/orchestrator agents more safely.

It also adds orchestrator replacement handling: when the saved orchestrator agent changes, AO starts the replacement first and only retires the previous orchestrator after the new one is up, so a failed replacement does not cause downtime.

A key caveat is that agent-auth/login flows can interfere with replacement startup. If switching agents triggers the agent’s own bootstrap path, the replacement may come up outside AO’s normal orchestrator initialization path and miss the AO orchestrator system prompt.

What Changed

Backend

  • Added agent inventory service for:

    • supported agents
    • installed agents
    • authorized agents
    • counts for each
  • Added optional AgentAuthChecker capability on adapters.

  • Added shared CLI auth probing helper for adapters with cheap local auth checks.

  • Added GET /api/v1/agents.

  • Extended registry inventory entries to carry adapter manifest metadata for user-facing labels.

  • Added orchestrator replacement flow in the session service:

    • spawn replacement first
    • retire previous orchestrator only after successful replacement
    • preserve previous orchestrator when replacement startup fails
  • Added backend tests for agent catalog, controller responses, session replacement behavior, and related project/service wiring.

Frontend

  • Regenerated API types for the new agents endpoint/DTOs.

  • Updated ProjectSettingsForm to:

    • load agent catalog from the daemon
    • show authorized agent options
    • handle installed-but-not-authorized states
    • surface orchestrator replacement pending state
    • allow retry once replacement is safe to perform
  • Added/updated tests for the new settings behavior.

Why

Before this change, the UI did not have a daemon-backed view of which agents are actually installed and authenticated on the local machine, and changing orchestrator agent config did not have a clear replacement flow.

This PR makes agent selection more grounded in local runtime state and reduces the chance of downtime during orchestrator switches.

Risks / Caveats

  • A large part of the file count comes from:

    • generated API artifacts
    • frontend tests
    • small per-adapter auth probe shims
  • The CLI/runtime/session model is unchanged outside the new inventory/auth and orchestrator replacement paths.

  • Generated files are included intentionally:

    • backend/internal/httpd/apispec/openapi.yaml
    • frontend/src/api/schema.ts
  • Auth/login flows remain a review risk. If switching agents triggers the agent’s own login/bootstrap flow, that flow can spawn a fresh native session outside AO’s normal orchestrator startup path.

  • In that case, the replacement session may miss AO’s expected initialization, including the orchestrator system prompt.

  • The old orchestrator is intentionally preserved on replacement failure, but reviewers should pay close attention to whether
    replacement startup still guarantees AO system-prompt delivery.

Closes #275

- Implemented AgentsController to handle /agents endpoint, returning a list of supported and installed agents.
- Created agent inventory service to manage agent data and detect installed agents.
- Updated ProjectSettingsForm to fetch and display agent information, including installed and supported agents.
- Enhanced error handling for agent detection and orchestrator restarts.
- Added tests for agent catalog and service to ensure correct functionality and error handling.
…flect changes

- Added `AuthStatus` method to various agent plugins to check authorization status using CLI probes.
- Introduced `authprobe` package to handle common CLI command checks for agent authorization.
- Updated backend tests to include scenarios for authorized and unauthorized agents.
- Modified frontend API schema to include `authorized` counts and `authStatus` for agents.
- Enhanced `ProjectSettingsForm` to display authorized agents and their statuses, including prompts for login when necessary.
- Adjusted agent selection logic to prioritize authorized agents and provide feedback for unauthorized or uninstalled agents.



Comment thread backend/internal/session_manager/manager.go
return nil
}
sort.Sort(sort.Reverse(sort.StringSlice(matches)))
return matches

Collaborator

`nvmNodeBinCandidates` is missing its closing brace — the function body never terminates before `func resolveNativeWindowsCodex(...)` starts on the next line, which nests a named function declaration inside another function. This is not valid Go and the package will not compile as currently pushed:

```go
func nvmNodeBinCandidates(home, binary string) []string {
	matches, err := filepath.Glob(filepath.Join(home, ".nvm", "versions", "node", "*", "bin", binary))
	if err != nil || len(matches) == 0 {
		return nil
	}
	sort.Sort(sort.Reverse(sort.StringSlice(matches)))
	return matches
func resolveNativeWindowsCodex(path string) string { // <-- missing } above
```

Needs a `}` after `return matches` to close `nvmNodeBinCandidates` before `resolveNativeWindowsCodex` starts. Given `TestResolveCodexBinaryFindsNVMInstallWhenPathIsSparse` was added in this same PR, this file couldn't have been built/tested in its current state — worth double-checking the push matches what was actually tested locally.

Comment thread backend/internal/service/agent/service.go
Comment thread backend/internal/service/session/service.go
if err != nil {
m.logger.Warn("session manager: old orchestrator probe failed after runtime destroy",
"session", id, "err", err)
} else if alive {

@neversettle17-101 neversettle17-101 Jun 21, 2026

Collaborator

If IsAlive still reports true after recordRetiredTermination already succeeded, the loop retries Destroy/recordRetiredTermination again, and on final failure returns ErrSessionStillAlive — but the DB already says terminated. Since List(Active: true) won't surface a terminated row, nothing will ever retry killing this runtime again: a zombie orchestrator can leak silently while the system believes it's fully retired. Don't re-record termination once it has already succeeded; surface a distinct "terminated but still alive" signal so recovery tooling can retry the kill.

@neversettle17-101

Collaborator

Please resolve merge conflicts and test the flow end to end locally as well.

nikhilachale and others added 4 commits June 21, 2026 20:48
- Updated NewTaskDialog tests to increase timeout for async operations.
- Modified ProjectSettingsForm tests to improve agent handling and validation messages.
- Refactored ProjectSettingsForm component to streamline agent selection and validation logic.
- Introduced new agent service to manage agent inventory and authentication status.
- Improved Sidebar tests to ensure proper agent options are loaded and handled.
- Enhanced SessionsBoard component by removing unused imports and optimizing state management.
- Fixed Select component styling for better consistency in UI.
- Added error handling for AO daemon readiness in ShellLayout.
@nikhilachale

Author

@neversettle17-101

  1. codex.go missing brace: fixed.
    There is now a } after return matches, so resolveNativeWindowsCodex is no longer nested.

  2. catalog.go naming nit: fixed by renaming the file.
    backend/internal/service/agent/catalog.go is deleted and the same package code now lives in
    backend/internal/service/agent/service.go, matching the Service type.

  3. SpawnOrchestrator(clean=true) race: fixed.
    backend/internal/service/session/service.go now has per-project orchestrator locks around the
    read-existing → spawn-new → retire-old sequence. Tests were added for same-project serialization
    and for allowing different projects to proceed concurrently.

  4. Retired orchestrator zombie handling: fixed.
    backend/internal/session_manager/manager.go now returns ErrRetiredSessionStillAlive after
    termination is recorded but the runtime still appears alive, instead of retrying/re-recording and
    ending with the generic ErrSessionStillAlive. The session service maps that to
    ORCHESTRATOR_REPLACEMENT_RECOVERY_REQUIRED.

(Three screenshots attached.)

@nikhilachale

Author

Other than that, @whoisasx has added CreateProjectAgentSheet and some UI changes around the worker and orchestrator agents, so I added some fixes on top of them:

  • CreateProjectAgentSheet no longer uses hardcoded AGENT_OPTIONS.
    It now loads the shared /api/v1/agents catalog via React Query and shows supported agents with
    "Needs install" / "Needs auth" labels. Only authorized agents are selectable.

  • ProjectSettingsForm was refactored around the same agent catalog.
    It shows agent labels instead of raw IDs, supports a manual "Reload agents" action, warns for
    configured-but-missing or unauthorized agents, and shows an “Agent login needed” prompt when
    agents are installed but none are authorized.

  • Project settings validation was tightened.
    Existing projects missing worker/orchestrator role config now show "Worker and orchestrator agents
    are required." and cannot be saved until both are selected.

  • Sidebar project creation tests were updated.
    Tests now seed/mock the agent catalog and verify the project creation dialog handles async catalog
    loading.

  • NewTaskDialog.test.tsx timeout was increased to 10_000ms.
    That is just test stability for async typing/submission.

  • SessionsBoard.tsx cleanup.
    Removed duplicate/unused imports. No real behavior change there.

  • Select component styling changed.
    The popper viewport no longer forces h-[var(--radix-select-trigger-height)]; it uses trigger width
    without forcing dropdown height, so long agent option menus render correctly.

  • ShellLayout got daemon readiness handling.
    Before project creation it refreshes daemon status and errors clearly if the daemon is not ready. It
    also invalidates workspace queries when the daemon port changes.
    receives DefaultHarness: domain.AgentHarness(cfg.Agent) in projectsvc.Deps.

Collaborator

nit: can we name the file service_test, consistent with other PRs?

// then retire any older active orchestrators for that project so a failed
// replacement never causes downtime. This business rule belongs here, not in
// the HTTP controller.
func (s *Service) SpawnOrchestrator(ctx context.Context, projectID domain.ProjectID, clean bool) (domain.Session, error) {

Collaborator

When the new orchestrator is spawning, what does the user experience look like?
