feat: credential refresh trait for short-lived provider tokens (Vertex AI first implementation) #472

@xek

Description

Problem Statement

Several inference providers use short-lived credentials that require periodic refresh: Google Vertex AI and Azure OpenAI with Entra ID use OAuth2 tokens (1-hour expiry), IBM watsonx uses IBM Cloud IAM tokens. OpenShell has no mechanism for credential lifecycle management — provider credentials are stored once at creation and never updated. Attempting to use these providers today fails silently after token expiry, with no workaround that doesn't require manual intervention.

This is not a Vertex-specific gap. Any provider that issues short-lived credentials hits the same missing primitive.

Proposed Design

Introduce a lightweight CredentialRefresher trait in the gateway, with Google Vertex AI as the first implementation. AWS Bedrock is explicitly out of scope — SigV4 is per-request signing, not a stored token, and requires a different approach.

1. CredentialRefresher trait (crates/openshell-server/src/providers/refresh.rs):

use std::collections::HashMap;

use async_trait::async_trait;

// RefreshError is a new error type, defined alongside the trait in refresh.rs.
#[async_trait]
pub trait CredentialRefresher: Send + Sync {
    fn provider_type(&self) -> &'static str;
    fn needs_refresh(&self, credentials: &HashMap<String, String>) -> bool;
    async fn refresh(
        &self,
        credentials: &HashMap<String, String>,
    ) -> Result<HashMap<String, String>, RefreshError>;
}

2. Gateway refresh loop — a single background task iterates all provider records, calls needs_refresh() for each, and calls refresh() for those that do. Follows the existing SSH session reaper pattern. Runs on a configurable interval (default: 10 minutes), well within the 1-hour expiry window of OAuth2 tokens.
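
One pass of that loop can be sketched with a simplified, synchronous stand-in for the trait. The `Refresher` trait and `refresh_pass` helper below are illustrative names for this sketch, not the actual gateway code, which would be async and driven by the interval timer:

```rust
use std::collections::HashMap;

// Simplified, synchronous stand-in for the CredentialRefresher trait,
// used only to illustrate one pass of the gateway refresh loop.
trait Refresher {
    fn provider_type(&self) -> &'static str;
    fn needs_refresh(&self, creds: &HashMap<String, String>) -> bool;
    fn refresh(&self, creds: &HashMap<String, String>)
        -> Result<HashMap<String, String>, String>;
}

// Visit every provider record; refresh the ones whose refresher reports stale.
// Returns how many records were refreshed in this pass.
fn refresh_pass(
    refreshers: &[Box<dyn Refresher>],
    records: &mut [(String, HashMap<String, String>)], // (provider_type, credentials)
) -> usize {
    let mut refreshed = 0;
    for (ptype, creds) in records.iter_mut() {
        let Some(r) = refreshers.iter().find(|r| r.provider_type() == ptype.as_str()) else {
            continue; // no refresher registered for this provider type
        };
        if r.needs_refresh(creds) {
            if let Ok(new_creds) = r.refresh(creds) {
                *creds = new_creds;
                refreshed += 1;
            }
        }
    }
    refreshed
}
```

Providers without a registered refresher (openai, anthropic, nvidia) are simply skipped, so the loop is a no-op for static-credential providers.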

3. Vertex implementation — VertexRefresher stores a service account JSON key, calls Google's OAuth2 token endpoint, and returns a fresh Bearer token. The token expiry timestamp is stored alongside the credential so needs_refresh() can check it without making a network call.
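
The offline expiry check could look like the sketch below. The `VERTEX_TOKEN_EXPIRES_AT` key name and the 10-minute safety margin are assumptions for illustration, not settled identifiers:

```rust
use std::collections::HashMap;
use std::time::{SystemTime, UNIX_EPOCH};

// Assumed key name for the stored expiry (unix seconds); not a settled identifier.
const TOKEN_EXPIRY_KEY: &str = "VERTEX_TOKEN_EXPIRES_AT";
// Refresh while at least this much validity remains, so a token is never
// handed out moments before it dies. The exact margin is an assumption.
const REFRESH_MARGIN_SECS: u64 = 600;

// Offline staleness check: no network call, just compare the stored expiry
// against the current time plus the safety margin.
fn needs_refresh(credentials: &HashMap<String, String>) -> bool {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before unix epoch")
        .as_secs();
    match credentials
        .get(TOKEN_EXPIRY_KEY)
        .and_then(|v| v.parse::<u64>().ok())
    {
        None => true, // no token yet, or unparsable expiry: treat as stale
        Some(expires_at) => now + REFRESH_MARGIN_SECS >= expires_at,
    }
}
```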

4. Delivery via existing bundle poll — no sandbox-side changes. The sandbox polls GetInferenceBundle every 30 seconds. When a token is refreshed, the bundle revision hash changes, the sandbox detects it on the next poll, and atomically updates the Arc<RwLock<Vec<ResolvedRoute>>> route cache. No router restart, no dropped connections.
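
One poll iteration of that mechanism can be sketched as follows. The `ResolvedRoute` shape and `apply_bundle` helper are hypothetical stand-ins for the real logic in spawn_route_refresh:

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for the real ResolvedRoute; fields elided.
#[derive(Clone, Debug, PartialEq)]
struct ResolvedRoute {
    token: String,
}

// One poll iteration: if the bundle's revision hash changed, swap the whole
// route cache under the write lock. Readers see either the old vector or the
// new one, never a partial update.
fn apply_bundle(
    cache: &Arc<RwLock<Vec<ResolvedRoute>>>,
    last_revision: &mut String,
    bundle_revision: &str,
    bundle_routes: Vec<ResolvedRoute>,
) -> bool {
    if bundle_revision == last_revision.as_str() {
        return false; // revision unchanged: nothing to do
    }
    *cache.write().unwrap() = bundle_routes;
    *last_revision = bundle_revision.to_string();
    true
}
```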

5. New vertex provider profile (crates/openshell-core/src/inference.rs):

static VERTEX_PROFILE: InferenceProviderProfile = InferenceProviderProfile {
    provider_type: "vertex",
    default_base_url: "https://{region}-aiplatform.googleapis.com/v1",
    protocols: ANTHROPIC_PROTOCOLS,
    credential_key_names: &["GOOGLE_SERVICE_ACCOUNT_JSON"],
    base_url_config_keys: &["VERTEX_BASE_URL"],
    auth: AuthHeader::Bearer,
    default_headers: &[("anthropic-version", "2023-06-01")],
};

The agent connects to inference.local as normal. The router strips the agent's auth header and injects the current access token. The service account JSON key never leaves the gateway.
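
The router-side rewrite amounts to dropping the agent's auth headers and injecting the current token. A minimal sketch, with a plain HashMap standing in for whatever header map the router actually uses:

```rust
use std::collections::HashMap;

// Sketch of the router-side rewrite: drop whatever auth the agent sent
// (an x-api-key or a stale bearer) and inject the gateway-held access token.
// Header names are lowercased here for simplicity; real HTTP header handling
// is case-insensitive.
fn inject_auth(headers: &mut HashMap<String, String>, access_token: &str) {
    headers.remove("x-api-key");
    headers.remove("authorization");
    headers.insert(
        "authorization".to_string(),
        format!("Bearer {access_token}"),
    );
}
```

Because the rewrite happens in the router, the agent never sees the access token and the service account key never crosses into the sandbox.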

User-facing interface — no new CLI commands:

openshell provider add --type vertex \
  --name my-vertex \
  --credential GOOGLE_SERVICE_ACCOUNT_JSON="$(cat sa.json)" \
  --config VERTEX_BASE_URL=https://us-east1-aiplatform.googleapis.com/v1/...
openshell inference set --provider my-vertex --model claude-3-7-sonnet

Future providers (Azure Entra ID, IBM watsonx) implement the same trait without touching the refresh loop or delivery mechanism.

Alternatives Considered

  • Hardcode Vertex-specific logic in the gateway: duplicated for every subsequent provider with short-lived credentials. Rejected in favour of the trait.
  • Client-side refresh in provider plugin: plugins only run at discovery time on the user's machine, no server-side lifecycle.
  • Sidecar token refresher: no existing infrastructure, adds a new component for one provider's auth quirk.
  • Static token stored at provider creation: expires after 1 hour, fails silently mid-session.
  • ANTHROPIC_BASE_URL override: auth header mismatch (x-api-key vs Bearer), broken on first request.
  • AWS Bedrock via this design: SigV4 is per-request signing, incompatible with a stored-token model. Separate issue.

Agent Investigation

  • crates/openshell-core/src/inference.rs: three hardcoded profiles (openai, anthropic, nvidia); profile_for() returns None for all other types.
  • crates/openshell-providers/src/providers/generic.rs: placeholder only, discover_existing() always returns None. No refresh capability.
  • crates/openshell-sandbox/src/lib.rs (spawn_route_refresh): polls GetInferenceBundle every 30s, uses revision hash for change detection, atomically replaces route cache — no sandbox-side changes needed.
  • Gateway background task pattern established by SSH session reaper in crates/openshell-server/. Refresh loop follows the same structure.
  • No existing Vertex, Google, Gemini, Azure, or watsonx references anywhere in the codebase.
  • No open issues covering this gap.
