fix(tokens): issueTokenPair signs via signToken dispatcher (restores /v1/tokens in prod)#33

Closed
kjgbot wants to merge 1 commit into main from fix/issue-token-pair-signs-rs256

Contributor

@kjgbot kjgbot commented Apr 24, 2026

Urgent — fixes production 500 on POST /v1/tokens

Sage's Slack handler (and every other RelayAuth api-key caller that mints delegated tokens) is currently failing with a generic "I ran into an issue" fallback. Root cause traced via wrangler tail on sage + observability logs on relayauth:

Failed to process Slack event
Error: mintRelayfileToken: RelayAuth request failed (500): Internal Server Error

Relayauth worker log:

DataError: Imported HMAC key length (0) must be a non-zero value up to 7 bits
less than, and no greater than, the bit length of the raw key data (0).

Root cause

cloud#299 retired the SIGNING_KEY worker binding (phase 122 step 3). Everything else in the HS256 sunset worked: the JWKS route was fixed (#32), and relayauth's JWKS now publishes only RSA keys in production. But issueTokenPair in routes/tokens.ts was still calling signHs256Jwt(claims, env.SIGNING_KEY, env.SIGNING_KEY_ID) directly, bypassing the signToken(claims, env) dispatcher we added in phase 120 that respects RELAYAUTH_SIGNING_ALG.

With SIGNING_KEY undefined, crypto.subtle.importKey threw DataError → 500 on every POST /v1/tokens.
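
The failure mode can be sketched in a few lines. The body of signHs256Jwt is not shown in this PR, so this assumes the helper turned the secret into raw bytes with TextEncoder before importing it; toRawHmacKey and importHmacKey are hypothetical names used only for illustration.

```typescript
// Assumption: the HS256 helper encoded the secret string to raw bytes.
function toRawHmacKey(secret: string | undefined) {
  // TextEncoder.encode() treats a missing argument as "", so an unbound
  // secret silently becomes zero bytes instead of failing here.
  return new TextEncoder().encode(secret);
}

// Imports the raw bytes as an HMAC-SHA-256 signing key via WebCrypto.
// With zero bytes of key data the returned promise rejects; the worker
// log above shows that rejection surfacing as a DataError.
async function importHmacKey(secret: string | undefined) {
  const raw = toRawHmacKey(secret);
  return crypto.subtle.importKey(
    "raw",
    raw,
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign"],
  );
}
```

Nothing in this path complains about the missing binding itself, which is why the first visible symptom was a key-length error rather than a configuration error.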

Fix

  • Replace two direct signHs256Jwt(...) calls with signToken(claims, env). The dispatcher routes to RS256 signing via RELAYAUTH_SIGNING_KEY_PEM (still bound) since RELAYAUTH_SIGNING_ALG=RS256 (set in cloud/infra/relayauth.ts:55).
  • Delete the now-unused signHs256Jwt helper.
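
For readers unfamiliar with the dispatcher pattern, here is a minimal sketch of the routing decision only. The real signToken is not shown in this PR; SigningEnv, SigningPlan, and resolveSigningPlan are hypothetical names, and only the binding names come from the description above. The fail-fast checks are an assumption about how such a dispatcher could behave, not a description of the shipped code.

```typescript
// Hypothetical env shape: only the binding names are taken from the PR text.
interface SigningEnv {
  RELAYAUTH_SIGNING_ALG?: string;
  RELAYAUTH_SIGNING_KEY_PEM?: string;
  SIGNING_KEY?: string;
}

// What a dispatcher needs to know before it signs anything.
type SigningPlan =
  | { alg: "RS256"; pem: string }
  | { alg: "HS256"; secret: string };

// Picks the algorithm from the env and checks that the matching key
// material is actually bound, so a missing binding produces a readable
// error instead of a low-level crypto failure.
function resolveSigningPlan(env: SigningEnv): SigningPlan {
  switch (env.RELAYAUTH_SIGNING_ALG) {
    case "RS256": {
      if (!env.RELAYAUTH_SIGNING_KEY_PEM) {
        throw new Error("RS256 selected but RELAYAUTH_SIGNING_KEY_PEM is not bound");
      }
      return { alg: "RS256", pem: env.RELAYAUTH_SIGNING_KEY_PEM };
    }
    case "HS256": {
      if (!env.SIGNING_KEY) {
        throw new Error("HS256 selected but SIGNING_KEY is not bound");
      }
      return { alg: "HS256", secret: env.SIGNING_KEY };
    }
    default:
      throw new Error(
        `unsupported RELAYAUTH_SIGNING_ALG: ${String(env.RELAYAUTH_SIGNING_ALG)}`,
      );
  }
}
```

The point of routing every call site through one function like this is that retiring a binding only has to be handled in one place; a call site that names an algorithm-specific helper directly, as issueTokenPair did, bypasses that check.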

No behavior change for working paths

Issued access + refresh tokens are now RS256-signed via the same key published to JWKS. Verifiers have been accepting RS256 since phase 121 dual-verify. The only things that change: /v1/tokens stops 500'ing, and issueTokenPair-produced tokens now carry RS256 + the kid published in JWKS instead of the legacy HS256 kid.
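
One way to confirm the header change on a freshly minted token is to decode its first segment. This is a minimal sketch: decodeJwtHeader is a hypothetical helper, not part of relayauth, and it only reads the header without verifying the signature.

```typescript
// Fields of interest in a JWT header; all optional because the input
// is untrusted.
interface JwtHeader {
  alg?: string;
  kid?: string;
  typ?: string;
}

// Decodes the header (first dot-separated segment) of a compact JWT.
// Does NOT verify the signature; use it for inspection only.
function decodeJwtHeader(token: string): JwtHeader {
  const [headerSegment] = token.split(".");
  if (!headerSegment) {
    throw new Error("not a JWT: missing header segment");
  }
  // Convert base64url to standard base64 and restore the padding.
  const base64 = headerSegment.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  return JSON.parse(atob(padded)) as JwtHeader;
}
```

After the deploy, a token returned by POST /v1/tokens should decode to a header whose alg is RS256 and whose kid matches an entry in the published JWKS.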

Tests

  • 33/33 in tokens-route.test.ts pass (assertions check response shape + status codes; RS256 signing is invisible to them).
  • tsc --noEmit clean on packages/server.

Deploy ordering

  1. Merge this PR
  2. Publish @relayauth/server@0.2.5 (you — only you can publish)
  3. I'll prep a cloud PR bumping @relayauth/* to ^0.2.5
  4. Merge cloud bump → deploy → /v1/tokens returns 201 with an RS256 token again

Rollback

Trivial: reverting this PR brings back signHs256Jwt. But a rollback doesn't unbreak production (the SIGNING_KEY binding is still gone from the worker), so fixing forward is the only path.

🤖 Generated with Claude Code

…prod)

Regression root cause:
  cloud#299 retired the SIGNING_KEY worker binding on the relayauth
  worker as part of HS256 sunset (phase 122 step 3). The JWKS bugfix
  (#32) and the SIGNING_KEY-gate fix in the JWKS route were shipped.
  But issueTokenPair in routes/tokens.ts was still calling
  signHs256Jwt(claims, env.SIGNING_KEY, env.SIGNING_KEY_ID) directly —
  bypassing the signToken(claims, env) dispatcher added in phase 120.

  With SIGNING_KEY undefined at runtime, signHs256Jwt's
  crypto.subtle.importKey call threw "DataError: Imported HMAC key
  length (0) must be non-zero", producing a 500 on every POST /v1/tokens.
  Sage (and every other api-key caller) hit this 500 as soon as they
  tried to mint a delegated token, cascading into production failures:
  "I ran into an issue processing your request" in the Slack path was
  the user-facing manifestation.

Fix
  - Replace the two direct signHs256Jwt calls with signToken(claims, env).
    signToken dispatches on RELAYAUTH_SIGNING_ALG (RS256 in production
    per infra/relayauth.ts:55) and signs via RELAYAUTH_SIGNING_KEY_PEM,
    which is still bound.
  - Delete the now-unused signHs256Jwt helper. (encodeBase64UrlJson +
    encodeBase64UrlBytes stay; other code paths use them.)

No behavior change for any caller that was working before #299. The
issued access + refresh tokens are now RS256-signed. Other code paths
already signed with the JWKS-published key; /v1/tokens did not until
now.

Tests
  - tokens-route.test.ts: 33/33 pass (no changes to test fixtures
    needed — RS256 signing is invisible to the assertions which only
    check response shape / status codes).
  - tsc --noEmit clean on packages/server.

Deploy
  - Publish @relayauth/server@0.2.5
  - Bump cloud's @relayauth/* dep to ^0.2.5
  - Cloud deploy restores /v1/tokens production flow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Contributor Author

kjgbot commented Apr 24, 2026

Superseded by the full HS256 purge PR (in progress by RelayauthHs256Purge agent on branch fix/purge-hs256-from-relayauth). Production restore path is now 'merge the purge PR + publish + cloud bump + deploy' rather than this minimal fix.

@kjgbot kjgbot closed this Apr 24, 2026
@khaliqgant khaliqgant deleted the fix/issue-token-pair-signs-rs256 branch April 24, 2026 09:11