Skip to content

fix(secrets): pin keychain access group + explicit data-protection routing (#279)#290

Draft
muqsitnawaz wants to merge 1 commit into
mainfrom
fix-279-keychain-dataprotection
Draft

fix(secrets): pin keychain access group + explicit data-protection routing (#279)#290
muqsitnawaz wants to merge 1 commit into
mainfrom
fix-279-keychain-dataprotection

Conversation

@muqsitnawaz

Copy link
Copy Markdown
Contributor

DRAFT — NEEDS VERIFICATION on an affected machine (reporter: Darwin 25.5.0).
The -25293 regression does not reproduce on the build this was developed on (macOS 26.4, build 25E246, Darwin 25.4.0 — writes succeed here), so the fix for the reporter's symptom could not be end-to-end verified. See "Verification" below for what was and wasn't proven, and the exact steps for the reporter.

Fixes #279 (pending verification).

Symptom

agents secrets create <name> / agents secrets add ... fail immediately on macOS with Failed to write keychain item (OSStatus -25293) (errSecAuthFailed). Returns instantly, no biometric prompt, persists across reinstalls.

Root-cause correction (with machine evidence)

The original debugging hypothesis was: the set path omits kSecUseDataProtectionKeychain, so it targets the legacy file-based login keychain, which macOS 26.4 broke (-25293); moving to the data-protection keychain dodges it.

I probed the actual SecItemAdd routing on macOS 26.4 (unsigned binary, so the data-protection keychain's entitlement check fires as a routing tell — -34018 is a data-protection-keychain-only error; the file-based keychain has no entitlement model and returns 0):

SecItemAdd attributes status keychain reached
plain (no ACL) 0 file-based
kSecAttrAccessible only 0 file-based
biometry kSecAttrAccessControl (what the original set used) -34018 data-protection
explicit kSecUseDataProtectionKeychain -34018 data-protection

So the original set was already on the data-protection keychain — the biometry SecAccessControl forces it there (the file-based keychain cannot store biometry-gated items). The codebase's own existing comments already said as much (list/has: "items written by set carry a biometry ACL, which forces them into the data-protection keychain"). Apple's TN3137 documents biometry as "only supported by the data protection keychain."

Therefore adding kSecUseDataProtectionKeychain explicitly does not change which keychain is used, and on its own most likely does not fix -25293.

What this PR actually changes

  1. Pin kSecAttrAccessGroup to a concrete group2HTP252L87.com.phnx-labs.agents-keychain (the com.apple.application-identifier from the embedded profile). The profile grants the entitlement as a wildcard only (2HTP252L87.*) with no concrete default group, so an add that omits the access group depends on the system's implicit default-group resolution. Pinning a concrete group removes that dependency. This is the most plausible real cause of -25293 and the change most likely to fix it.
  2. State kSecUseDataProtectionKeychain explicitly on every SecItem call (set, get, get-batch, delete, has, list, migrate-acl, migrateInline) so routing is deterministic and consistent across reads and writes (items written to one keychain are invisible to queries against the other).
  3. kSecAttrSynchronizable = false — items stay device-local, matching the existing ...ThisDeviceOnly intent.
  4. Forward migration (file-based → data-protection), no silent fallback. readItem queries the data-protection keychain first; on a clean miss (errSecItemNotFound only — never on an auth error) it reads the legacy file-based keychain once, and get/get-batch rewrite the value forward (delete legacy + re-add to DP with the biometry ACL + pinned group). Per-item and self-retiring: once migrated, the DP lookup hits and the fallback never fires again, so it is idempotent without a separate sentinel — the same pattern as the existing trusted-app-ACL JIT migration. set/delete clear both keychains so no stale legacy copy shadows the new item.

Signing/entitlements (scripts/build-keychain-helper.sh, scripts/keychain-entitlements.plist, embedded provisioning profile) are unchanged — they were already correct, which is why the bug is -25293 and not -34018 on the reporter's signed binary.

Apple references

  • kSecUseDataProtectionKeychain
  • TN3137 "On Mac keychain APIs and implementations" — companion forum thread (file-based keychain on the road to deprecation; biometry only supported by the data-protection keychain; data-protection keychain selected by kSecUseDataProtectionKeychain / kSecAttrSynchronizable)

Verification

Proven locally (macOS 26.4, 25E246):

  • Helper compiles clean for both arch slices (arm64-apple-macos12, x86_64-apple-macos12) and lipo into a universal binary.
  • Routing evidence above (the -34018 vs 0 table).
  • bun run build green; full test suite green (2373 passed, 51 skipped), incl. 13 source-level helper tests asserting DP routing, the pinned access group, dual-keychain set/delete, and the forward-migration logic.

NOT proven locally (the verification gap):

  • The -25293 symptom does not reproduce on this build, so I cannot confirm the fix resolves it.
  • The full signed write→read→delete cycle against the data-protection keychain could not be exercised: it requires the team keychain-access-groups entitlement, which is only honored at runtime with the embedded provisioning profile (bin/embedded.provisionprofile, generated at release time, not in the repo). A Developer-ID-signed local binary carrying that entitlement without the profile is SIGKILLed by AMFI (confirmed: exit 137).

Steps for the reporter (Darwin 25.5.0) to confirm:

  1. Build + sign + notarize the helper with the real credentials and embedded profile: APPLE_ID=… APPLE_APP_SPECIFIC_PASSWORD=… APPLE_TEAM_ID=2HTP252L87 ./scripts/build-keychain-helper.sh, then bun run build.
  2. agents secrets create probe → expect success (no -25293).
  3. agents secrets add probe TESTKEY --value hello then read it back → expect a single Touch ID prompt and the value.
  4. If pre-existing secrets were stored by an older helper: read one and confirm it migrates forward (subsequent reads resolve from the data-protection keychain via Touch ID).
  5. If -25293 persists after the access-group pin, the remaining suspect is the biometry SecAccessControl evaluated at write time in the headless helper process (the reporter's own hypothesis in agents secrets: keychain write fails with OSStatus -25293 (errSecAuthFailed) on macOS — helper is correctly signed/notarized #279) — a different fix touching how the ACL is applied or how the helper is launched.

🤖 Generated with Claude Code

…uting (#279)

DRAFT / NEEDS-VERIFICATION on an affected machine (reporter: Darwin 25.5.0).
The -25293 (errSecAuthFailed) regression does NOT reproduce on this build
(macOS 26.4, 25E246) so the fix could not be end-to-end verified here.

Issue #279: `agents secrets create/add` fails immediately on macOS with
OSStatus -25293 (errSecAuthFailed). The original `set` attached only a biometry
SecAccessControl with no kSecUseDataProtectionKeychain and no kSecAttrAccessGroup.

Empirical correction to the original hypothesis (probed on macOS 26.4):
adding a biometry SecAccessControl already routes the item to the
data-protection keychain (the file-based keychain cannot store biometry-gated
items) -- a plain add returns 0 here, a biometry-ACL add returns -34018, a
DP-keychain-only error. So the write was NOT on the file-based keychain; moving
it there explicitly does not, by itself, change which keychain is used.

What this change actually does:
- Pin kSecAttrAccessGroup to the concrete application-identifier group
  (2HTP252L87.com.phnx-labs.agents-keychain). The embedded profile grants the
  entitlement as a wildcard (2HTP252L87.*) with no concrete default group, so an
  add that omits the access group relies on implicit default-group resolution.
  Pinning removes that dependency -- the most likely real fix for -25293.
- State kSecUseDataProtectionKeychain explicitly on every SecItem call (set,
  get, get-batch, delete, has, list, migrate-acl, migrateInline) so routing is
  deterministic and consistent across read and write.
- Mark items kSecAttrSynchronizable=false (device-local), matching the existing
  ThisDeviceOnly intent.
- Forward migration: readItem queries the data-protection keychain first, then
  the legacy file-based keychain ONCE on a clean miss; get/get-batch then rewrite
  the value into the data-protection keychain (delete legacy + re-add). Per-item
  and self-retiring -- idempotent without a separate sentinel. delete/set clear
  both keychains so no stale legacy copy shadows the new item.

Signing/entitlements unchanged (already correct). Helper compiles clean for both
arch slices. The biometry-SecAccessControl-evaluated-headlessly path (the
reporter's own hypothesis) remains the alternative suspect if the access-group
pin does not resolve it on the affected machine.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@muqsitnawaz

Copy link
Copy Markdown
Contributor Author

Independent review: SOUND — needs verification on an affected machine.

Verified locally:

  • All SecItem call sites routed. Enumerated every SecItemAdd/CopyMatching/Update/Delete (12–14 sites). Every canonical write goes to the data-protection keychain via dpBase() (keychain-helper.swift:75-84, kSecUseDataProtectionKeychain at :80); the only file-based calls are deliberate legacy reads / cleanup deletes — no missed site that would silently write to the wrong keychain. The test that counts kSecClass occurrences guards this against future hand-rolled dicts.
  • Access group pinned consistently from a single constant kAccessGroup (:52), kSecAttrSynchronizable=false (:82).
  • Migration is explicit + idempotent + non-lossy — DP-first read, file-based fallback only on clean errSecItemNotFound, namespace-gated to agents-cli.*, value held in memory before the legacy delete. The DP-hit is the per-item sentinel.
  • Swift compiles (swiftc -framework Security -framework LocalAuthentication -framework Foundation → exit 0; only pre-existing deprecation warnings). 13 unit tests pass.

CI note: the build (22) red is an unrelated flaky concurrency test (state.test.ts > does not lose concurrent updateMeta callback writes); build (24) + the keychain tests pass. This PR only touches keychain-helper.swift + its test.

Verification gap (why this stays a draft): the macOS 26.4 -25293 regression does not reproduce on this build (25E246, writes already succeed), so the fix itself can't be confirmed here. Needs a signed build on an affected machine (Darwin 25.5.0) to confirm (a) the embedded profile resolves the pinned access group at runtime (else SecItemAdderrSecMissingEntitlement -34018), and (b) DP routing clears -25293. Code is safe to test+ship there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

agents secrets: keychain write fails with OSStatus -25293 (errSecAuthFailed) on macOS — helper is correctly signed/notarized

1 participant