
fix(isolation): mount git identity into docker containers; cross-check premature docker status (#1939) #1940

Merged: konard merged 8 commits into main from issue-1939-f81a3d54f708 on Jun 17, 2026

konard commented Jun 17, 2026

Summary

Fixes #1939 (continuation of #1860). A solve task launched with
--isolation docker inside a Docker-in-Docker host
(konard/hive-mind-dind:2.0.2) failed and surfaced three distinct problems.
This PR fixes the ones that live in Hive Mind code, adds debug/verbose
diagnostics so the rest is observable next iteration, reports the upstream
start-command bug, and ships a full case study.

Root cause (Problem 3 — definite)

The native start-command ($) docker invocation mounted only ~/.config/gh,
~/.claude, and ~/.claude.json, with no git identity. Inside the container
gh was fully authenticated (✓ Logged in to github.com account konard) yet
git config user.name/user.email were unset, so solve aborted at the
system-check stage with ❌ Git identity not configured. "Credentials not
mounted" in the title is exactly this: the git identity credential never
reached the container.

[VERBOSE] isolation-runner: Docker isolation mounts: /home/box/.config/gh, /home/box/.claude, /home/box/.claude.json
...
❌ Git identity not configured
   user.name:  (not set)
   user.email: (not set)

What changed

  • Mount the git identity for every tool (getDockerIsolationAuthMounts,
    src/isolation-runner.lib.mjs): ~/.gitconfig and the XDG ~/.config/git
    are now mounted alongside gh, honoring GIT_CONFIG_GLOBAL/XDG_CONFIG_HOME.
    Missing host paths are skipped. Because this lives in the shared mount builder,
    every isolation caller inherits the fix.
  • Self-healing host preflight (ensureHostGitIdentityForIsolation, new):
    when the bot host has no mountable git identity it derives one from the
    authenticated gh account (gh-setup-git-identity / repairGitIdentity),
    else emits one actionable warning naming the exact downstream failure. Wired
    into the telegram-bot docker-isolation preflight; best-effort, never throws.
  • Pin start-command@0.29.1 (Dockerfile, Dockerfile.dind): the upstream
    premature-status bug (Problem 1) reported as
    link-foundation/start#136
    was fixed and released in start-command 0.29.1 (issue closed as completed
    2026-06-17). Both Hive Mind images now pin that version so the fixed $
    binary ships, instead of installing start-command unpinned.
  • Cross-check premature docker status (Problem 1, defense-in-depth):
    isUnknownDockerExitCode plus a docker-only cross-check in isSessionRunning
    and getIsolationSessionState keep an ambiguous executed/-1 docker session
    "running" until docker inspect confirms the container has actually exited.
    Retained alongside the upstream fix so an older $ on an operator's PATH
    cannot resurrect the bug. Real exit codes and non-docker backends are
    unaffected.
  • Verbose post-launch diagnostics (Problems 1 & 2, R6):
    logDockerIsolationPostLaunchDiagnostics records $ --status, live container
    state, and local image presence after a docker launch, so the premature status
    and any image re-pull are observable from data next iteration.
  • Upstream report (R7) — resolved: filed
    link-foundation/start#136
    for the detached docker session that reports a terminal status (executed)
    with exitCode -1 while its container is still running — with a reproducer,
    the consumer-side workaround, and a code-level fix suggestion. The maintainer
    fixed it in start-command 0.29.1 (issue closed as completed), which this PR
    now pins in both images.
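
The mount-list behaviour described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in src/isolation-runner.lib.mjs: the function name buildAuthMounts, its parameters, and the injected exists callback are all hypothetical, and the sketch assumes that GIT_CONFIG_GLOBAL replaces the default ~/.gitconfig path and that XDG_CONFIG_HOME only relocates the git config directory.

```javascript
// Hypothetical sketch of a shared auth-mount builder (illustrative names only).
// `exists` is injected so the function stays pure and easy to test.
function buildAuthMounts({ home, env = {}, exists }) {
  // Assumption: XDG_CONFIG_HOME only moves the git config dir in this sketch.
  const xdgConfigHome = env.XDG_CONFIG_HOME || `${home}/.config`;

  const candidates = [
    `${home}/.config/gh`, // gh auth (already mounted before the fix)
    `${home}/.claude`,
    `${home}/.claude.json`,
    env.GIT_CONFIG_GLOBAL || `${home}/.gitconfig`, // git identity file
    `${xdgConfigHome}/git`, // XDG git config directory
  ];

  // De-duplicate, then skip host paths that do not exist so docker is never
  // asked to mount a missing source.
  return [...new Set(candidates)].filter((hostPath) => exists(hostPath));
}

// Usage: a host that has gh auth, ~/.claude and a ~/.gitconfig.
const presentPaths = new Set([
  "/home/box/.config/gh",
  "/home/box/.claude",
  "/home/box/.gitconfig",
]);
const exampleMounts = buildAuthMounts({
  home: "/home/box",
  exists: (hostPath) => presentPaths.has(hostPath),
});
console.log(exampleMounts);
```

Keeping the list in one builder is what lets every isolation caller inherit the new mounts without per-tool changes.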

How to reproduce

Run a solve task with --isolation docker on a host that has gh auth but no
~/.gitconfig (user.name/user.email unset). The isolated solve aborts
with ❌ Git identity not configured even though gh is authenticated. The full
captured terminal transcript is in
docs/case-studies/issue-1939/raw/failed-session-terminal.log.

Tests

  • New: tests/test-issue-1939-docker-isolation.mjs (25 assertions) — git
    identity mounted for claude+codex, env overrides, missing-identity skipped,
    ensureHostGitIdentityForIsolation (present / self-heal / unrepairable),
    isUnknownDockerExitCode, and the getIsolationSessionState cross-check
    (ambiguous-but-alive stays running, ambiguous-and-gone completes, real exit
    code trusted, screen backend unaffected).
  • tests/test-issue-1860-docker-isolation.mjs (33 assertions) still passes.
  • Full default suite: all 260 test files passed.
  • npm run lint clean; prettier-formatted.

Case study

docs/case-studies/issue-1939/README.md
— reconstructed timeline, all requirements, root cause per problem pinned to log
evidence, solution plans, alternatives considered, existing-library check, and
the upstream follow-up.

Fixes #1939

Adding .gitkeep for PR creation (default mode).
This file will be removed when the task is complete.

Issue: #1939
konard self-assigned this Jun 17, 2026

konard commented Jun 17, 2026

It was actually able to create the comment, #1596 (comment), so maybe we did mount it correctly.

…k premature docker status (#1939)

A `--isolation docker` solve run failed at the system-check stage with
"Git identity not configured" even though gh was authenticated (account
konard). The native start-command (`$`) invocation mounted only ~/.config/gh,
~/.claude and ~/.claude.json — no git identity — so git user.name/user.email
were unset inside the container and solve aborted before any work.

Fixes (all in this PR on branch issue-1939-f81a3d54f708):

- getDockerIsolationAuthMounts now mounts the host git identity (~/.gitconfig
  and XDG ~/.config/git, honoring GIT_CONFIG_GLOBAL/XDG_CONFIG_HOME) for every
  tool, alongside gh. Missing host paths are skipped. (problem 3, root cause)
- ensureHostGitIdentityForIsolation (new) self-heals the bot host: when no git
  identity is mountable it derives one from the authenticated gh account via
  gh-setup-git-identity, else emits one actionable warning. Wired into the
  telegram-bot docker-isolation preflight, best-effort, never throws.
- isUnknownDockerExitCode + a docker-only cross-check in isSessionRunning and
  getIsolationSessionState keep an ambiguous "executed"/-1 docker session
  running until docker inspect confirms the container exited, so a still-running
  container is no longer reported as a finished/failed execution. (problem 1)
- logDockerIsolationPostLaunchDiagnostics (verbose) records $ --status, live
  container state and local image presence after launch so the premature status
  (problem 1) and image re-pull (problem 2) are observable next iteration.

Adds tests/test-issue-1939-docker-isolation.mjs (25 assertions) and a full case
study under docs/case-studies/issue-1939 with timeline, root-cause analysis and
captured logs. The premature-terminal-status behaviour is also reported upstream
to link-foundation/start.
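
The self-healing preflight described in the commit message above can be pictured as a small best-effort decision function. The sketch below is only an illustration of that control flow: the name ensureGitIdentity, the injected callbacks (hasIdentity, repairFromGh, warn) and the returned labels are hypothetical and are not taken from ensureHostGitIdentityForIsolation itself.

```javascript
// Hypothetical sketch of a best-effort host preflight (illustrative names only).
// hasIdentity():  true when the host has a mountable git identity.
// repairFromGh(): tries to derive an identity from the authenticated gh
//                 account and returns true on success.
// warn(message):  emits a single actionable warning.
function ensureGitIdentity({ hasIdentity, repairFromGh, warn }) {
  try {
    if (hasIdentity()) return "present";
    if (repairFromGh() && hasIdentity()) return "self-healed";
    warn(
      "No git identity on the host: the isolated solve will abort with " +
        "'Git identity not configured'. Set git user.name and user.email."
    );
    return "unrepairable";
  } catch (error) {
    // Best-effort: a failing preflight must never block the launch.
    return "unknown";
  }
}

// Usage: a host with no identity where the gh-based repair succeeds.
let identityConfigured = false;
const outcome = ensureGitIdentity({
  hasIdentity: () => identityConfigured,
  repairFromGh: () => {
    identityConfigured = true;
    return true;
  },
  warn: (message) => console.warn(message),
});
console.log(outcome);
```

Re-checking hasIdentity() after the repair, rather than trusting the repair's return value alone, keeps the "self-healed" label honest when the repair reports success but leaves nothing mountable.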
konard changed the title from "[WIP] --isolation docker is not working" to "fix(isolation): mount git identity into docker containers; cross-check premature docker status (#1939)" Jun 17, 2026
konard marked this pull request as ready for review June 17, 2026 07:29

konard commented Jun 17, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $9.960528
  • Calculated by Anthropic: $11.007127
  • Difference: $1.046599 (+10.51%)

📊 Context and tokens usage:

Claude Opus 4.8: (3 sub-sessions)

  1. 112.6K / 1M (11%) input tokens, 29.3K / 128K (23%) output tokens
  2. 116.0K / 1M (12%) input tokens, 36.9K / 128K (29%) output tokens
  3. 88.1K / 1M (9%) input tokens, 27.4K / 128K (21%) output tokens

Total: (20.3K new + 283.4K cache writes + 10.7M cache reads) input tokens, 108.6K output tokens, $9.960528 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (5669KB)


The working session has now ended; feel free to review and add any feedback on the solution draft.

konard commented Jun 17, 2026

🔄 Auto-restart 1/5

Detected uncommitted changes from previous run. Starting new session to review and commit or discard them.

Uncommitted files:

?? bun.lock

Auto-restart will stop after changes are committed or discarded, or after 4 more iterations. Please wait until the working session ends, then give your feedback.

konard commented Jun 17, 2026

Working session summary

All 25 tests pass, the branch is in sync with main, and the working tree is clean.

Summary

The auto-restart was triggered by a single uncommitted file: bun.lock. I investigated and determined it was not part of the solution:

  • The repo tracks package-lock.json (npm) as its lockfile — bun.lock is untracked.
  • bun.lock is a generated artifact from a local bun install, unrelated to the docker-isolation fix for issue #1939 (--isolation docker is not working).
  • Nothing in the PR's changeset (src/isolation-runner.lib.mjs, src/session-monitor.lib.mjs, src/telegram-bot.mjs, tests, case study docs) references it.

I removed it with rm -f bun.lock, leaving the working tree clean (git status --porcelain returns 0 lines).

I also verified the underlying PR #1940 is healthy:

  • Branch issue-1939-f81a3d54f708 is 0 commits behind origin/main.
  • The issue's test suite passes: 25 passed, 0 failed.

No commit/push was needed — removing an untracked artifact leaves no tracked changes, and the PR's actual fix was already committed and pushed in the prior commits (40fbf3da, 19324303).

PR: #1940


This summary was automatically extracted from the AI working session output.

konard commented Jun 17, 2026

🔄 Auto-restart 1/5 Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $0.312151
  • Calculated by Anthropic: $11.408558
  • Difference: $11.096407 (+3554.82%)

📊 Context and tokens usage:

  • 36.5K / 1M (4%) input tokens, 2.7K / 128K (2%) output tokens

Total: (3.0K new + 23.8K cache writes + 159.5K cache reads) input tokens, 2.7K output tokens, $0.312151 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (6072KB)


The working session has now ended; feel free to review and add any feedback on the solution draft.

konard commented Jun 17, 2026

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

konard commented Jun 17, 2026

Double check everything, and also check that #1940 (comment) was addressed.

konard marked this pull request as draft June 17, 2026 07:52

konard commented Jun 17, 2026

🤖 AI Work Session Started

Starting automated work session at 2026-06-17T07:52:55.251Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.

konard added 2 commits June 17, 2026 07:55
…ted (#1939)

Address PR review feedback: @konard observed the isolated solve was able
to post its failure comment to issue #1596, suggesting gh credentials
reached the container. Confirm and document this — the posted comment is
positive proof the .config/gh mount worked, isolating the defect to the
missing git identity, which is exactly what the fix adds.
…inal.log (#1939)

The case study README and PR description both reference
raw/failed-session-terminal.log as primary evidence, but the global '*.log'
gitignore rule silently dropped it, so every 'log line N' citation pointed at a
missing file. Negate the ignore for docs/case-studies/**/*.log and commit the
194-line operator terminal transcript extracted from the issue body.

konard commented Jun 17, 2026

Re: #1940 (comment) — addressed ✅

You're right, and that observation actually confirms the diagnosis rather than contradicting it:

  • The isolated container did post #1596 (comment) — a gh api write that only an authenticated gh inside the container could perform. So the .config/gh mount was correct; GitHub auth crossed the boundary.
  • What did not cross is the git identity (user.name/user.email). Those live in ~/.gitconfig, which was never in the mount list, so solve's system check aborted with ❌ Git identity not configured before any commit. gh auth is not a git identity.

That's why this PR adds a ~/.gitconfig mount and leaves the already-working gh mount untouched. I made this explicit in the case study (Root Cause 1 now cites the posted #1596 comment as positive proof the gh mount worked).

Double-check results

  • Comment addressed — documented above and in docs/case-studies/issue-1939/README.md.
  • Missing evidence file — the README and this PR description both referenced raw/failed-session-terminal.log, but a global *.log .gitignore rule had silently dropped it, so every cited "log line N" pointed at a missing file. Negated the ignore for docs/case-studies/**/*.log and committed the 194-line transcript; spot-checked that the cited line numbers (5–10 mounts, 118 solve-start, 145 git-identity fail, 176 gh-authed) all resolve correctly.
  • Tests: tests/test-issue-1939-docker-isolation.mjs passes 25/25. Related suites are green too: test-isolation-runner, test-issue-1860 (33), test-issue-1914 (20), test-issue-1680 (14).
  • Branch — 0 commits behind origin/main; working tree clean.
  • CI — no failures; changeset present for the release trigger.

konard marked this pull request as ready for review June 17, 2026 07:58

konard commented Jun 17, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $1.939643
  • Calculated by Anthropic: $2.193005
  • Difference: $0.253361 (+13.06%)

📊 Context and tokens usage:

  • 80.2K / 1M (8%) input tokens, 15.1K / 128K (12%) output tokens

Total: (6.2K new + 67.6K cache writes + 2.2M cache reads) input tokens, 15.1K output tokens, $1.939643 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (1324KB)


The working session has now ended; feel free to review and add any feedback on the solution draft.

konard commented Jun 17, 2026

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

konard commented Jun 17, 2026

Use the latest version of https://github.com/link-foundation/start, and double-check that our solution fully implements the fix for #1939.

konard marked this pull request as draft June 17, 2026 12:47

konard commented Jun 17, 2026

🤖 AI Work Session Started

Starting automated work session at 2026-06-17T12:47:21.932Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.

…docker status) (#1939)

The detached-docker premature-status bug reported upstream as
link-foundation/start#136 was fixed and released in start-command 0.29.1
(issue closed as completed 2026-06-17). Pin start-command@0.29.1 in both
Dockerfile and Dockerfile.dind so the fixed $ binary ships in the images.

The downstream isUnknownDockerExitCode cross-check is retained as
defense-in-depth for hosts still running an older $. Refresh the case study
and npm metadata to record the upstream resolution and the version pin.
konard marked this pull request as ready for review June 17, 2026 12:54

konard commented Jun 17, 2026

Re: #1940 (comment) — use latest start-command + full double-check ✅

1. Using the latest link-foundation/start

Good timing: the upstream bug I reported — link-foundation/start#136 (detached docker session reporting a terminal executed/-1 status while the container is still running, i.e. Problem 1) — was fixed and released in start-command@0.29.1 (issue closed as completed 2026-06-17T09:16Z, npm published 09:18Z). Release notes confirm the exact fix:

Fix detached docker --status/--list reporting a terminal status (executed) with the -1 sentinel while the container is still running … isDetachedSessionAlive() now treats a failed docker inspect as "unknown" … enrichDetachedStatus() resolves the real exit code … only falling back to -1 when no real code can be obtained.

Both images installed start-command unpinned (bun install -g start-command), so a rebuild could silently pick up an older or drifting $. I now pin start-command@0.29.1 in Dockerfile and Dockerfile.dind with a comment linking #136/#1939, so the fixed $ binary is guaranteed in the images. The downstream isUnknownDockerExitCode cross-check is kept as defense-in-depth for operators still running an older $ on PATH.
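
As a rough picture of the defense-in-depth cross-check mentioned above, the sketch below keeps an ambiguous docker result "running" while the container is still alive. It is an assumption-laden illustration, not the code behind isUnknownDockerExitCode or getIsolationSessionState: the session object shape, the function names, the "completed"/"running" labels, and the rule that any status other than "executed" counts as running are all hypothetical. The containerRunning flag stands in for whatever a docker inspect call would report.

```javascript
// Hypothetical sketch of the docker-only cross-check (illustrative names only).
const UNKNOWN_EXIT_CODE = -1;

// An "executed" docker session with the -1 sentinel is ambiguous: the status
// looks terminal, but no real exit code was obtained.
function isAmbiguousDockerResult(session) {
  return (
    session.backend === "docker" &&
    session.status === "executed" &&
    session.exitCode === UNKNOWN_EXIT_CODE
  );
}

// containerRunning: boolean supplied by the caller (for example, derived
// from docker inspect); this sketch does not call docker itself.
function resolveSessionState(session, containerRunning) {
  if (isAmbiguousDockerResult(session)) {
    // Trust the live container over the premature terminal status.
    return containerRunning ? "running" : "completed";
  }
  // Real exit codes and non-docker backends are taken at face value.
  return session.status === "executed" ? "completed" : "running";
}

// Usage: the premature status seen in #1939, with the container still alive.
const premature = { backend: "docker", status: "executed", exitCode: -1 };
console.log(resolveSessionState(premature, true));
```

Because the check only fires for the docker backend with the -1 sentinel, a session that reports a real exit code is never second-guessed.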

2. Double-check — every requirement of #1939

| Req | Problem | Status |
| --- | --- | --- |
| R1 | Premature executed/-1 status, unfollowable live log | ✅ Fixed upstream in 0.29.1 (now pinned) + downstream cross-check (isSessionRunning / getIsolationSessionState) as defense-in-depth |
| R3 | "Credentials not mounted" → actually ❌ Git identity not configured | ~/.gitconfig + XDG ~/.config/git mounted for every tool; ensureHostGitIdentityForIsolation self-heals from the authed gh account. (gh auth itself was already crossing the boundary, as proven by the posted #1596 comment) |
| R2 | hive-mind image re-downloaded inside DinD | ◾ Environmental/deployment (host-passthrough, tracked in #1914); detected & reported loudly via preflight + new post-launch diagnostic |
| R4 | Fix everywhere | ✅ Shared getDockerIsolationAuthMounts → all isolation callers inherit it |
| R5 | Case study | docs/case-studies/issue-1939/ (timeline, root causes, raw logs); updated to record the upstream resolution |
| R6 | Debug/verbose for next iteration | logDockerIsolationPostLaunchDiagnostics |
| R7 | Report upstream | #136 filed and now fixed in 0.29.1 |
| R8 | Single PR | on this branch |

Verification

  • tests/test-issue-1939-docker-isolation.mjs: 25/25 pass. Related green too: test-isolation-runner, test-issue-1860 (33), test-session-monitor-isolation (8).
  • npm run lint clean; prettier-formatted.
  • Branch issue-1939-f81a3d54f708: 0 commits behind origin/main, working tree clean.

Latest commit: a700dbd9.

konard commented Jun 17, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $4.736062
  • Calculated by Anthropic: $5.118247
  • Difference: $0.382185 (+8.07%)

📊 Context and tokens usage:

  • 114.6K / 1M (11%) input tokens, 29.3K / 128K (23%) output tokens

Total: (5.4K new + 101.9K cache writes + 6.7M cache reads) input tokens, 29.3K output tokens, $4.736062 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (2591KB)


The working session has now ended; feel free to review and add any feedback on the solution draft.

konard commented Jun 17, 2026

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

konard merged commit 3a0a3a7 into main Jun 17, 2026
25 checks passed