Skip to content

feat: background git fetch for connected repos (auto-refresh v1)#213

Merged
vakovalskii merged 1 commit into
mainfrom
feat/repo-auto-refresh
May 15, 2026
Merged

feat: background git fetch for connected repos (auto-refresh v1)#213
vakovalskii merged 1 commit into
mainfrom
feat/repo-auto-refresh

Conversation

@NovakPAai
Copy link
Copy Markdown
Collaborator

Summary

Adds a background `git fetch --all --prune` worker for connected repositories so a new chat session starts on fresh `origin/` refs. Three triggers (manual button, new-chat click, service start), per-repo single-flight + semaphore max 4, 60s timeout with SIGTERM→SIGKILL grace, atomic settings persistence.

The working tree is never touched — the user decides when to merge.

Artefacts

  • SDD: `docs/design/repo-auto-refresh.md`
  • BDD: `specs/repo-auto-refresh.feature` (22 Gherkin scenarios)
  • Plan: `tasks/2026-05-12-repo-auto-refresh/plan.md`
  • Architecture: `docs/ARCHITECTURE.md` § Repo Auto-Refresh
  • Endpoint table: `docs/ARCHITECTURE.md` § API Routes → Repo Auto-Refresh

What ships

Backend (`src/repo-refresh.js`, `src/repo-refresh-routes.js`, `src/atomic.js`)

  • Singleton manager: state map per gitRoot, single-flight via `inflight` Map, semaphore max 4, 60s timeout (SIGTERM → 2s grace → SIGKILL).
  • 4 routes: `GET /state`, `POST /trigger`, `POST /wait`, `GET|POST /settings`. Body cap 1 MiB, `timeoutMs` clamped to 10s, `asyncHandler` sends 500 on uncaught throws.
  • Atomic JSON writes (tmp + fsync + rename) — settings at `~/.codedash/refresh-settings.json` with mode 0o600. Existing disk caches in `data.js` retrofitted to the same helper (closes the deferred MEDIUM from fix: collapse worktrees to main repo, ignore $HOME-as-git-root #212).
  • Credential redaction strips `https://user:token@host\` from any captured stderr before `lastError` is exposed via `/state`.
  • Known-roots gate: settings-file paths are cross-checked against `loadProjects() ∪ session git_roots` (5s TTL) on startup. Settings-file injection cannot drive arbitrary fetches.

Frontend (`src/frontend/app.js`, `src/frontend/styles.css`)

  • Per-project card: status badge (Fetching / Updated 2 min ago / Refresh failed), 28×28 refresh button with 44×44 hit area, Auto-fetch toggle.
  • Header global toggle "Fetch all on codbash start".
  • Polling 2s only while a visible repo is fetching; time-ticker 30s re-renders relative timestamps. Both skip swap if focus is inside the slot (no focus theft).
  • Optimistic toggle with rollback + 3s deduplicated toasts. Input disabled during inflight POST.
  • New-chat hook shows a spinner on the launch button + `aria-busy`; re-entrancy guard via `dataset.rrLaunchInflight`.
  • A11y: native checkbox (no `role="switch"`), `role="group"` wrapper, `role="status"` + `aria-live="polite"` on the outer badge slot, `aria-describedby` → visually-hidden span with full error text. `prefers-reduced-motion` disables spinner animation.

Tests

34 new tests:

  • `test/atomic.test.js` (5) — write semantics, parent-dir creation, .tmp cleanup, EACCES, no-partial-write on rename failure
  • `test/repo-refresh.test.js` (16) — state transitions, single-flight, semaphore=4, 60s timeout SIGTERM→SIGKILL, stderr truncation + credential redaction, corrupt settings, debounced save, paths-with-spaces argv, initOnStartup (with/without known-roots gate, orphan GC, setKnownGitRootsProvider), waitForRefreshOrTimeout
  • `test/repo-refresh-api.test.js` (13) — all 4 endpoints + 404/400 validation paths (incl. malformed JSON, unknown gitRoot, /wait now also validates known set)

All 34 pass. Pre-existing `test/wsl-windows.test.js` failure on macOS is unrelated (#212).

Reviews

Two parallel review passes (code-reviewer, security-reviewer, UX/UI-reviewer with `vercel-web-design-guidelines` + `ux-designer` skills loaded). All CRITICAL/HIGH/MEDIUM findings resolved in-PR. Selected LOWs deferred with documented reason (see commit body).

Deferred to follow-ups

  • Retrofit pre-existing `readBody` to enforce body size cap across 16 callers (chore PR).
  • Periodic scheduler (5/10/15/30/60 min intervals).
  • Page-refresh bulk trigger from the browser.
  • "Behind by N commits" indicator.
  • Connection-lost banner when polling fails.
  • Toast queue/stack (current dedupe is sufficient for v1).

Test plan

  • `node --test test/atomic.test.js test/repo-refresh.test.js test/repo-refresh-api.test.js` → 34/34 green
  • Manual: open Projects view, click ↻ on a project — badge transitions Fetching → Updated.
  • Manual: toggle Auto-fetch on a project, click ▶ New — launch button shows "Fetching…", then opens session.
  • Manual: disconnect network, click ↻ — badge shows truncated error + tooltip + retry.
  • Manual: VoiceOver/screen reader — focus the badge during a fetch → hears "Fetching "; success → "Updated".
  • Manual: `prefers-reduced-motion` enabled in OS → spinner is a static dim circle.
  • Manual: corrupt `~/.codedash/refresh-settings.json`, restart service → starts with defaults, warning logged, file untouched.
  • curl `/api/repo-refresh/state` while a fetch is mid-flight → responds <100ms (event loop not blocked).

@NovakPAai
Copy link
Copy Markdown
Collaborator Author

Merge sequence — DO NOT MERGE BEFORE #212

Blocked on #212 (fix/git-root-worktree-and-home).

Why

This PR's RepoRefreshManager.setKnownGitRootsProvider() and initOnStartup() security gate rely on the corrected resolveGitRoot shipped in #212 — without it:

  • Bare repos and $HOME-as-git-root would appear in the "known projects" set
  • Linked worktrees would split into multiple project entries
  • macOS /var/private/var symlinks would cause duplicate keys

The orphan-GC logic in initOnStartup calls resolveGitRoot(key) !== '' to detect dead entries; the legacy version of that function would return false positives on bare repos, deleting valid settings.

Conflict status

CONFLICTING with current main — 3 files conflict with #211 ("restructure Projects tab — launcher landing + History subtab + agent picker"):

Plan after #212 merges

  1. git fetch origin main && git rebase origin/main
  2. Resolve conflicts:
    • server.js — re-apply repo-refresh route dispatch + getKnownGitRoots() helper next to feat: Projects tab → Projects (launcher) + History subtabs + agent picker #211's new routes
    • styles.css — repo-refresh styles append to end of file (no real overlap)
    • app.js — re-position renderRepoRefreshControls() call inside the new launcher-landing layout; preserve maybeRefreshBeforeLaunch() wraps on launchNewProjectSession and resumeLastProjectSession
  3. Re-run node --test test/atomic.test.js test/repo-refresh.test.js test/repo-refresh-api.test.js — expect 34/34
  4. Manual browser smoke on the new Projects tab structure (Auto-fetch toggle still discoverable, refresh button keyboard-reachable, badge accessible to AT)
  5. git push --force-with-lease
  6. After CI + approval — gh pr merge --squash

@vakovalskii
Copy link
Copy Markdown
Owner

Briefly scanned src/repo-refresh.js and src/atomic.js — same high-quality engineering as #210/#211 (dependency injection for testability, credential redaction in stderr, known-roots gate against settings-file injection, SIGTERM→SIGKILL grace). 👏

Currently CONFLICTING though. Could you rebase onto main (now has #210 + #212 landed) and force-push? Then I'll do a proper review pass.

Also still waiting on the rebase for #211 — feel free to do them in either order or both at once.

git fetch origin
git rebase origin/main
git push --force-with-lease

Adds per-project "Auto-fetch on new chat" toggle, a manual refresh
button on each project card, and a global "Fetch all on codbash start"
toggle. Runs `git fetch --all --prune` in the background — never touches
the working tree, never blocks the HTTP server.

Goal: when a session starts, the LLM sees current origin/<branch> so
new branches start from a fresh base and continuations don't pile on
top of stale state.

Triggers:
- Manual: click the refresh button on a project card.
- New chat: when the per-project toggle is on, /trigger + /wait
  (timeoutMs: 2000) fire before /api/launch. Session opens after fetch
  finishes, or after 2s with stale refs (graceful degradation).
- Service start: bin/cli.js wires the known-roots gate then calls
  repoRefreshManager.initOnStartup() on process.nextTick.

Backend:
- src/repo-refresh.js — singleton via createRepoRefreshManager(opts);
  state machine idle <-> fetching <-> error, single-flight per gitRoot,
  semaphore max 4, 60s timeout with SIGTERM -> 2s grace -> SIGKILL.
- src/repo-refresh-routes.js — 4 routes under /api/repo-refresh/*
  (state, trigger, wait, settings). Body cap 1 MiB, /wait timeoutMs
  clamped to 10s, asyncHandler sends 500 on uncaught throws.
- src/atomic.js — atomicWriteJson(path, obj, {mode}) — tmp + fsync +
  rename with cleanup on rename failure. Settings written 0o600;
  existing disk caches in data.js retrofitted (closes deferred MEDIUM
  from PR #212 about non-atomic writes).
- Persistence: ~/.codedash/refresh-settings.json. Corrupt file ->
  defaults + warning, file left untouched. Debounced 500ms saves.
- Known-roots gate: getKnownGitRoots() = loadProjects() U session
  git_roots, 5s TTL cache. Wired into manager via
  setKnownGitRootsProvider(); initOnStartup refuses to fetch paths
  not in the known set. Defends against settings-file injection.
- Credential redaction: https://user:token@host stripped from any
  captured stderr before storing in lastError (which is exposed via
  /state to the browser).

Frontend:
- Per-project card: status badge (Fetching / Updated 2 min ago /
  Refresh failed: <msg>), refresh button (28x28 visual, 44x44 hit
  area via ::after), Auto-fetch toggle. Header carries global
  "Fetch all on codbash start" toggle.
- Polling 2s only while a visible repo is fetching; time-ticker 30s
  re-renders relative timestamps. Both skip the swap if focus is
  inside the slot (no focus theft).
- Optimistic toggle with rollback + 3s-deduplicated toasts. Input
  disabled during inflight POST (closes the toggle race risk).
- new-chat hook shows a spinner on the launch button + aria-busy.
  Re-entrancy guard via dataset.rrLaunchInflight.
- A11y: native checkbox (no role="switch"), role="group" wrapper,
  role="status" + aria-live="polite" on the outer badge slot so
  innerHTML swaps don't tear down the live region, aria-describedby
  -> visually-hidden span with full error text for SR/keyboard.
  prefers-reduced-motion disables the spinner animation.

Tests: 34 new tests across atomic.js, repo-refresh.js, the routes,
and explicit cases for credential redaction, known-roots gating,
and provider injection.

Deferred (documented LOW from review pass 2, separate follow-ups):
- Retrofit pre-existing readBody to enforce body size cap across
  16 callers (out of scope for this feature; chore PR).
- Periodic scheduler (5/10/15/30/60 min intervals).
- Page-refresh bulk trigger from the browser.
- "Behind by N commits" indicator.
- Connection-lost banner when poll fails.
- toast queue/stack (current dedupe is sufficient for v1).
@NovakPAai NovakPAai force-pushed the feat/repo-auto-refresh branch from af9cb0c to 24d6fea Compare May 14, 2026 09:41
Copy link
Copy Markdown
Owner

@vakovalskii vakovalskii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ✅ Re-reviewed after rebase.

Verified locally:

  • node -c clean on all 6 touched/new backend files (repo-refresh.js, repo-refresh-routes.js, atomic.js, server.js, data.js, bin/cli.js)
  • Spot-checked the security posture against yesterday's pre-rebase reading: dependency injection for testability, credential redaction in lastError (/(https?:\/\/)[^/\s:@]+:[^/\s@]+@/), known-roots gate on initOnStartup, SIGTERM→2s grace→SIGKILL, atomic settings persistence — all intact

One non-blocking follow-up to flag (not a merge blocker):

When I run node --test test/repo-refresh.test.js locally, tests 1–11 pass but 12–16 are reported cancelled (not failed). Each cancelled test passes in isolation via --test-name-pattern. Looks like one of the earlier subtests leaks a setTimeout or unresolved promise that survives into the next test's window. The two most likely candidates are initOnStartup (test 11) and the debouncedSave timer in updateSettings (test 7) — both schedule timers that may outlive their assertions. Worth a quick pass with afterEach cleanup or a manager-level shutdown() method that cancels pending timers.

A more important observation that surfaced during this dig: the CI workflow .github/workflows/ci.yml only runs node bin/cli.js version/help and a node -c syntax check on 5 files — it does not invoke node --test. So "CI green 6/6" on this PR means no test was actually run. Worth a short follow-up PR to add - name: Tests\n run: node --test test/*.test.js to the workflow so future regressions are caught.

Both of these are housekeeping and don't gate this merge — the production code itself is solid. Thanks @NovakPAai!

@vakovalskii vakovalskii merged commit 94e7be6 into main May 15, 2026
6 checks passed
@vakovalskii vakovalskii deleted the feat/repo-auto-refresh branch May 15, 2026 07:59
vakovalskii pushed a commit that referenced this pull request May 16, 2026
* fix(repo-refresh): cancel manager-owned timers in test teardown + run tests in CI

Addresses two non-blocking follow-ups from #213 review.

Test isolation: createRepoRefreshManager() now tracks every setTimeout it
schedules (debounced save, runFetch 60s timeout/SIGKILL grace, waiter
timeouts) in a Set and exposes shutdown() to cancel them all. Earlier reliance
on timer.unref() was enough to let the process exit, but node:test's resource
tracker still observed the pending handle and intermittently reported
tests 12-16 as cancelled (per maintainer's report). Tests now register
shutdown() via t.after() through a makeMgr(t, opts) helper — no more leak.

CI: workflows/ci.yml previously ran only `node bin/cli.js version|help` and
`node -c` syntax checks on five files, so "CI green" meant no test ever ran.
Adds `node --test test/*.test.js` so future regressions are caught.

wsl-windows.test.js skipped on non-win32: the assertion under test relies on
backslash path joining which only happens when path defaults to win32. The
companion "returns empty on non-Windows" test still validates the
cross-platform branch.

* fix(repo-refresh): resolve pending fetch promises on shutdown

Following up on the CI failure for the first attempt at this fix. macOS
runners on node:test reported tests 12–16 as cancelledByParent with the
error "Promise resolution is still pending but the event loop has already
resolved", even though all timers were now cleared.

Root cause: test 11 ("initOnStartup garbage-collects orphan perProject
entries") triggers a fetch for /repos/alive via initOnStartup but never
resolves the mocked execFile call. shutdown() cancelled the 60s
timeoutTimer but left the inflight Promise hanging — stricter node:test
hosts (macOS CI) flag this at test teardown, while my local machine
tolerated it.

Track each runFetch finalize callback in `activeFinalizers` and resolve
every outstanding fetch with a synthetic "cancelled (manager shutdown)"
error state when shutdown() runs. Tests now pass cleanly on the same
runners that failed before.

* fix(repo-refresh): drop .unref() on awaited timers + thread shutdown into api tests

CI second run still failed tests 12–16 with "Promise resolution is still
pending but the event loop has already resolved". Root cause is more
fundamental than a missing cleanup hook:

The four manager timers (runFetch timeoutTimer/killTimer, waiter `to`,
debounced saveTimer) were all .unref()'d. unref'd timers are not counted
as keeping the event loop alive — so when a test does
`await mgr.waitForRefreshOrTimeout(..., 30)`, the only handle keeping the
loop alive is the unref'd 30ms timer. node:test sees the loop drain to
idle while the test's returned Promise is still pending and flags it.

Now that shutdown() exists, .unref() is redundant for graceful host
exit. Drop all four .unref() calls so the timers properly keep the event
loop alive while in flight; hosts call shutdown() to release them.

Also thread t.after(shutdown) through repo-refresh-api.test.js's
makeDeps(t, ...) helper — that file ran before repo-refresh.test.js in
the CI test order and was leaving timers across the file boundary.

---------

Co-authored-by: jackrescuer-gif <jackrescuer@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants