Skip to content

feat(health): detect token revocation via authenticated health checks#409

Merged
polaz merged 20 commits intomainfrom
feat/#370-feathealth-detect-token-revocation-via-authenticat
Apr 11, 2026
Merged

feat(health): detect token revocation via authenticated health checks#409
polaz merged 20 commits intomainfrom
feat/#370-feathealth-detect-token-revocation-via-authenticat

Conversation

@polaz
Copy link
Copy Markdown
Member

@polaz polaz commented Apr 8, 2026

Summary

  • Add authenticatedTokenCheck() that runs HEAD /api/v4/user with the static token after each successful unauthenticated reachability probe
  • Uses skipAuth: true + explicit PRIVATE-TOKEN header so the probe validates only the static token — session cookies cannot mask a revoked token
  • A 401 or 403 response is detected via parseGitLabApiError(); the new healthCheckErrorIsAuth guard transitions the instance to failed state (no auto-reconnect)
  • Non-auth non-2xx responses (429, 5xx) throw and are classified as transient by the catch block — the instance stays healthy, not failed
  • Skipped in OAuth mode and when GITLAB_TOKEN is unset
  • Both healthy.checking.onError and degraded.checking.onError share a single healthCheckOnError constant (eliminates duplicate blocks, fixes SonarCloud CPD)
  • performConnect fast-path also calls authenticatedTokenCheck so forceReconnect cannot bypass revocation detection
  • Fix release workflow: fromJSON extracts numeric PR number from release-please v4 JSON output, preventing bash syntax errors from markdown links in changelog

Problem

The periodic health check used only an unauthenticated HEAD /api/v4/version probe, which treated 401 as "server is alive". If a token was revoked mid-session, the monitor stayed in healthy state indefinitely. Additionally, forceReconnect took a fast-path that skipped token validation on reconnect, and the release workflow crashed on changelogs with markdown links.

Test plan

  • 10 new unit tests: token revocation (401 + 403) from healthy/degraded, OAuth skip, no-token skip, network error swallowed, recovery via forceReconnect, fast-path regression, transient non-auth probes (429/500/503)
  • All 4954 tests pass
  • Lint clean, build clean
  • codecov/patch ≥ 85% ✅
  • SonarCloud CPD ≤ 3% ✅

Closes #370

Add authenticated HEAD /api/v4/user check to periodic health monitor
probes (static token mode only; skipped in OAuth mode where no global
token is available during background checks).

Previously the health monitor stayed in `healthy` state after a token
was revoked because the unauthenticated /api/v4/version probe treated
401 as "server is alive" (status < 500). The new authenticatedTokenCheck
runs after each successful reachability probe and throws a GitLab API
401 error on revocation. classifyError maps this to 'auth', and the new
healthCheckErrorIsAuth guard in both healthy.checking.onError and
degraded.checking.onError transitions the instance directly to `failed`
state — disabling auto-reconnect until the user fixes the token.

Closes #370
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 243bd0f9-7bdf-4edd-8619-f72242d39305

📥 Commits

Reviewing files that changed from the base of the PR and between 5f8b9cf and e2204b7.

📒 Files selected for processing (6)
  • .github/workflows/release-please.yml
  • README.md
  • README.md.in
  • sonar-project.properties
  • src/services/HealthMonitor.ts
  • tests/unit/services/HealthMonitor.test.ts

📝 Walkthrough

Summary by CodeRabbit

  • Documentation

    • Updated error classification and token revocation detection documentation.
  • New Features

    • Implemented token revocation detection during periodic health checks for improved security.
  • Tests

    • Added comprehensive test coverage for authentication scenarios.
  • Chores

    • Configured SonarCloud for enhanced code quality monitoring.
    • Improved release workflow output parsing.

Walkthrough

Adds an authenticated static-token probe (HEAD /api/v4/user with PRIVATE-TOKEN) to connect and periodic health-check flows when OAuth is disabled and GITLAB_TOKEN is present; a 401/403 from that probe immediately transitions the instance to failed. Docs and instance-cache config entries updated.

Changes

Cohort / File(s) Summary
Documentation
README.md, README.md.in
Update “Error classification” to remove prior claim about forwarding runtime auth errors; document mid-session authenticated probe (skipped for OAuth), 401/403failed; add GITLAB_INSTANCE_CACHE_MAX and GITLAB_INSTANCE_TTL_MS config entries.
Health monitor implementation
src/services/HealthMonitor.ts
Add authenticatedTokenCheck(instanceUrl, timeoutMs) and integrate it into performConnect and periodic performHealthCheck; centralize healthCheckOnError and add guard healthCheckErrorIsAuth; route auth failures (401/403) to #connection.failed (disabling auto-reconnect); preserve prior transient/net error handling.
Tests
tests/unit/services/HealthMonitor.test.ts
Make GITLAB_TOKEN and isOAuthEnabled() mockable; add helpers to stub /api/v4/user; add token revocation detection tests covering authenticated probe behavior (401/403 → failed), skips (no-token/OAuth), swallowed network/transient errors, recovery and forceReconnect() fast-path.
CI / Analysis config
sonar-project.properties, .github/workflows/release-please.yml
Add SonarCloud CPD exclusion tests/**; change release-please job output pr-number to extract numeric PR id (or empty string) for downstream consumers.

Sequence Diagram(s)

sequenceDiagram
    participant HC as Health Monitor
    participant Quick as Quick Health Check
    participant Auth as Authenticated Token Check
    participant API as GitLab API
    participant SM as State Machine

    HC->>Quick: HEAD /api/v4/version (unauthenticated)
    alt static token present & OAuth disabled
        HC->>Auth: authenticatedTokenCheck(instanceUrl, timeout)
        Auth->>API: HEAD /api/v4/user (PRIVATE-TOKEN)
        alt 401/403
            API-->>Auth: 401/403
            Auth-->>HC: throw auth error
            HC->>SM: healthCheckOnError(auth)
            SM-->>SM: healthCheckErrorIsAuth → transition to failed
        else network/timeout or 5xx/429
            API-->>Auth: error/transient
            Auth-->>HC: swallow / treat transient
            HC->>SM: recordFailure / continue existing logic
        end
    else OAuth enabled or no token
        HC-->>Auth: skipped
        HC->>SM: continue existing health logic
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(health): detect token revocation via authenticated health checks' accurately summarizes the main change: adding authenticated token validation to health monitoring.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the problem, solution, implementation details, and test coverage for token revocation detection.
Linked Issues check ✅ Passed All coding objectives from issue #370 are met: authenticated health checks added, 401/403 detection implemented, failed state transition enabled, and checks integrated into performHealthCheck and performConnect.
Out of Scope Changes check ✅ Passed All changes are within scope: token revocation detection (HealthMonitor.ts, tests), documentation updates (README), test helpers, sonar configuration, and release workflow fix for changelog parsing.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/#370-feathealth-detect-token-revocation-via-authenticat

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 8, 2026

Test Coverage Report

Overall Coverage: 96.59%

Metric Percentage
Statements 95.96%
Branches 87.28%
Functions 94.96%
Lines 96.59%

View detailed coverage report

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 81.48148% with 5 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/services/HealthMonitor.ts 81.48% 3 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds token-revocation detection to the HealthMonitor’s periodic probes by layering an authenticated check (static-token mode only) on top of the existing unauthenticated reachability probe, so revoked tokens can transition an instance to failed without relying on tool-call traffic.

Changes:

  • Add an authenticated periodic health probe (HEAD /api/v4/user) to detect mid-session static-token revocation (401 → failed).
  • Extend the XState machine’s periodic health-check error handling to treat auth failures as terminal (failed) rather than continuing normal retry behavior.
  • Add unit tests covering healthy/degraded → failed transitions on 401, OAuth-mode skip, swallowing network errors, and recovery via forceReconnect; update docs to reflect the new behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
src/services/HealthMonitor.ts Adds authenticated token validity check and routes periodic auth failures to the terminal failed state.
tests/unit/services/HealthMonitor.test.ts Adds regression tests for token revocation detection, OAuth skip behavior, and recovery via manual reconnect.
README.md.in Documents mid-session token revocation detection behavior in connection resilience section.
README.md Mirrors the documentation update for generated README output.

Comment thread src/services/HealthMonitor.ts Outdated
- Change authenticated check error from "token revoked or expired" to
  "token invalid" — a 401 from /api/v4/user can indicate wrong token
  value, missing scope, or other auth failure, not only revocation
- Make GITLAB_TOKEN config mock dynamic via getter
- Add test: authenticatedTokenCheck skips when GITLAB_TOKEN is unset
@coderabbitai coderabbitai Bot added the javascript Pull requests that update javascript code label Apr 8, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/services/HealthMonitor.ts`:
- Around line 313-324: The authenticated-probe currently only maps a 401 to a
terminal auth error so 403s are treated as success; update the HealthMonitor
authenticated probe (the block that checks response.status === 401) to treat
response.status === 403 the same way (e.g., if (response.status === 401 ||
response.status === 403) throw a dedicated auth Error), and update the
corresponding catch check (where it tests error.message.startsWith('GitLab API
error: 401')) to also recognize the 403 case (either check both '401' and '403'
or use a shared prefix like 'GitLab API error: 40') so classifyError /
healthCheckErrorIsAuth will route both statuses to the failed transition rather
than falling through to recordFailure.

In `@tests/unit/services/HealthMonitor.test.ts`:
- Around line 840-993: Add a new test case alongside the existing "token
revocation detection" specs that asserts a 403 from the authenticated endpoint
also moves the monitor to "failed": in the describe block add an it similar to
"should transition to failed when authenticated health check returns 401 (token
revoked)" but have mockFetch.mockImplementation return { status: 403, ok: false
} for urls including '/api/v4/user', then await the health check interval and
assert monitor.getState(TEST_URL) === 'failed' and
monitor.isInstanceReachable(TEST_URL) === false; use the same setup helpers
(mockInitialize, mockGetInstanceInfo, initMonitor, mockGitLabToken if needed)
and mirror the existing 401 test flow so the 403 path is covered.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: daedd02e-974b-427c-a041-2f3e1ee0feb0

📥 Commits

Reviewing files that changed from the base of the PR and between 5f8b9cf and 3ddfc41.

📒 Files selected for processing (4)
  • README.md
  • README.md.in
  • src/services/HealthMonitor.ts
  • tests/unit/services/HealthMonitor.test.ts

Comment thread src/services/HealthMonitor.ts Outdated
Comment thread tests/unit/services/HealthMonitor.test.ts
@polaz polaz requested a review from Copilot April 8, 2026 20:04
@polaz
Copy link
Copy Markdown
Member Author

polaz commented Apr 8, 2026

@coderabbitai re-review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 8, 2026

@polaz Sure! Triggering a re-review of PR #409 now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:145-160
Timestamp: 2026-03-29T19:15:31.468Z
Learning: In structured-world/gitlab-mcp, the periodic health check in `src/services/HealthMonitor.ts` (`performHealthCheck`) is intentionally unauthenticated (HEAD /api/v4/version). Token revocation / authenticated health-check detection is intentionally out of scope for PR `#369` (startup resilience) and is tracked separately in issue `#370`. Do not flag the unauthenticated health probe as a bug in reviews of this file/PR.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 409
File: src/services/HealthMonitor.ts:313-324
Timestamp: 2026-04-08T19:14:05.332Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `healthCheckErrorIsAuth` XState guard does NOT use `classifyError()` to detect auth failures from the authenticated health probe. Instead, it checks the error message directly (e.g., message starts with 'GitLab API error: 401' or 'GitLab API error: 403'). This bypass is intentional: `classifyError` maps HTTP 403 to 'permanent' (not 'auth'), which would cause 403 token-forbidden responses to fall through to `recordFailure` and stay in healthy/degraded. Both 401 and 403 from `/api/v4/user` are terminal auth failures that must route to the `failed` transition with no auto-reconnect. Do not flag this direct message-check as bypassing the error classification layer in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T12:46:57.150Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `performConnect` actor uses a dedicated `InitializationTimeoutError` class (around lines 152-166) to signal an init-timeout rejection, replacing prior fragile string-based error matching. `classifyError` in the catch block relies on this class identity to reliably distinguish init timeouts from other transient/auth/permanent failures. Do not suggest reverting to string-matching for timeout detection in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:215-216
Timestamp: 2026-03-30T21:18:53.194Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `quickHealthCheck` is intentionally a reachability-only probe — `response.status < 500` is the correct condition because any HTTP response (including 400/404) proves the server is responding. It is NOT an API validation check. Actual API correctness (correct base URL, valid endpoints, auth) is verified at tool execution level. Do not flag this threshold as incorrect in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-02T23:36:18.216Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `performConnect` actor (around lines 184-196) skips the post-init health probe when the remaining budget is < 500ms. When the remaining time is near-zero, the instance is assumed reachable (since init already succeeded) to prevent a false `disconnected` transition caused by an artificially short probe timeout. Do not flag skipping the health probe in this near-zero budget path as incorrect in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:578-603
Timestamp: 2026-03-31T10:09:51.540Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `handleStateChange` intentionally early-returns before calling `registry.updateConnectionStatus` when the state hasn't changed (`previousState === newState`). This is by design: `updateConnectionStatus` is idempotent and calling it on every periodic health-check poll (e.g., `healthy.checking → healthy.idle`) would add unnecessary overhead with no functional benefit. Do not flag this early-return ordering as a bug in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T07:33:01.038Z
Learning: In `src/services/ConnectionManager.ts` (structured-world/gitlab-mcp), `refreshTokenScopes()` (around lines 624-651) applies a post-await identity guard: after `await detectTokenScopes(url)` returns, it re-reads `this.instances.get(url)` and compares it to the `state` reference captured before the await. If a concurrent `reinitialize()` or `reset()` replaced the entry, it returns `false` instead of writing stale scope info into the new state. This mirrors the identity guard in `doInitialize`'s catch block. Do not flag the lack of this guard as a stale-write bug in future reviews — it is already in place.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T08:14:19.833Z
Learning: In `src/utils/fetch.ts` (structured-world/gitlab-mcp), `skipAuth: true` suppresses BOTH `getAuthHeaders()` AND `loadCookieHeader()` in the internal `doFetch()` call. This ensures unauthenticated probes (e.g., HEAD /api/v4/version health checks) carry no credentials or session cookies. Do not flag missing cookie suppression for `skipAuth` paths in future reviews — it is already handled.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 380
File: src/services/HealthMonitor.ts:112-206
Timestamp: 2026-04-03T00:40:13.151Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `performConnect` actor function (around lines 112–206) has a cognitive complexity of ~17 (SonarCloud limit: 15) due to three interleaved phases: fast-path health check, full initialization with timeout, and degraded-path reachability verification. The maintainer (polaz) has explicitly declined extracting these into helpers because all three phases share a single `deadline` variable that defines the unified startup budget — splitting would obscure this invariant. A code comment was added above `performConnect` explaining the intentional complexity. Do not re-flag this complexity issue in future reviews of this function.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:529-564
Timestamp: 2026-04-02T09:32:46.067Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `HealthMonitor` creates one XState actor per monitored instance URL. Actors are only created for instances registered via `InstanceRegistry` config (1–3 in practice), so there is no unbounded actor accumulation. A follow-up issue has already been created to track per-URL eviction/inactivity TTL for future multi-instance/OAuth scenarios. Do not flag actor accumulation as an active memory leak in reviews of this file.
✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
README.md (1)

91-91: 🧹 Nitpick | 🔵 Trivial

Same as README.md.in: consider mentioning 403 alongside 401.

The generated README has the same gap — both 401 and 403 are treated as auth failures in the implementation. Update the template (README.md.in) and regenerate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 91, The README text for the authenticated HEAD
/api/v4/user health check only mentions 401 but the implementation treats both
401 and 403 as auth failures; update the README.md.in template to mention "401
and 403" (or "401/403") as triggering an immediate transition to failed state
for the static token mode HEAD /api/v4/user check, then regenerate README.md so
the documentation matches the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md.in`:
- Line 91: Update the documentation sentence that describes the authenticated
HEAD /api/v4/user health check (static token mode) to state that both 401 and
403 responses transition the instance to the `failed` state; specifically modify
the sentence referencing "HEAD /api/v4/user" and "A 401 on this check
transitions the instance to `failed` state immediately" so it mentions "401 or
403" (or "401/403") as terminal auth failures to match the implementation.

In `@src/services/HealthMonitor.ts`:
- Around line 229-233: Update the inline comment for the authenticatedTokenCheck
call to accurately describe the detection path: state that mid-session token
revocation triggers a GitLab API 401 which is detected by the
healthCheckErrorIsAuth guard by inspecting the error message directly (not via
classifyError, which maps 403 → 'permanent'), leading to '#connection.failed'
with no auto-reconnect; also note this is a no-op in OAuth mode and when
GITLAB_TOKEN is unset. Mention the exact functions involved:
authenticatedTokenCheck and healthCheckErrorIsAuth, and remove the incorrect
"classifyError → 'auth'" phrasing.

---

Duplicate comments:
In `@README.md`:
- Line 91: The README text for the authenticated HEAD /api/v4/user health check
only mentions 401 but the implementation treats both 401 and 403 as auth
failures; update the README.md.in template to mention "401 and 403" (or
"401/403") as triggering an immediate transition to failed state for the static
token mode HEAD /api/v4/user check, then regenerate README.md so the
documentation matches the code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9cacad74-ba73-4d22-aa89-5bd6b9d4390d

📥 Commits

Reviewing files that changed from the base of the PR and between 5f8b9cf and d7d618f.

📒 Files selected for processing (5)
  • README.md
  • README.md.in
  • sonar-project.properties
  • src/services/HealthMonitor.ts
  • tests/unit/services/HealthMonitor.test.ts

Comment thread README.md.in Outdated
Comment thread src/services/HealthMonitor.ts
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Comment thread src/services/HealthMonitor.ts Outdated
Comment thread src/services/HealthMonitor.ts Outdated
Comment thread src/services/HealthMonitor.ts Outdated
Comment thread src/services/HealthMonitor.ts Outdated
Comment thread README.md Outdated
Comment thread README.md.in Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/services/HealthMonitor.ts`:
- Around line 229-235: The health probe (authenticatedTokenCheck) is only
invoked from performHealthCheck, so a manual RECONNECT path (performConnect →
quickHealthCheck) can bypass token revalidation and let a revoked token flip a
connection from failed → healthy; update performConnect (or the RECONNECT
handling) to call authenticatedTokenCheck(input.instanceUrl,
HEALTH_CHECK_PROBE_MS) or an equivalent token revalidation before accepting a
RECONNECT transition (i.e., before the fast-path quickHealthCheck succeeds), and
add a regression test that simulates a 401/403 during RECONNECT to assert the
actor remains in failed until credentials are fixed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0ffcbecb-6ae8-4cc8-a630-ff342ddcf90a

📥 Commits

Reviewing files that changed from the base of the PR and between d7d618f and e7fed88.

📒 Files selected for processing (3)
  • README.md
  • README.md.in
  • src/services/HealthMonitor.ts

Comment thread src/services/HealthMonitor.ts
@polaz polaz requested a review from Copilot April 8, 2026 21:03
@polaz
Copy link
Copy Markdown
Member Author

polaz commented Apr 8, 2026

@coderabbitai re-review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 8, 2026

@polaz Sure! Triggering a full re-review of PR #409 now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:145-160
Timestamp: 2026-03-29T19:15:31.468Z
Learning: In structured-world/gitlab-mcp, the periodic health check in `src/services/HealthMonitor.ts` (`performHealthCheck`) is intentionally unauthenticated (HEAD /api/v4/version). Token revocation / authenticated health-check detection is intentionally out of scope for PR `#369` (startup resilience) and is tracked separately in issue `#370`. Do not flag the unauthenticated health probe as a bug in reviews of this file/PR.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:578-603
Timestamp: 2026-03-31T10:09:51.540Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `handleStateChange` intentionally early-returns before calling `registry.updateConnectionStatus` when the state hasn't changed (`previousState === newState`). This is by design: `updateConnectionStatus` is idempotent and calling it on every periodic health-check poll (e.g., `healthy.checking → healthy.idle`) would add unnecessary overhead with no functional benefit. Do not flag this early-return ordering as a bug in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 409
File: src/services/HealthMonitor.ts:229-235
Timestamp: 2026-04-08T20:57:28.088Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `authenticatedTokenCheck` is called in BOTH the `performConnect` fast-path (after `quickHealthCheck`, when `connectionManager.isConnected()` is true) AND in `performHealthCheck`. This ensures that a `forceReconnect()` while the token is still revoked will fail auth and remain in `failed` state, not bounce back to `healthy`. Do not flag the RECONNECT path as bypassing token revalidation in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 380
File: src/services/HealthMonitor.ts:112-206
Timestamp: 2026-04-03T00:40:13.151Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `performConnect` actor function (around lines 112–206) has a cognitive complexity of ~17 (SonarCloud limit: 15) due to three interleaved phases: fast-path health check, full initialization with timeout, and degraded-path reachability verification. The maintainer (polaz) has explicitly declined extracting these into helpers because all three phases share a single `deadline` variable that defines the unified startup budget — splitting would obscure this invariant. A code comment was added above `performConnect` explaining the intentional complexity. Do not re-flag this complexity issue in future reviews of this function.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:529-564
Timestamp: 2026-04-02T09:32:46.067Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `HealthMonitor` creates one XState actor per monitored instance URL. Actors are only created for instances registered via `InstanceRegistry` config (1–3 in practice), so there is no unbounded actor accumulation. A follow-up issue has already been created to track per-URL eviction/inactivity TTL for future multi-instance/OAuth scenarios. Do not flag actor accumulation as an active memory leak in reviews of this file.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-02T23:36:18.216Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `performConnect` actor (around lines 184-196) skips the post-init health probe when the remaining budget is < 500ms. When the remaining time is near-zero, the instance is assumed reachable (since init already succeeded) to prevent a false `disconnected` transition caused by an artificially short probe timeout. Do not flag skipping the health probe in this near-zero budget path as incorrect in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:215-216
Timestamp: 2026-03-30T21:18:53.194Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `quickHealthCheck` is intentionally a reachability-only probe — `response.status < 500` is the correct condition because any HTTP response (including 400/404) proves the server is responding. It is NOT an API validation check. Actual API correctness (correct base URL, valid endpoints, auth) is verified at tool execution level. Do not flag this threshold as incorrect in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 409
File: src/services/HealthMonitor.ts:313-324
Timestamp: 2026-04-08T19:14:05.332Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `healthCheckErrorIsAuth` XState guard does NOT use `classifyError()` to detect auth failures from the authenticated health probe. Instead, it checks the error message directly (e.g., message starts with 'GitLab API error: 401' or 'GitLab API error: 403'). This bypass is intentional: `classifyError` maps HTTP 403 to 'permanent' (not 'auth'), which would cause 403 token-forbidden responses to fall through to `recordFailure` and stay in healthy/degraded. Both 401 and 403 from `/api/v4/user` are terminal auth failures that must route to the `failed` transition with no auto-reconnect. Do not flag this direct message-check as bypassing the error classification layer in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T12:46:57.150Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `performConnect` actor uses a dedicated `InitializationTimeoutError` class (around lines 152-166) to signal an init-timeout rejection, replacing prior fragile string-based error matching. `classifyError` in the catch block relies on this class identity to reliably distinguish init timeouts from other transient/auth/permanent failures. Do not suggest reverting to string-matching for timeout detection in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:100-108
Timestamp: 2026-03-29T18:10:49.416Z
Learning: In structured-world/gitlab-mcp, the `ConnectionManager.isConnected()` / `getInstanceInfo()` singleton path in `HealthMonitor.ts` (`performConnect`) is intentional for single-instance mode. Multi-instance OAuth contexts use `InstanceRegistry` for per-URL state. Do not flag this singleton usage as a bug in future reviews; the two modes have separate code paths by design.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T07:33:01.038Z
Learning: In `src/services/ConnectionManager.ts` (structured-world/gitlab-mcp), `refreshTokenScopes()` (around lines 624-651) applies a post-await identity guard: after `await detectTokenScopes(url)` returns, it re-reads `this.instances.get(url)` and compares it to the `state` reference captured before the await. If a concurrent `reinitialize()` or `reset()` replaced the entry, it returns `false` instead of writing stale scope info into the new state. This mirrors the identity guard in `doInitialize`'s catch block. Do not flag the lack of this guard as a stale-write bug in future reviews — it is already in place.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T08:14:19.833Z
Learning: In `src/utils/fetch.ts` (structured-world/gitlab-mcp), `skipAuth: true` suppresses BOTH `getAuthHeaders()` AND `loadCookieHeader()` in the internal `doFetch()` call. This ensures unauthenticated probes (e.g., HEAD /api/v4/version health checks) carry no credentials or session cookies. Do not flag missing cookie suppression for `skipAuth` paths in future reviews — it is already handled.
✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread tests/unit/services/HealthMonitor.test.ts Outdated
Lifts stubUserEndpointStatus from inside describe('HealthMonitor') to
true module scope so SonarCloud no longer flags it as a nested function.
Replaces not.toHaveBeenCalledWith matcher with mock.calls.some() for the
skip-path negative assertion.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

@polaz
Copy link
Copy Markdown
Member Author

polaz commented Apr 9, 2026

@coderabbitai re-review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 9, 2026

@polaz Sure! Triggering a full re-review of PR #409 now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-02T15:31:04.797Z
Learning: In `src/services/ConnectionManager.ts` (structured-world/gitlab-mcp), the stale-init overwrite race in `doInitialize()` (lines ~117-163) is mitigated by identity guards: the outer `let state` binding + `this.instances.get(baseUrl) === state` check in the catch block (lines ~294-298) prevents a stale init from deleting a fresh state entry. A deeper fix using per-attempt init tokens (to prevent any intermediate writes from stale paths) requires significant refactoring across `doInitialize`/`clearInflight`/`reset` and is tracked in issue `#379`. Do not re-flag the stale-init race as unguarded in future reviews of this file.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 409
File: src/services/HealthMonitor.ts:229-235
Timestamp: 2026-04-08T20:57:28.088Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `authenticatedTokenCheck` is called in BOTH the `performConnect` fast-path (after `quickHealthCheck`, when `connectionManager.isConnected()` is true) AND in `performHealthCheck`. This ensures that a `forceReconnect()` while the token is still revoked will fail auth and remain in `failed` state, not bounce back to `healthy`. Do not flag the RECONNECT path as bypassing token revalidation in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T07:33:01.038Z
Learning: In `src/services/ConnectionManager.ts` (structured-world/gitlab-mcp), `refreshTokenScopes()` (around lines 624-651) applies a post-await identity guard: after `await detectTokenScopes(url)` returns, it re-reads `this.instances.get(url)` and compares it to the `state` reference captured before the await. If a concurrent `reinitialize()` or `reset()` replaced the entry, it returns `false` instead of writing stale scope info into the new state. This mirrors the identity guard in `doInitialize`'s catch block. Do not flag the lack of this guard as a stale-write bug in future reviews — it is already in place.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:578-603
Timestamp: 2026-03-31T10:09:51.540Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `handleStateChange` intentionally early-returns before calling `registry.updateConnectionStatus` when the state hasn't changed (`previousState === newState`). This is by design: `updateConnectionStatus` is idempotent and calling it on every periodic health-check poll (e.g., `healthy.checking → healthy.idle`) would add unnecessary overhead with no functional benefit. Do not flag this early-return ordering as a bug in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T11:32:28.956Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the first `handleStateChange` callback that fires before `InstanceRegistry` is initialized results in a silently dropped `updateConnectionStatus` call. This is documented in Known Limitations: the registry catches up on the next real state transition, and fixing the miss would require a pending-status queue — which is out of scope. Do not flag this first-transition registry miss as an unhandled bug in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 409
File: src/services/HealthMonitor.ts:313-324
Timestamp: 2026-04-08T19:14:05.332Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `healthCheckErrorIsAuth` XState guard does NOT use `classifyError()` to detect auth failures from the authenticated health probe. Instead, it checks the error message directly (e.g., message starts with 'GitLab API error: 401' or 'GitLab API error: 403'). This bypass is intentional: `classifyError` maps HTTP 403 to 'permanent' (not 'auth'), which would cause 403 token-forbidden responses to fall through to `recordFailure` and stay in healthy/degraded. Both 401 and 403 from `/api/v4/user` are terminal auth failures that must route to the `failed` transition with no auto-reconnect. Do not flag this direct message-check as bypassing the error classification layer in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:145-160
Timestamp: 2026-03-29T19:15:31.468Z
Learning: In structured-world/gitlab-mcp, the periodic health check in `src/services/HealthMonitor.ts` (`performHealthCheck`) is intentionally unauthenticated (HEAD /api/v4/version). Token revocation / authenticated health-check detection is intentionally out of scope for PR `#369` (startup resilience) and is tracked separately in issue `#370`. Do not flag the unauthenticated health probe as a bug in reviews of this file/PR.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/handlers.ts:329-366
Timestamp: 2026-03-29T21:05:15.421Z
Learning: In `src/handlers.ts` (structured-world/gitlab-mcp), the early-return branch for `manage_context` when the instance is **unreachable** intentionally omits `healthMonitor.reportSuccess` / `reportError`, because in that path context tools (whoami, switch_instance) operate on local state only and never call the GitLab API. However, when the instance IS reachable/healthy, `manage_context` (e.g., `whoami`) does call the GitLab API and falls through the normal execution path including `reportSuccess`/`reportError` — this is correct and intentional. Do not flag this asymmetry as a bug in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:100-108
Timestamp: 2026-03-29T18:10:49.416Z
Learning: In structured-world/gitlab-mcp, the `ConnectionManager.isConnected()` / `getInstanceInfo()` singleton path in `HealthMonitor.ts` (`performConnect`) is intentional for single-instance mode. Multi-instance OAuth contexts use `InstanceRegistry` for per-URL state. Do not flag this singleton usage as a bug in future reviews; the two modes have separate code paths by design.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:724-732
Timestamp: 2026-04-02T10:46:41.853Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `isAnyInstanceHealthy()` intentionally includes the `'connecting'` state in its truthy set (alongside `'healthy'` and `'degraded'`). During startup all instances are in `'connecting'`, and excluding it would cause `tools/list` to return context-only tools for the entire initialization window (~5s), which is bad UX. The handler-level `isInstanceReachable()` check is the real runtime gate that prevents tool execution against unreachable instances. Do not flag the inclusion of `'connecting'` in `isAnyInstanceHealthy()` as a bug in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 380
File: src/services/HealthMonitor.ts:112-206
Timestamp: 2026-04-03T00:40:13.151Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `performConnect` actor function (around lines 112–206) has a cognitive complexity of ~17 (SonarCloud limit: 15) due to three interleaved phases: fast-path health check, full initialization with timeout, and degraded-path reachability verification. The maintainer (polaz) has explicitly declined extracting these into helpers because all three phases share a single `deadline` variable that defines the unified startup budget — splitting would obscure this invariant. A code comment was added above `performConnect` explaining the intentional complexity. Do not re-flag this complexity issue in future reviews of this function.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:529-564
Timestamp: 2026-04-02T09:32:46.067Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `HealthMonitor` creates one XState actor per monitored instance URL. Actors are only created for instances registered via `InstanceRegistry` config (1–3 in practice), so there is no unbounded actor accumulation. A follow-up issue has already been created to track per-URL eviction/inactivity TTL for future multi-instance/OAuth scenarios. Do not flag actor accumulation as an active memory leak in reviews of this file.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-02T23:36:18.216Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `performConnect` actor (around lines 184-196) skips the post-init health probe when the remaining budget is < 500ms. When the remaining time is near-zero, the instance is assumed reachable (since init already succeeded) to prevent a false `disconnected` transition caused by an artificially short probe timeout. Do not flag skipping the health probe in this near-zero budget path as incorrect in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T12:46:57.150Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `performConnect` actor uses a dedicated `InitializationTimeoutError` class (around lines 152-166) to signal an init-timeout rejection, replacing prior fragile string-based error matching. `classifyError` in the catch block relies on this class identity to reliably distinguish init timeouts from other transient/auth/permanent failures. Do not suggest reverting to string-matching for timeout detection in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:215-216
Timestamp: 2026-03-30T21:18:53.194Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `quickHealthCheck` is intentionally a reachability-only probe — `response.status < 500` is the correct condition because any HTTP response (including 400/404) proves the server is responding. It is NOT an API validation check. Actual API correctness (correct base URL, valid endpoints, auth) is verified at tool execution level. Do not flag this threshold as incorrect in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T08:14:19.833Z
Learning: In `src/utils/fetch.ts` (structured-world/gitlab-mcp), `skipAuth: true` suppresses BOTH `getAuthHeaders()` AND `loadCookieHeader()` in the internal `doFetch()` call. This ensures unauthenticated probes (e.g., HEAD /api/v4/version health checks) carry no credentials or session cookies. Do not flag missing cookie suppression for `skipAuth` paths in future reviews — it is already handled.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:666-669
Timestamp: 2026-04-02T10:24:42.437Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `resolveUrl(instanceUrl?: string)` method and the state-mutation methods that call it (`reportSuccess`, `reportError`, `forceReconnect`) intentionally accept an optional `instanceUrl` with a fallback to `GITLAB_BASE_URL`. This zero-arg fallback is by design for test ergonomics and is documented by the test "should use default URL for reportSuccess/reportError" (around line 716–718 of `tests/unit/services/HealthMonitor.test.ts`). All production callers in `src/handlers.ts` already pass an explicit `instanceUrl`. Do not flag the optional parameter or the GITLAB_BASE_URL fallback as unsafe or requiring a required-parameter refactor in future reviews.
✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment thread src/services/HealthMonitor.ts Outdated
… and transient only

Previously all non-401/403 errors were silently swallowed, hiding programming
bugs (invalid URL, bad options) that would leave the instance in healthy state
with a broken probe. Now only AbortError (our own timeout) and classifyError
transient failures are swallowed; unexpected errors are logged and re-thrown.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment thread tests/unit/services/HealthMonitor.test.ts Outdated
ok was hardcoded to false regardless of status. Set it to
status >= 200 && status < 300 to match the real fetch contract.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/services/HealthMonitor.ts`:
- Around line 311-329: The probe currently only throws for 401/403 so other
non-OK responses (e.g., 429, 5xx, 405) are treated as success; update the check
after enhancedFetch(...) in HealthMonitor.ts to throw for any non-2xx response
(use response.ok or response.status !== 200/204 as appropriate) and include the
response.status in the Error message (replace the existing 401/403-only branch
or augment it to cover all non-ok statuses while keeping the specific message
for 401/403). Also add one regression test that mocks enhancedFetch to return a
non-OK status (e.g., 429 or 500) and asserts the probe path throws/enters the
catch to exercise the new behavior. Ensure references: enhancedFetch,
controller.signal, GITLAB_TOKEN, and the error throw site are updated
accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 94fe33f7-779b-42c0-9f24-d73a5cd76a8d

📥 Commits

Reviewing files that changed from the base of the PR and between 8fa0311 and 119cc0a.

📒 Files selected for processing (2)
  • src/services/HealthMonitor.ts
  • tests/unit/services/HealthMonitor.test.ts

Comment thread src/services/HealthMonitor.ts
A 429 rate-limit or 5xx server error from HEAD /api/v4/user was
previously treated as a successful probe, silently masking a broken
status code. Now any non-2xx, non-auth response throws so the catch
block can classify it as transient and swallow it correctly.

Add regression tests (429/500/503) verifying the instance stays
healthy when the probe returns a transient status.
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Apr 9, 2026

@polaz
Copy link
Copy Markdown
Member Author

polaz commented Apr 9, 2026

@coderabbitai re-review

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 9, 2026

@polaz Sure! Triggering a full re-review of PR #409 now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-02T15:31:04.797Z
Learning: In `src/services/ConnectionManager.ts` (structured-world/gitlab-mcp), the stale-init overwrite race in `doInitialize()` (lines ~117-163) is mitigated by identity guards: the outer `let state` binding + `this.instances.get(baseUrl) === state` check in the catch block (lines ~294-298) prevents a stale init from deleting a fresh state entry. A deeper fix using per-attempt init tokens (to prevent any intermediate writes from stale paths) requires significant refactoring across `doInitialize`/`clearInflight`/`reset` and is tracked in issue `#379`. Do not re-flag the stale-init race as unguarded in future reviews of this file.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 409
File: src/services/HealthMonitor.ts:229-235
Timestamp: 2026-04-08T20:57:28.088Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `authenticatedTokenCheck` is called in BOTH the `performConnect` fast-path (after `quickHealthCheck`, when `connectionManager.isConnected()` is true) AND in `performHealthCheck`. This ensures that a `forceReconnect()` while the token is still revoked will fail auth and remain in `failed` state, not bounce back to `healthy`. Do not flag the RECONNECT path as bypassing token revalidation in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T07:33:01.038Z
Learning: In `src/services/ConnectionManager.ts` (structured-world/gitlab-mcp), `refreshTokenScopes()` (around lines 624-651) applies a post-await identity guard: after `await detectTokenScopes(url)` returns, it re-reads `this.instances.get(url)` and compares it to the `state` reference captured before the await. If a concurrent `reinitialize()` or `reset()` replaced the entry, it returns `false` instead of writing stale scope info into the new state. This mirrors the identity guard in `doInitialize`'s catch block. Do not flag the lack of this guard as a stale-write bug in future reviews — it is already in place.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:578-603
Timestamp: 2026-03-31T10:09:51.540Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `handleStateChange` intentionally early-returns before calling `registry.updateConnectionStatus` when the state hasn't changed (`previousState === newState`). This is by design: `updateConnectionStatus` is idempotent and calling it on every periodic health-check poll (e.g., `healthy.checking → healthy.idle`) would add unnecessary overhead with no functional benefit. Do not flag this early-return ordering as a bug in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T11:32:28.956Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the first `handleStateChange` callback that fires before `InstanceRegistry` is initialized results in a silently dropped `updateConnectionStatus` call. This is documented in Known Limitations: the registry catches up on the next real state transition, and fixing the miss would require a pending-status queue — which is out of scope. Do not flag this first-transition registry miss as an unhandled bug in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 409
File: src/services/HealthMonitor.ts:313-324
Timestamp: 2026-04-08T19:14:05.332Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `healthCheckErrorIsAuth` XState guard does NOT use `classifyError()` to detect auth failures from the authenticated health probe. Instead, it checks the error message directly (e.g., message starts with 'GitLab API error: 401' or 'GitLab API error: 403'). This bypass is intentional: `classifyError` maps HTTP 403 to 'permanent' (not 'auth'), which would cause 403 token-forbidden responses to fall through to `recordFailure` and stay in healthy/degraded. Both 401 and 403 from `/api/v4/user` are terminal auth failures that must route to the `failed` transition with no auto-reconnect. Do not flag this direct message-check as bypassing the error classification layer in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/handlers.ts:329-366
Timestamp: 2026-03-29T21:05:15.421Z
Learning: In `src/handlers.ts` (structured-world/gitlab-mcp), the early-return branch for `manage_context` when the instance is **unreachable** intentionally omits `healthMonitor.reportSuccess` / `reportError`, because in that path context tools (whoami, switch_instance) operate on local state only and never call the GitLab API. However, when the instance IS reachable/healthy, `manage_context` (e.g., `whoami`) does call the GitLab API and falls through the normal execution path including `reportSuccess`/`reportError` — this is correct and intentional. Do not flag this asymmetry as a bug in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:145-160
Timestamp: 2026-03-29T19:15:31.468Z
Learning: In structured-world/gitlab-mcp, the periodic health check in `src/services/HealthMonitor.ts` (`performHealthCheck`) is intentionally unauthenticated (HEAD /api/v4/version). Token revocation / authenticated health-check detection is intentionally out of scope for PR `#369` (startup resilience) and is tracked separately in issue `#370`. Do not flag the unauthenticated health probe as a bug in reviews of this file/PR.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:100-108
Timestamp: 2026-03-29T18:10:49.416Z
Learning: In structured-world/gitlab-mcp, the `ConnectionManager.isConnected()` / `getInstanceInfo()` singleton path in `HealthMonitor.ts` (`performConnect`) is intentional for single-instance mode. Multi-instance OAuth contexts use `InstanceRegistry` for per-URL state. Do not flag this singleton usage as a bug in future reviews; the two modes have separate code paths by design.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:724-732
Timestamp: 2026-04-02T10:46:41.853Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `isAnyInstanceHealthy()` intentionally includes the `'connecting'` state in its truthy set (alongside `'healthy'` and `'degraded'`). During startup all instances are in `'connecting'`, and excluding it would cause `tools/list` to return context-only tools for the entire initialization window (~5s), which is bad UX. The handler-level `isInstanceReachable()` check is the real runtime gate that prevents tool execution against unreachable instances. Do not flag the inclusion of `'connecting'` in `isAnyInstanceHealthy()` as a bug in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:666-669
Timestamp: 2026-04-02T10:24:42.437Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `resolveUrl(instanceUrl?: string)` method and the state-mutation methods that call it (`reportSuccess`, `reportError`, `forceReconnect`) intentionally accept an optional `instanceUrl` with a fallback to `GITLAB_BASE_URL`. This zero-arg fallback is by design for test ergonomics and is documented by the test "should use default URL for reportSuccess/reportError" (around line 716–718 of `tests/unit/services/HealthMonitor.test.ts`). All production callers in `src/handlers.ts` already pass an explicit `instanceUrl`. Do not flag the optional parameter or the GITLAB_BASE_URL fallback as unsafe or requiring a required-parameter refactor in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/ConnectionManager.ts:451-453
Timestamp: 2026-04-02T14:35:05.324Z
Learning: In `src/services/ConnectionManager.ts` (structured-world/gitlab-mcp), `resolveState()` currently falls back to `this.currentInstanceUrl` when no `instanceUrl` is passed. Zero-arg callers in `src/services/ToolAvailability.ts` (around lines 339-357, 436-454) and `src/services/WidgetAvailability.ts` (around lines 110-126, 162-177) can therefore read another concurrent OAuth request's version/tier. These callers need to pass an explicit `instanceUrl` from `getGitLabApiUrlFromContext()` to avoid cross-request state leakage. This is tracked as issue `#379` (not fixed in PR `#369`). Do not flag resolveState itself as the sole issue — the callers need updating too.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T12:46:57.150Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `performConnect` actor uses a dedicated `InitializationTimeoutError` class (around lines 152-166) to signal an init-timeout rejection, replacing prior fragile string-based error matching. `classifyError` in the catch block relies on this class identity to reliably distinguish init timeouts from other transient/auth/permanent failures. Do not suggest reverting to string-matching for timeout detection in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:215-216
Timestamp: 2026-03-30T21:18:53.194Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `quickHealthCheck` is intentionally a reachability-only probe — `response.status < 500` is the correct condition because any HTTP response (including 400/404) proves the server is responding. It is NOT an API validation check. Actual API correctness (correct base URL, valid endpoints, auth) is verified at tool execution level. Do not flag this threshold as incorrect in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 380
File: src/services/HealthMonitor.ts:112-206
Timestamp: 2026-04-03T00:40:13.151Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `performConnect` actor function (around lines 112–206) has a cognitive complexity of ~17 (SonarCloud limit: 15) due to three interleaved phases: fast-path health check, full initialization with timeout, and degraded-path reachability verification. The maintainer (polaz) has explicitly declined extracting these into helpers because all three phases share a single `deadline` variable that defines the unified startup budget — splitting would obscure this invariant. A code comment was added above `performConnect` explaining the intentional complexity. Do not re-flag this complexity issue in future reviews of this function.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-02T23:36:18.216Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `performConnect` actor (around lines 184-196) skips the post-init health probe when the remaining budget is < 500ms. When the remaining time is near-zero, the instance is assumed reachable (since init already succeeded) to prevent a false `disconnected` transition caused by an artificially short probe timeout. Do not flag skipping the health probe in this near-zero budget path as incorrect in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T08:14:19.833Z
Learning: In `src/utils/fetch.ts` (structured-world/gitlab-mcp), `skipAuth: true` suppresses BOTH `getAuthHeaders()` AND `loadCookieHeader()` in the internal `doFetch()` call. This ensures unauthenticated probes (e.g., HEAD /api/v4/version health checks) carry no credentials or session cookies. Do not flag missing cookie suppression for `skipAuth` paths in future reviews — it is already handled.
✅ Actions performed

Full review triggered.

@polaz
Copy link
Copy Markdown
Member Author

polaz commented Apr 9, 2026

@coderabbitai re-review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 9, 2026

@polaz Sure! Triggering a full re-review of PR #409 now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-02T15:31:04.797Z
Learning: In `src/services/ConnectionManager.ts` (structured-world/gitlab-mcp), the stale-init overwrite race in `doInitialize()` (lines ~117-163) is mitigated by identity guards: the outer `let state` binding + `this.instances.get(baseUrl) === state` check in the catch block (lines ~294-298) prevents a stale init from deleting a fresh state entry. A deeper fix using per-attempt init tokens (to prevent any intermediate writes from stale paths) requires significant refactoring across `doInitialize`/`clearInflight`/`reset` and is tracked in issue `#379`. Do not re-flag the stale-init race as unguarded in future reviews of this file.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 409
File: src/services/HealthMonitor.ts:229-235
Timestamp: 2026-04-08T20:57:28.088Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `authenticatedTokenCheck` is called in BOTH the `performConnect` fast-path (after `quickHealthCheck`, when `connectionManager.isConnected()` is true) AND in `performHealthCheck`. This ensures that a `forceReconnect()` while the token is still revoked will fail auth and remain in `failed` state, not bounce back to `healthy`. Do not flag the RECONNECT path as bypassing token revalidation in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T07:33:01.038Z
Learning: In `src/services/ConnectionManager.ts` (structured-world/gitlab-mcp), `refreshTokenScopes()` (around lines 624-651) applies a post-await identity guard: after `await detectTokenScopes(url)` returns, it re-reads `this.instances.get(url)` and compares it to the `state` reference captured before the await. If a concurrent `reinitialize()` or `reset()` replaced the entry, it returns `false` instead of writing stale scope info into the new state. This mirrors the identity guard in `doInitialize`'s catch block. Do not flag the lack of this guard as a stale-write bug in future reviews — it is already in place.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:578-603
Timestamp: 2026-03-31T10:09:51.540Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `handleStateChange` intentionally early-returns before calling `registry.updateConnectionStatus` when the state hasn't changed (`previousState === newState`). This is by design: `updateConnectionStatus` is idempotent and calling it on every periodic health-check poll (e.g., `healthy.checking → healthy.idle`) would add unnecessary overhead with no functional benefit. Do not flag this early-return ordering as a bug in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T11:32:28.956Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the first `handleStateChange` callback that fires before `InstanceRegistry` is initialized results in a silently dropped `updateConnectionStatus` call. This is documented in Known Limitations: the registry catches up on the next real state transition, and fixing the miss would require a pending-status queue — which is out of scope. Do not flag this first-transition registry miss as an unhandled bug in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 409
File: src/services/HealthMonitor.ts:313-324
Timestamp: 2026-04-08T19:14:05.332Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `healthCheckErrorIsAuth` XState guard does NOT use `classifyError()` to detect auth failures from the authenticated health probe. Instead, it checks the error message directly (e.g., message starts with 'GitLab API error: 401' or 'GitLab API error: 403'). This bypass is intentional: `classifyError` maps HTTP 403 to 'permanent' (not 'auth'), which would cause 403 token-forbidden responses to fall through to `recordFailure` and stay in healthy/degraded. Both 401 and 403 from `/api/v4/user` are terminal auth failures that must route to the `failed` transition with no auto-reconnect. Do not flag this direct message-check as bypassing the error classification layer in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/handlers.ts:329-366
Timestamp: 2026-03-29T21:05:15.421Z
Learning: In `src/handlers.ts` (structured-world/gitlab-mcp), the early-return branch for `manage_context` when the instance is **unreachable** intentionally omits `healthMonitor.reportSuccess` / `reportError`, because in that path context tools (whoami, switch_instance) operate on local state only and never call the GitLab API. However, when the instance IS reachable/healthy, `manage_context` (e.g., `whoami`) does call the GitLab API and falls through the normal execution path including `reportSuccess`/`reportError` — this is correct and intentional. Do not flag this asymmetry as a bug in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:145-160
Timestamp: 2026-03-29T19:15:31.468Z
Learning: In structured-world/gitlab-mcp, the periodic health check in `src/services/HealthMonitor.ts` (`performHealthCheck`) is intentionally unauthenticated (HEAD /api/v4/version). Token revocation / authenticated health-check detection is intentionally out of scope for PR `#369` (startup resilience) and is tracked separately in issue `#370`. Do not flag the unauthenticated health probe as a bug in reviews of this file/PR.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:100-108
Timestamp: 2026-03-29T18:10:49.416Z
Learning: In structured-world/gitlab-mcp, the `ConnectionManager.isConnected()` / `getInstanceInfo()` singleton path in `HealthMonitor.ts` (`performConnect`) is intentional for single-instance mode. Multi-instance OAuth contexts use `InstanceRegistry` for per-URL state. Do not flag this singleton usage as a bug in future reviews; the two modes have separate code paths by design.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:724-732
Timestamp: 2026-04-02T10:46:41.853Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `isAnyInstanceHealthy()` intentionally includes the `'connecting'` state in its truthy set (alongside `'healthy'` and `'degraded'`). During startup all instances are in `'connecting'`, and excluding it would cause `tools/list` to return context-only tools for the entire initialization window (~5s), which is bad UX. The handler-level `isInstanceReachable()` check is the real runtime gate that prevents tool execution against unreachable instances. Do not flag the inclusion of `'connecting'` in `isAnyInstanceHealthy()` as a bug in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/handlers.ts:448-457
Timestamp: 2026-03-31T13:18:57.386Z
Learning: In `src/handlers.ts` (structured-world/gitlab-mcp), the bootstrap `catch (initError)` block (inside the `try` that calls `connectionManager.initialize()` / `getClient()` / `ensureIntrospected()`) handles the case where `!connectionManager.isConnected(effectiveInstanceUrl)`. It calls `healthMonitor.reportError(effectiveInstanceUrl, initError)` (when `!timedOut`), logs the error, and then **returns** a `createConnectionFailedError(toolName, action, effectiveInstanceUrl, healthMonitor.getState(effectiveInstanceUrl))` structured `CONNECTION_FAILED` response (with `instance_url`, `reconnecting`, and `suggested_fix`) instead of throwing `"Bad Request: Server not initialized"`. Do not flag this bootstrap path as returning a generic error or as failing to report to HealthMonitor in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-02T23:36:18.216Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `performConnect` actor (around lines 184-196) skips the post-init health probe when the remaining budget is < 500ms. When the remaining time is near-zero, the instance is assumed reachable (since init already succeeded) to prevent a false `disconnected` transition caused by an artificially short probe timeout. Do not flag skipping the health probe in this near-zero budget path as incorrect in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 380
File: src/services/HealthMonitor.ts:112-206
Timestamp: 2026-04-03T00:40:13.151Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `performConnect` actor function (around lines 112–206) has a cognitive complexity of ~17 (SonarCloud limit: 15) due to three interleaved phases: fast-path health check, full initialization with timeout, and degraded-path reachability verification. The maintainer (polaz) has explicitly declined extracting these into helpers because all three phases share a single `deadline` variable that defines the unified startup budget — splitting would obscure this invariant. A code comment was added above `performConnect` explaining the intentional complexity. Do not re-flag this complexity issue in future reviews of this function.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T12:46:57.150Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), the `performConnect` actor uses a dedicated `InitializationTimeoutError` class (around lines 152-166) to signal an init-timeout rejection, replacing prior fragile string-based error matching. `classifyError` in the catch block relies on this class identity to reliably distinguish init timeouts from other transient/auth/permanent failures. Do not suggest reverting to string-matching for timeout detection in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 369
File: src/services/HealthMonitor.ts:215-216
Timestamp: 2026-03-30T21:18:53.194Z
Learning: In `src/services/HealthMonitor.ts` (structured-world/gitlab-mcp), `quickHealthCheck` is intentionally a reachability-only probe — `response.status < 500` is the correct condition because any HTTP response (including 400/404) proves the server is responding. It is NOT an API validation check. Actual API correctness (correct base URL, valid endpoints, auth) is verified at tool execution level. Do not flag this threshold as incorrect in future reviews.

Learnt from: polaz
Repo: structured-world/gitlab-mcp PR: 0
File: :0-0
Timestamp: 2026-04-03T08:14:19.833Z
Learning: In `src/utils/fetch.ts` (structured-world/gitlab-mcp), `skipAuth: true` suppresses BOTH `getAuthHeaders()` AND `loadCookieHeader()` in the internal `doFetch()` call. This ensures unauthenticated probes (e.g., HEAD /api/v4/version health checks) carry no credentials or session cookies. Do not flag missing cookie suppression for `skipAuth` paths in future reviews — it is already handled.
✅ Actions performed

Full review triggered.

@polaz
Copy link
Copy Markdown
Member Author

polaz commented Apr 10, 2026

@CodeRabbit full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 10, 2026

✅ Actions performed

Full review triggered.

@polaz polaz merged commit 5a74f2d into main Apr 11, 2026
23 of 24 checks passed
@polaz polaz deleted the feat/#370-feathealth-detect-token-revocation-via-authenticat branch April 11, 2026 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

javascript Pull requests that update javascript code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(health): detect token revocation via authenticated health checks

2 participants