Skip to content

fix(k8s): handle ignored repo.Update errors in provisioner#5

Open
jackthepunished wants to merge 4 commits intomainfrom
fix/issue-310-k8s-update-errors
Open

fix(k8s): handle ignored repo.Update errors in provisioner#5
jackthepunished wants to merge 4 commits intomainfrom
fix/issue-310-k8s-update-errors

Conversation

@jackthepunished
Copy link
Copy Markdown
Owner

Summary

  • Closes Ignored errors from repo.Update() in K8s provisioner may cause silent state inconsistency poyrazK/thecloud#310. Four repo.Update() calls in internal/repositories/k8s/provisioner.go previously discarded their error with _ =, causing silent state inconsistency between in-memory cluster state and the database.
  • Provision now wraps and returns the persistence error after flipping status to Running, so callers know the state was not durably committed.
  • provisionControlPlane routes update failures (ControlPlaneIPs, encrypted kubeconfig) through failCluster so the cluster is consistently marked Failed.
  • failCluster logs the secondary update failure rather than silently dropping it, while still returning the original failure error to the caller.

Test plan

  • go build ./... clean
  • go test ./internal/repositories/k8s/... passes
  • New regression subtest TestFailCluster/UpdateFails_StillReturnsOriginalError verifies the original failure error is preserved when status persistence fails

jackthepunished and others added 4 commits April 29, 2026 22:49
The password reset flow logged the plaintext token alongside the user's
email at Debug level so it could be picked up during MVP testing. Anyone
with log access could mint a password reset and take over the account.

Replace the Debug log with an Info entry that records only the user_id
and reset-token row id. The plaintext token must reach the user via
email; once an EmailService is wired in it should be the sole consumer.

Closes poyrazK#293
…r injection

Object keys flowed straight into the Content-Disposition header via
fmt.Sprintf("attachment; filename=%s", key), so any key containing CRLF,
double quotes, or backslashes could split the response or inject
arbitrary headers. Path-bearing keys also leaked the full bucket path to
the client as the suggested filename.

Replace both call sites (authenticated download and presigned download)
with a single helper, contentDispositionAttachment, that:

  - reduces the key to its basename via path.Base
  - emits an ASCII-only `filename="..."` fallback with control bytes,
    non-ASCII bytes, quotes, and backslashes mapped to `_`
  - emits the full Unicode basename in `filename*=UTF-8''...` per RFC
    5987 with proper attr-char percent-encoding
  - guarantees CR/LF can never reach the wire

Add table-driven unit tests covering response-splitting, quote/backslash
injection, non-ASCII names, nested keys, and empty keys.

Closes poyrazK#225, poyrazK#226
Webhook delivery in NotifyService.deliverToWebhook used http.DefaultClient
and only inspected transport errors, so 4xx/5xx responses were silently
treated as successful deliveries. Subscribers returning failures (bad
endpoints, transient 5xx) had messages dropped without any log signal.

Check resp.StatusCode and emit a structured warn log on >=400 responses,
add a 15s timeout via a dedicated http.Client, drain the response body
before close, and stop discarding the error from NewRequestWithContext.

Closes poyrazK#338

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes poyrazK#310. Previously four repo.Update() calls in the kubeadm
provisioner discarded their error with `_ =`, hiding silent state
inconsistency when the database write failed:

- Provision: returning success after Status flip to Running.
- provisionControlPlane: persisting ControlPlaneIPs and KubeconfigEncrypted.
- failCluster: persisting ClusterStatusFailed.

Now:
- Provision wraps and returns the persistence error so the caller knows
  state was not durably committed.
- provisionControlPlane routes update failures through failCluster so
  the cluster is consistently marked Failed.
- failCluster logs (rather than silently drops) a secondary update
  failure while preserving the original failure error for the caller.

Adds a regression test ensuring failCluster surfaces the original error
even when the status persistence fails.
Copilot AI review requested due to automatic review settings April 29, 2026 20:23
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

Warning

Rate limit exceeded

@jackthepunished has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 54 minutes and 54 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d43e9cb6-8f72-4012-bd5e-812975d9b15f

📥 Commits

Reviewing files that changed from the base of the PR and between e2cc272 and d978ba6.

📒 Files selected for processing (7)
  • internal/core/services/notify.go
  • internal/core/services/notify_unit_test.go
  • internal/core/services/password_reset.go
  • internal/handlers/storage_handler.go
  • internal/handlers/storage_handler_content_disposition_test.go
  • internal/repositories/k8s/provisioner.go
  • internal/repositories/k8s/provisioner_extra_test.go
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/issue-310-k8s-update-errors

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 54 minutes and 54 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves reliability and security across several areas by ensuring persistence errors are no longer silently ignored (notably in the k8s provisioner), and by hardening HTTP header generation and webhook delivery handling.

Changes:

  • k8s provisioner: stop discarding repo.Update() errors; route update failures through error returns / failCluster.
  • storage downloads: generate a safe RFC 6266/5987 Content-Disposition header to prevent header injection/response splitting.
  • notify + password reset: harden webhook delivery (request creation + response handling) and stop logging plaintext reset tokens.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
internal/repositories/k8s/provisioner.go Propagates cluster persistence failures instead of ignoring them; logs secondary update failures in failCluster.
internal/repositories/k8s/provisioner_extra_test.go Adds regression coverage ensuring failCluster preserves the original error even if status persistence fails.
internal/handlers/storage_handler.go Introduces contentDispositionAttachment to safely set Content-Disposition on downloads.
internal/handlers/storage_handler_content_disposition_test.go Adds unit tests for sanitization/encoding behavior of Content-Disposition.
internal/core/services/notify.go Adds a bounded HTTP client, checks webhook request creation errors, drains response bodies, and logs delivery failures.
internal/core/services/notify_unit_test.go Adds regression coverage for webhook non-2xx behavior via httptest + log capture.
internal/core/services/password_reset.go Stops logging plaintext reset tokens; logs token issuance metadata instead.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

"cluster_id", cluster.ID, "error", updateErr, "original_error", err)
}
p.logger.Error(msg, "cluster_id", cluster.ID, "error", err)
return fmt.Errorf("%s: %w", msg, err)
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failCluster wraps err with %w even when err is nil (several call sites pass nil). This yields a confusing error message (e.g. %!w(<nil>)) and prevents reliable unwrapping. Consider returning errors.New(msg) (or fmt.Errorf(msg)) when err == nil, and only using %w when a real underlying error is present.

Suggested change
return fmt.Errorf("%s: %w", msg, err)
if err != nil {
return fmt.Errorf("%s: %w", msg, err)
}
return fmt.Errorf("%s", msg)

Copilot uses AI. Check for mistakes.
Comment on lines +309 to +314
if resp.StatusCode >= 400 {
s.logger.Warn("webhook delivery failed",
"endpoint", sub.Endpoint,
"subscription_id", sub.ID,
"status", resp.StatusCode)
return
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code checks resp.StatusCode >= 400, but the comment/test intention says “non-2xx”. This currently treats 3xx redirects as success. If the desired behavior is “2xx only”, update the condition to fail for any status outside [200, 299] (or adjust the wording if 3xx is intentionally allowed).

Copilot uses AI. Check for mistakes.
Comment on lines +52 to +56
name: "trailing slash falls back to download",
key: "folder/",
wantFilename: `filename="folder"`, // path.Base normalizes
wantFilenameStar: `filename*=UTF-8''folder`,
},
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case name/comment says “trailing slash falls back to download”, but the expected value is filename="folder" (because path.Base("folder/") returns "folder"). Rename the case/comment to match the actual behavior (or change the expectation if you truly want the fallback).

Copilot uses AI. Check for mistakes.
Comment on lines +42 to +68
// contentDispositionAttachment builds a safe `Content-Disposition: attachment`
// header for a stored object.
//
// Object keys can contain path segments, non-ASCII characters, control
// characters, quotes, backslashes, or CRLF that — if interpolated naively —
// would either corrupt the header (HTTP response splitting) or let an attacker
// inject additional headers. The output therefore emits two parameters per
// RFC 6266:
//
// - `filename="..."` ASCII-only fallback for legacy clients. All bytes
// outside the safe printable range and the two
// characters that are special inside a quoted-string
// (`"` and `\`) are replaced with `_`.
// - `filename*=UTF-8''…` RFC 5987 percent-encoded form preserving the
// original Unicode basename for modern clients.
//
// `path.Base` is used to discard any path segments embedded in the key. If the
// resulting name is empty we fall back to "download".
func contentDispositionAttachment(key string) string {
name := path.Base(key)
if name == "." || name == "/" || name == "" {
name = "download"
}

return fmt.Sprintf(`attachment; filename="%s"; filename*=UTF-8''%s`,
asciiFilenameFallback(name), rfc5987Encode(name))
}
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title/description focuses on k8s provisioner persistence errors, but this diff also introduces unrelated behavior changes in storage downloads (new Content-Disposition sanitization), notify webhook delivery (status handling + client), and password reset logging. To keep review/revert risk low, either update the PR description to include these additional changes and their motivation, or split them into separate PRs.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ignored errors from repo.Update() in K8s provisioner may cause silent state inconsistency

2 participants