fix(k8s): handle ignored repo.Update errors in provisioner#5
fix(k8s): handle ignored repo.Update errors in provisioner#5jackthepunished wants to merge 4 commits intomainfrom
Conversation
The password reset flow logged the plaintext token alongside the user's email at Debug level so it could be picked up during MVP testing. Anyone with log access could mint a password reset and take over the account. Replace the Debug log with an Info entry that records only the user_id and reset-token row id. The plaintext token must reach the user via email; once an EmailService is wired in it should be the sole consumer. Closes poyrazK#293
…r injection
Object keys flowed straight into the Content-Disposition header via
fmt.Sprintf("attachment; filename=%s", key), so any key containing CRLF,
double quotes, or backslashes could split the response or inject
arbitrary headers. Path-bearing keys also leaked the full bucket path to
the client as the suggested filename.
Replace both call sites (authenticated download and presigned download)
with a single helper, contentDispositionAttachment, that:
- reduces the key to its basename via path.Base
- emits an ASCII-only `filename="..."` fallback with control bytes,
non-ASCII bytes, quotes, and backslashes mapped to `_`
- emits the full Unicode basename in `filename*=UTF-8''...` per RFC
5987 with proper attr-char percent-encoding
- guarantees CR/LF can never reach the wire
Add table-driven unit tests covering response-splitting, quote/backslash
injection, non-ASCII names, nested keys, and empty keys.
Closes poyrazK#225, poyrazK#226
Webhook delivery in NotifyService.deliverToWebhook used http.DefaultClient and only inspected transport errors, so 4xx/5xx responses were silently treated as successful deliveries. Subscribers returning failures (bad endpoints, transient 5xx) had messages dropped without any log signal. Check resp.StatusCode and emit a structured warn log on >=400 responses, add a 15s timeout via a dedicated http.Client, drain the response body before close, and stop discarding the error from NewRequestWithContext. Closes poyrazK#338 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes poyrazK#310. Previously four repo.Update() calls in the kubeadm provisioner discarded their error with `_ =`, hiding silent state inconsistency when the database write failed: - Provision: returning success after Status flip to Running. - provisionControlPlane: persisting ControlPlaneIPs and KubeconfigEncrypted. - failCluster: persisting ClusterStatusFailed. Now: - Provision wraps and returns the persistence error so the caller knows state was not durably committed. - provisionControlPlane routes update failures through failCluster so the cluster is consistently marked Failed. - failCluster logs (rather than silently drops) a secondary update failure while preserving the original failure error for the caller. Adds a regression test ensuring failCluster surfaces the original error even when the status persistence fails.
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (7)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 54 minutes and 54 seconds.Comment |
There was a problem hiding this comment.
Pull request overview
This PR improves reliability and security across several areas by ensuring persistence errors are no longer silently ignored (notably in the k8s provisioner), and by hardening HTTP header generation and webhook delivery handling.
Changes:
- k8s provisioner: stop discarding
repo.Update()errors; route update failures through error returns /failCluster. - storage downloads: generate a safe RFC 6266/5987
Content-Dispositionheader to prevent header injection/response splitting. - notify + password reset: harden webhook delivery (request creation + response handling) and stop logging plaintext reset tokens.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/repositories/k8s/provisioner.go | Propagates cluster persistence failures instead of ignoring them; logs secondary update failures in failCluster. |
| internal/repositories/k8s/provisioner_extra_test.go | Adds regression coverage ensuring failCluster preserves the original error even if status persistence fails. |
| internal/handlers/storage_handler.go | Introduces contentDispositionAttachment to safely set Content-Disposition on downloads. |
| internal/handlers/storage_handler_content_disposition_test.go | Adds unit tests for sanitization/encoding behavior of Content-Disposition. |
| internal/core/services/notify.go | Adds a bounded HTTP client, checks webhook request creation errors, drains response bodies, and logs delivery failures. |
| internal/core/services/notify_unit_test.go | Adds regression coverage for webhook non-2xx behavior via httptest + log capture. |
| internal/core/services/password_reset.go | Stops logging plaintext reset tokens; logs token issuance metadata instead. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "cluster_id", cluster.ID, "error", updateErr, "original_error", err) | ||
| } | ||
| p.logger.Error(msg, "cluster_id", cluster.ID, "error", err) | ||
| return fmt.Errorf("%s: %w", msg, err) |
There was a problem hiding this comment.
failCluster wraps err with %w even when err is nil (several call sites pass nil). This yields a confusing error message (e.g. %!w(<nil>)) and prevents reliable unwrapping. Consider returning errors.New(msg) (or fmt.Errorf(msg)) when err == nil, and only using %w when a real underlying error is present.
| return fmt.Errorf("%s: %w", msg, err) | |
| if err != nil { | |
| return fmt.Errorf("%s: %w", msg, err) | |
| } | |
| return fmt.Errorf("%s", msg) |
| if resp.StatusCode >= 400 { | ||
| s.logger.Warn("webhook delivery failed", | ||
| "endpoint", sub.Endpoint, | ||
| "subscription_id", sub.ID, | ||
| "status", resp.StatusCode) | ||
| return |
There was a problem hiding this comment.
The code checks resp.StatusCode >= 400, but the comment/test intention says “non-2xx”. This currently treats 3xx redirects as success. If the desired behavior is “2xx only”, update the condition to fail for any status outside [200, 299] (or adjust the wording if 3xx is intentionally allowed).
| name: "trailing slash falls back to download", | ||
| key: "folder/", | ||
| wantFilename: `filename="folder"`, // path.Base normalizes | ||
| wantFilenameStar: `filename*=UTF-8''folder`, | ||
| }, |
There was a problem hiding this comment.
This test case name/comment says “trailing slash falls back to download”, but the expected value is filename="folder" (because path.Base("folder/") returns "folder"). Rename the case/comment to match the actual behavior (or change the expectation if you truly want the fallback).
| // contentDispositionAttachment builds a safe `Content-Disposition: attachment` | ||
| // header for a stored object. | ||
| // | ||
| // Object keys can contain path segments, non-ASCII characters, control | ||
| // characters, quotes, backslashes, or CRLF that — if interpolated naively — | ||
| // would either corrupt the header (HTTP response splitting) or let an attacker | ||
| // inject additional headers. The output therefore emits two parameters per | ||
| // RFC 6266: | ||
| // | ||
| // - `filename="..."` ASCII-only fallback for legacy clients. All bytes | ||
| // outside the safe printable range and the two | ||
| // characters that are special inside a quoted-string | ||
| // (`"` and `\`) are replaced with `_`. | ||
| // - `filename*=UTF-8''…` RFC 5987 percent-encoded form preserving the | ||
| // original Unicode basename for modern clients. | ||
| // | ||
| // `path.Base` is used to discard any path segments embedded in the key. If the | ||
| // resulting name is empty we fall back to "download". | ||
| func contentDispositionAttachment(key string) string { | ||
| name := path.Base(key) | ||
| if name == "." || name == "/" || name == "" { | ||
| name = "download" | ||
| } | ||
|
|
||
| return fmt.Sprintf(`attachment; filename="%s"; filename*=UTF-8''%s`, | ||
| asciiFilenameFallback(name), rfc5987Encode(name)) | ||
| } |
There was a problem hiding this comment.
The PR title/description focuses on k8s provisioner persistence errors, but this diff also introduces unrelated behavior changes in storage downloads (new Content-Disposition sanitization), notify webhook delivery (status handling + client), and password reset logging. To keep review/revert risk low, either update the PR description to include these additional changes and their motivation, or split them into separate PRs.
Summary
repo.Update()calls ininternal/repositories/k8s/provisioner.gopreviously discarded their error with_ =, causing silent state inconsistency between in-memory cluster state and the database.Provisionnow wraps and returns the persistence error after flipping status to Running, so callers know the state was not durably committed.provisionControlPlaneroutes update failures (ControlPlaneIPs, encrypted kubeconfig) throughfailClusterso the cluster is consistently marked Failed.failClusterlogs the secondary update failure rather than silently dropping it, while still returning the original failure error to the caller.Test plan
go build ./...cleango test ./internal/repositories/k8s/...passesTestFailCluster/UpdateFails_StillReturnsOriginalErrorverifies the original failure error is preserved when status persistence fails