fix(notifier): detach ScheduleImmediate async work from request context#774
Conversation
…task The ScheduleImmediate gocron task ran asynchronously after the HTTP handler returned, but captured the request-scoped context. net/http cancels that context on handler return, so the async task's DB lookup and Campfire HTTP call both failed with "context canceled" — every immediate notification was silently dropped in production. Detach the async task from request-scope cancellation with context.WithoutCancel, preserving any request-scoped values (trace IDs, log fields). Add a regression test that cancels the parent context before the gocron task fires; without the fix, the test fails with "async notification job did not complete after parent context was cancelled". Also rewrite the existing ScheduleImmediate test to use a buffered channel instead of assert.Eventually over a shared variable, which was tripping the race detector.
There was a problem hiding this comment.
Pull request overview
This PR updates notifier immediate scheduling so asynchronous notification jobs are detached from HTTP request cancellation while preserving request-scoped context values, and adds regression coverage around parent context cancellation.
Changes:
- Uses
context.WithoutCancelforScheduleImmediateDB lookup and notification send. - Reworks the existing immediate notification test to use a channel.
- Adds a cancellation regression test for async notification jobs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
internal/notifier/notifier.go |
Detaches the async notification context from caller cancellation. |
internal/notifier/notifier_test.go |
Updates immediate notification testing and adds parent-cancellation coverage. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Address two findings from Copilot review on PR #774: 1. context.WithoutCancel drops the parent's deadline along with its cancellation, so the gocron task's DB query and Campfire HTTP call could run unbounded and stall scheduler shutdown. Wrap the detached context with context.WithTimeout(asyncCtx, 30s) inside the task and defer cancel(). 2. The TestNotifier_ScheduleImmediate_ParentContextCancelled regression test cancelled the parent context AFTER calling ScheduleImmediate, leaving a race: gocron could pick up the task and complete the in-memory pgxmock call before cancel() fired, letting the test pass on broken code. Cancel BEFORE scheduling so the task is guaranteed to observe an already-cancelled parent. Verified: 5/5 RED on the un-fixed implementation, GREEN with -race on the fix.
…ancelled header Rewrite the test's doc comment to describe the durable invariant — that ScheduleImmediate's async work must not be tied to the caller's context lifetime — and the production scenario it guards against, instead of referencing project history. Also explain why the parent ctx is cancelled before scheduling rather than after, so future readers do not misread the setup as a sequencing bug. Comment-only; no behavior change. Addresses Copilot review comment on PR #774 about the prior header contradicting the actual test sequence.
Per @notandy review on PR #774, swap the closure-captured `context.WithoutCancel(ctx)` pattern for gocron's native `func(ctx context.Context)` task signature. gocron passes a job-scoped context that is already independent of the caller's request lifetime, so WithoutCancel was redundant. The job context also gets cancelled when the scheduler shuts down, which gives in-flight tasks a cooperative cancellation signal that WithoutCancel did not — a small but real shutdown-semantics improvement on top of the existing 30s WithTimeout bound. The outer ctx parameter is preserved (as `_`) for API symmetry with other notifier methods and to keep the controller call site natural; verified via grep that no caller relies on request-scoped ctx values flowing through. Tests unchanged and still pass under -race -count=3, including the cancelled-parent-context invariant test.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
internal/notifier/notifier_test.go:185
- The test HTTP handler ignores JSON decode errors (
_ = Decode(...)), which can hide failures and make this regression test non-diagnostic. Prefer require.NoError/assert.NoError on Decode so a malformed request body fails the test immediately.
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
var req CampfireRequest
_ = json.NewDecoder(r.Body).Decode(&req)
w.WriteHeader(http.StatusOK)
received <- req
}))
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Per Copilot review on PR #774, the two channel-driven test handlers in notifier_test.go silently dropped JSON decode errors with `_ =`, which would mask body-shape regressions: a malformed request would produce a zero-valued CampfireRequest, fail the downstream assertion on req.ProjectID, and surface as a confusing equality mismatch instead of "decode error". Use assert.NoError (not require.NoError) because the handler runs on a goroutine spawned by httptest.Server. require's t.FailNow calls runtime.Goexit, which only kills the calling goroutine and leaves the test main running on stale state; assert marks the test failed and lets the goroutine finish cleanly. Tests pass under -race -count=3.
Merging this branch will not change overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. Changed unit test files
|
Address two findings from Copilot review on PR #774: 1. context.WithoutCancel drops the parent's deadline along with its cancellation, so the gocron task's DB query and Campfire HTTP call could run unbounded and stall scheduler shutdown. Wrap the detached context with context.WithTimeout(asyncCtx, 30s) inside the task and defer cancel(). 2. The TestNotifier_ScheduleImmediate_ParentContextCancelled regression test cancelled the parent context AFTER calling ScheduleImmediate, leaving a race: gocron could pick up the task and complete the in-memory pgxmock call before cancel() fired, letting the test pass on broken code. Cancel BEFORE scheduling so the task is guaranteed to observe an already-cancelled parent. Verified: 5/5 RED on the un-fixed implementation, GREEN with -race on the fix.
…ancelled header Rewrite the test's doc comment to describe the durable invariant — that ScheduleImmediate's async work must not be tied to the caller's context lifetime — and the production scenario it guards against, instead of referencing project history. Also explain why the parent ctx is cancelled before scheduling rather than after, so future readers do not misread the setup as a sequencing bug. Comment-only; no behavior change. Addresses Copilot review comment on PR #774 about the prior header contradicting the actual test sequence.
Per @notandy review on PR #774, swap the closure-captured `context.WithoutCancel(ctx)` pattern for gocron's native `func(ctx context.Context)` task signature. gocron passes a job-scoped context that is already independent of the caller's request lifetime, so WithoutCancel was redundant. The job context also gets cancelled when the scheduler shuts down, which gives in-flight tasks a cooperative cancellation signal that WithoutCancel did not — a small but real shutdown-semantics improvement on top of the existing 30s WithTimeout bound. The outer ctx parameter is preserved (as `_`) for API symmetry with other notifier methods and to keep the controller call site natural; verified via grep that no caller relies on request-scoped ctx values flowing through. Tests unchanged and still pass under -race -count=3, including the cancelled-parent-context invariant test.
Summary
Notifier.ScheduleImmediateschedules a gocron one-shot job that runs after the HTTP handler returns. It capturedparams.HTTPRequest.Context(), whichnet/httpcancels at handler return — so the async DB lookup and Campfire HTTP call both failed withcontext canceled. Every immediate endpoint notification was silently dropped in production.func(ctx context.Context)task signature so the task receives a job-scoped context from the scheduler. That context is independent of the caller's HTTP request lifetime AND is cancelled when the scheduler shuts down, so in-flight tasks short-circuit cleanly on shutdown.context.WithTimeoutso a stuck DB query or Campfire HTTP call cannot run unbounded and stall scheduler shutdown.async notification job did not complete after parent context was cancelled.ScheduleImmediatetest to use a buffered channel instead ofassert.Eventuallyover a shared variable, which was tripping-race.Why this matters
Found during review of #772. Affects every endpoint approval path that calls
ScheduleImmediate— the user-visible symptom is missing Campfire/email notifications with no error returned to the API caller (the handler had already responded 200).Review-driven evolution
Initial fix used
context.WithoutCancel(ctx)to detach cancellation while preserving request-scoped values. Per @notandy's review the implementation was simplified to gocron's job context, which already provides caller-detachment AND adds graceful shutdown cancellation thatWithoutCanceldid not. Verified via grep that no caller ininternal/controller/orinternal/middlewares/propagates request-scoped ctx values into the notifier task body, so the change is value-lossless.Test plan
go test -race -count=3 ./internal/notifier/...— passesTestNotifier_ScheduleImmediate_ParentContextCancelled) fails on main, passes here — RED-GREEN verifiedmake checklinters: shellcheck, golangci-lint (0 issues), go-licence-detector, addlicense, reuse — all green