Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -127,6 +127,15 @@ func TestSeiNode_StatefulSetRecreatedAfterDelete(t *testing.T) {
g.Expect(testCli.Get(testCtx, stsKey, sts)).To(Succeed())
g.Expect(testCli.Delete(testCtx, sts)).To(Succeed())

// Barrier: let the Owns watch fire its deletion reconcile before the
// spec patch below. Without it, the spec patch races the in-flight
// reconcile's optimistic-locked status flush → 409 → flake.
waitFor(t, func() bool {
s := &appsv1.StatefulSet{}
err := testCli.Get(testCtx, stsKey, s)
return apierrors.IsNotFound(err) || (err == nil && s.UID != originalUID)
}, "controller must observe the StatefulSet deletion before next mutation")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Barrier passes immediately on NotFound, race unfixed

Medium Severity

The apierrors.IsNotFound(err) branch causes this barrier to pass on its very first poll, making it a no-op. testCli is a direct (non-cached) client, and the STS has no finalizers in envtest, so it's already gone from etcd when Delete returns. The IsNotFound branch always wins before the second branch (s.UID != originalUID) — which genuinely proves the controller finished its reconcile — can ever be reached. The Owns-watch-triggered reconcile is still async at that point, so the subsequent spec patch can still land between the controller's Get and its optimistic-locked flushStatus, producing the same 409 this barrier is meant to prevent.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 44e8710. Configure here.


// Bump the spec so the controller reconciles again (the SeiNode
// reconciler watches StatefulSet via Owns, but envtest sometimes
// races; a Spec edit triggers the predicate deterministically).
Expand Down Expand Up @@ -201,6 +210,19 @@ func TestSeiNode_StatefulSetImpostorReplaced(t *testing.T) {
cur.Status.StatefulSet.UID = "00000000-0000-0000-0000-000000000000"
g.Expect(testCli.Status().Update(testCtx, cur)).To(Succeed())

// Barrier: confirm the forged status is durable and any reconcile
// woken by Status.Update has drained before the spec patch below.
// Without it, back-to-back writes race the in-flight reconcile's
// optimistic-locked status flush → 409 → flake.
waitFor(t, func() bool {
latest := &seiv1alpha1.SeiNode{}
if err := testCli.Get(testCtx, key, latest); err != nil {
return false
}
return latest.Status.StatefulSet != nil &&
latest.Status.StatefulSet.UID == "00000000-0000-0000-0000-000000000000"
}, "Status.Update forging UID must be durable before next mutation")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impostor barrier trivially satisfied, risks new timeout

Medium Severity

Because testCli is a direct (non-cached) client and Status().Update is synchronous, this barrier's Get sees the forged UID on the very first poll and returns immediately — it never actually drains any reconcile woken by the status write. The original 409 race persists. Worse, if the predicate doesn't filter status-only updates and the controller is fast enough to process the forged UID before the first poll, it nils Status.StatefulSet (impostor-deletion branch), and the barrier's condition (UID == "00000000-…") can never become true again — introducing a new 30s timeout flake that didn't exist before.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 44e8710. Configure here.


// Trigger a spec-bumped reconcile so the controller picks up the
// forged mismatch. The Owns(StatefulSet) watch would fire on the
// subsequent Delete too, but a spec bump deterministically pokes
Expand Down
Loading