Commit 1f90a00

test(scripts): stress tests for pre-gate idempotency claim
Four scenarios that the unit-test stubs and the cold-gate burst (04) don't exercise. All green against a live webapp with the claim system wired in.

16 — claimant-crash recovery. Planted "pending" claim externally, fired 5 same-key triggers (all polling), DEL'd the claim mid-poll. Verifies the retry-SETNX path: 1 waiter wins, 4 polling losers resolve to the same runId.

17 — stale-runId recovery. Claim resolves to a runId that exists in neither PG nor the buffer. IdempotencyKeyConcern logs a warn and falls through; the trigger creates a fresh run. Validates the "resolved-but-not-findable" branch.

18 — claim safety-net timeout. Long-lived "pending" claim with no publisher; same-key trigger polls until safetyNetMs elapses, returns 503. Validates the wait/poll budget caps.

19 — burst → drain → re-burst with the same key. First burst converges via the claim (drainer ON, materialises post-burst); second burst resolves via PG-findFirst (existing IdempotencyKeyConcern behaviour), bypassing the claim entirely. Validates that the new claim path doesn't break the existing PG-cache resolution that takes over once the run is in PG.
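The claim flow these scenarios exercise (SETNX, poll while "pending", retry SETNX when the key disappears, give up at the safety net) can be illustrated with a small self-contained sketch. This is not the webapp's implementation: a temp directory stands in for Redis, a noclobber file create stands in for SET NX, and the names `claim_setnx`, `claim_get`, and `claim_or_await` are hypothetical.

```shell
#!/usr/bin/env bash
# Claim-or-await sketch. A temp directory stands in for Redis (one file per
# key); all function names are hypothetical, not the webapp's API.
STORE=$(mktemp -d)

# Analogue of SET key value NX: succeeds only when the key is absent.
claim_setnx() {
  ( set -o noclobber; printf '%s' "$2" > "$STORE/$1" ) 2>/dev/null
}

# Analogue of GET: prints the value; fails when the key is absent.
claim_get() {
  [[ -f "$STORE/$1" ]] && cat "$STORE/$1" 2>/dev/null
}

# Prints "claimed", "resolved <runId>", or "timeout" after max_polls tries.
claim_or_await() {
  local key=$1 max_polls=$2 value i
  for (( i = 0; i < max_polls; i++ )); do
    if claim_setnx "$key" "pending"; then
      echo "claimed"; return 0          # this caller is the claimant
    fi
    if value=$(claim_get "$key") && [[ -n "$value" && "$value" != "pending" ]]; then
      echo "resolved $value"; return 0  # claimant published a runId
    fi
    sleep 0.05                          # pending or vanished: next loop retries SETNX
  done
  echo "timeout"; return 1              # safety net: caller would answer 503
}

claim_or_await demo-key 3               # fresh key: prints "claimed"
printf 'run_abc' > "$STORE/demo-key"    # claimant publishes its runId
claim_or_await demo-key 3               # prints "resolved run_abc"
rm -f "$STORE/demo-key"                 # claimant releases (like DEL)
claim_or_await demo-key 3               # prints "claimed" again
```

The loop counts polls rather than wall-clock time only to keep the sketch short; the real budget is described above in terms of safetyNetMs.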
1 parent 0b85126 commit 1f90a00

4 files changed

Lines changed: 306 additions & 0 deletions

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+#!/usr/bin/env bash
+# 16 — claimant-crash recovery. The trigger pipeline's try/catch must
+# release the claim so polling waiters can retry. We simulate by
+# planting a "pending" claim externally, firing N same-key triggers
+# (all polling), DEL-ing the claim mid-poll to simulate a release,
+# and verifying one of the waiters re-claims + succeeds.
+#
+# Required: drainer OFF + redis-cli.
+
+source "$(dirname "$0")/00-lib.sh"
+
+header "Claimant-crash recovery: release → waiter re-claim"
+
+if [[ -z "${REDIS_CLI:-}" ]]; then
+  if command -v redis-cli >/dev/null 2>&1; then REDIS_CLI=(redis-cli)
+  elif docker ps --format '{{.Names}}' 2>/dev/null | grep -q '^redis$'; then
+    REDIS_CLI=(docker exec -i redis redis-cli)
+  else fail "no redis-cli; set REDIS_CLI"; summary; fi
+else read -ra REDIS_CLI <<< "$REDIS_CLI"
+fi
+
+KEY="challenge-crash-$(date +%s)-$RANDOM"
+CLAIM_KEY="mollifier:claim:${ENV_ID:?ENV_ID required}:$TASK_ID:$KEY"
+
+# Pre-plant a "pending" claim so all incoming triggers will poll.
+"${REDIS_CLI[@]}" SET "$CLAIM_KEY" "pending" EX 60 >/dev/null
+info "planted pending claim at $CLAIM_KEY"
+
+# Fire 5 same-key triggers in parallel — all should enter poll mode.
+WAITERS=$WORK/w
+mkdir -p "$WAITERS"
+for i in $(seq 1 5); do
+  curl -s -o "$WAITERS/$i.json" -X POST \
+    -H "Authorization: Bearer $API_KEY" \
+    -H "Content-Type: application/json" \
+    -d "{\"payload\":{\"i\":$i},\"options\":{\"idempotencyKey\":\"$KEY\"}}" \
+    "$API_BASE/api/v1/tasks/$TASK_ID/trigger" &
+done
+
+# After 1 second, simulate the claimant's release by DEL-ing the claim
+# key. Polling waiters should detect the absent key, retry SETNX, and
+# one of them should win + proceed.
+sleep 1
+"${REDIS_CLI[@]}" DEL "$CLAIM_KEY" >/dev/null
+info "released pending claim (DEL fired)"
+
+wait
+
+# Collect runIds.
+declare -a IDS=()
+for f in "$WAITERS"/*.json; do
+  id=$(jq -r '.id // empty' "$f")
+  if [[ -n "$id" ]]; then IDS+=( "$id" ); fi
+done
+UNIQUE=$(printf "%s\n" "${IDS[@]}" | sort -u)
+n=$(echo "$UNIQUE" | wc -l | tr -d ' ')
+
+info "responses: ${#IDS[@]}, unique runIds: $n"
+echo "$UNIQUE" | head -3 | while read -r id; do info "  $id"; done
+
+if [[ "$n" == "1" ]]; then
+  pass "all 5 waiters resolved to one runId after release"
+else
+  fail "expected 1 unique runId, got $n — retry path broken?"
+fi
+
+NOT_CACHED=$(jq -s 'map(select(.isCached == false)) | length' "$WAITERS"/*.json)
+if [[ "$NOT_CACHED" == "1" ]]; then
+  pass "exactly one waiter became the new claimant (isCached:false)"
+else
+  fail "expected 1 isCached:false response, got $NOT_CACHED"
+fi
+
+summary
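One detail of the unique-runId count is worth spelling out: `printf "%s\n"` with an empty array still emits one empty line, so `sort -u | wc -l` reports 1 even when no response carried an id. In this script the separate isCached check would still fail in that case, but a stricter count is easy to sketch. The helper name `count_unique` below is hypothetical and is not part of 00-lib.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count distinct non-empty arguments.
# Prints 0 when called with no arguments or only empty strings.
count_unique() {
  if (( $# == 0 )); then
    echo 0
    return 0
  fi
  # grep -c . counts non-empty lines; it exits 1 on a zero count,
  # so "|| true" keeps the helper usable under "set -e".
  printf '%s\n' "$@" | sort -u | grep -c . || true
}

IDS=(run_a run_a run_a)
count_unique "${IDS[@]}"   # prints 1
```

With such a helper, `n=$(count_unique "${IDS[@]}")` would yield 0 rather than 1 when every waiter failed, which makes the "expected 1 unique runId" message accurate in that case too.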
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+#!/usr/bin/env bash
+# 17 — stale-runId recovery. The claim resolves to a runId that exists
+# in neither PG nor the buffer (e.g., claimant errored after publish, or
+# both stores expired). IdempotencyKeyConcern should detect this, log a
+# warn, and fall through to a fresh trigger rather than echoing the
+# dead runId.
+#
+# Required: drainer OFF + redis-cli.
+
+source "$(dirname "$0")/00-lib.sh"
+
+header "Stale-runId recovery: claim points at a ghost"
+
+if [[ -z "${REDIS_CLI:-}" ]]; then
+  if command -v redis-cli >/dev/null 2>&1; then REDIS_CLI=(redis-cli)
+  elif docker ps --format '{{.Names}}' 2>/dev/null | grep -q '^redis$'; then
+    REDIS_CLI=(docker exec -i redis redis-cli)
+  else fail "no redis-cli; set REDIS_CLI"; summary; fi
+else read -ra REDIS_CLI <<< "$REDIS_CLI"
+fi
+
+KEY="challenge-stale-$(date +%s)-$RANDOM"
+CLAIM_KEY="mollifier:claim:${ENV_ID:?ENV_ID required}:$TASK_ID:$KEY"
+GHOST_ID="run_doesnotexist_$(date +%s)"
+
+# Plant a claim that points at a non-existent runId.
+"${REDIS_CLI[@]}" SET "$CLAIM_KEY" "$GHOST_ID" EX 60 >/dev/null
+info "planted stale claim: $CLAIM_KEY -> $GHOST_ID"
+
+# Fire a same-key trigger. IdempotencyKeyConcern's flow:
+#   1. claimOrAwait → returns { resolved, runId: ghost }
+#   2. PG findFirst(idempotencyKey=K) → miss (no row)
+#   3. findBufferedRunWithIdempotency → miss
+#   4. Log warn ("claim resolved but runId not findable"), fall through
+#   5. The trigger proceeds normally and SHOULD create a fresh new run
+api POST "/api/v1/tasks/$TASK_ID/trigger" \
+  "{\"payload\":{\"x\":1},\"options\":{\"idempotencyKey\":\"$KEY\"}}"
+if ! last_status_ok; then
+  fail "trigger returned $(cat "$WORK/last.status") body=$(last_body | head -c 200)"
+  summary
+fi
+NEW_ID=$(last_body | jq -r '.id')
+NEW_CACHED=$(last_body | jq -r '.isCached')
+
+if [[ "$NEW_ID" == "$GHOST_ID" ]]; then
+  fail "trigger returned the ghost runId — fall-through broken"
+elif [[ "$NEW_CACHED" == "true" ]]; then
+  fail "trigger returned isCached:true (id=$NEW_ID) — should be fresh"
+else
+  pass "fresh runId returned: $NEW_ID (isCached:false)"
+fi
+
+# Verify the new run is actually resolvable (not another ghost).
+api GET "/api/v3/runs/$NEW_ID"
+if last_status_ok; then
+  pass "new runId is resolvable"
+else
+  fail "new runId not resolvable (status $(cat "$WORK/last.status"))"
+fi
+
+summary
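The five-step flow listed in the script's comment reduces to a small decision: reuse a run only if one of the two stores can find it, otherwise warn and create a fresh one. The sketch below models only that decision. The two lookup functions are stubs standing in for the PG findFirst and the buffer lookup, and every name in it is hypothetical rather than taken from IdempotencyKeyConcern.

```shell
#!/usr/bin/env bash
# Stub lookups by idempotency key (stand-ins for PG and the buffer).
pg_find_first() { case $1 in key-in-pg) echo run_pg_1 ;; *) return 1 ;; esac; }
buffer_find()   { case $1 in key-in-buffer) echo run_buf_1 ;; *) return 1 ;; esac; }

# Prints "cached <runId>" when either store finds a run for the key,
# otherwise warns on stderr and prints "fresh" (caller creates a new run).
resolve_after_claim() {
  local key=$1 claimed_run_id=$2 found
  if found=$(pg_find_first "$key") || found=$(buffer_find "$key"); then
    echo "cached $found"
  else
    echo "warn: claim resolved to $claimed_run_id but runId not findable" >&2
    echo "fresh"
  fi
}

resolve_after_claim key-in-pg run_pg_1        # prints "cached run_pg_1"
resolve_after_claim ghost-key run_ghost_1     # warns, then prints "fresh"
```

The important property is the last branch: a resolved claim is treated as a hint, not as proof that the run exists.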
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+#!/usr/bin/env bash
+# 18 — claim safety-net timeout. Plant a "pending" claim with a TTL
+# longer than the wait safety net (default 5s); fire a same-key trigger;
+# verify it polls for the safetyNet and returns 503 (not 200, not another
+# 5xx, not a fresh trigger).
+#
+# Required: drainer OFF + redis-cli.
+
+source "$(dirname "$0")/00-lib.sh"
+
+header "Claim safety-net timeout"
+
+if [[ -z "${REDIS_CLI:-}" ]]; then
+  if command -v redis-cli >/dev/null 2>&1; then REDIS_CLI=(redis-cli)
+  elif docker ps --format '{{.Names}}' 2>/dev/null | grep -q '^redis$'; then
+    REDIS_CLI=(docker exec -i redis redis-cli)
+  else fail "no redis-cli; set REDIS_CLI"; summary; fi
+else read -ra REDIS_CLI <<< "$REDIS_CLI"
+fi
+
+KEY="challenge-ttl-$(date +%s)-$RANDOM"
+CLAIM_KEY="mollifier:claim:${ENV_ID:?ENV_ID required}:$TASK_ID:$KEY"
+
+# Plant "pending" with TTL=20s — comfortably outlives the 5s safety net.
+"${REDIS_CLI[@]}" SET "$CLAIM_KEY" "pending" EX 20 >/dev/null
+info "planted long-lived pending claim ($CLAIM_KEY, TTL=20s)"
+
+# Fire a same-key trigger. Time the response.
+t0=$(date +%s)
+api POST "/api/v1/tasks/$TASK_ID/trigger" \
+  "{\"payload\":{\"x\":1},\"options\":{\"idempotencyKey\":\"$KEY\"}}"
+t1=$(date +%s)
+elapsed=$((t1 - t0))
+status=$(cat "$WORK/last.status")
+
+info "response status=$status, elapsed=${elapsed}s"
+info "body: $(last_body | head -c 200)"
+
+if [[ "$status" == "503" ]]; then
+  pass "returned 503 (safety net hit)"
+else
+  fail "expected 503, got $status"
+fi
+
+# Wait should be ~5s (safetyNetMs default). Accept [4, 8] to absorb
+# polling jitter and webapp overhead.
+if (( elapsed >= 4 && elapsed <= 8 )); then
+  pass "wait time ${elapsed}s ≈ safetyNetMs (5s)"
+else
+  fail "wait time ${elapsed}s outside [4, 8]s — safetyNet misconfigured?"
+fi
+
+# Cleanup so other tests don't see stale pending.
+"${REDIS_CLI[@]}" DEL "$CLAIM_KEY" >/dev/null
+
+summary
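The behaviour this script asserts, polling until a budget is spent and then answering 503, can be modelled in a few lines. The sketch below is an illustration under stated assumptions, not the webapp's code: `poll_with_safety_net` is a hypothetical name, the budget and interval are passed in milliseconds, and the 200 printed on success is only a placeholder for "the claim resolved in time".

```shell
#!/usr/bin/env bash
# Hypothetical model of a bounded wait: run the check command every
# interval_ms until it succeeds or budget_ms has been spent sleeping.
# Prints the status a caller would see in this model: 200 or 503.
poll_with_safety_net() {
  local budget_ms=$1 interval_ms=$2
  shift 2
  local waited_ms=0
  while (( waited_ms < budget_ms )); do
    if "$@"; then
      echo 200
      return 0
    fi
    # Convert milliseconds to the fractional seconds that sleep expects.
    sleep "$(printf '%d.%03d' $((interval_ms / 1000)) $((interval_ms % 1000)))"
    waited_ms=$((waited_ms + interval_ms))
  done
  echo 503
  return 1
}

poll_with_safety_net 200 50 false   # check never succeeds: prints 503
poll_with_safety_net 200 50 true    # check succeeds at once: prints 200
```

Because the script measures elapsed time with `date +%s`, each reading is truncated to whole seconds, which is one reason an acceptance window such as [4, 8] is wider than the 5s budget itself.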
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
+#!/usr/bin/env bash
+# 19 — burst → drain → re-burst with the same idempotency key.
+# Verifies the new claim system doesn't *break* the existing
+# post-materialisation cached-hit path: once the buffered (or PG) winner
+# of the first burst is materialised into PG, the second burst's
+# triggers should resolve via IdempotencyKeyConcern's PG-findFirst
+# (existing behaviour), bypassing the claim entirely.
+#
+# Required: drainer ON.
+
+source "$(dirname "$0")/00-lib.sh"
+
+header "Burst → drain → re-burst (cross-store cached resolve)"
+
+KEY="challenge-reburst-$(date +%s)-$RANDOM"
+info "shared idempotencyKey=$KEY"
+
+# Burst 1 — cold gate, same-key triggers serialise through the claim.
+info "burst 1 — 20 same-key triggers"
+B1=$WORK/burst1
+mkdir -p "$B1"
+for i in $(seq 1 20); do
+  curl -s -o "$B1/$i.json" -X POST \
+    -H "Authorization: Bearer $API_KEY" \
+    -H "Content-Type: application/json" \
+    -d "{\"payload\":{\"i\":$i},\"options\":{\"idempotencyKey\":\"$KEY\"}}" \
+    "$API_BASE/api/v1/tasks/$TASK_ID/trigger" &
+done
+wait
+
+declare -a IDS1=()
+for f in "$B1"/*.json; do
+  id=$(jq -r '.id // empty' "$f")
+  if [[ -n "$id" ]]; then IDS1+=( "$id" ); fi
+done
+U1=$(printf "%s\n" "${IDS1[@]}" | sort -u)
+n1=$(echo "$U1" | wc -l | tr -d ' ')
+info "burst 1: ${#IDS1[@]} responses, $n1 unique runId(s)"
+if [[ "$n1" == "1" ]]; then
+  pass "burst 1 converged on one runId via the claim"
+  WINNER=$(echo "$U1" | head -1)
+  info "winner runId: $WINNER"
+else
+  fail "burst 1 produced $n1 unique runIds — claim path broken"
+  summary
+fi
+
+# Wait for the winner to materialise into PG (drainer must be ON).
+info "polling for materialisation (drainer must be ON)"
+deadline=$(($(date +%s) + 60))
+materialised=""
+while (( $(date +%s) < deadline )); do
+  api GET "/api/v3/runs/$WINNER" >/dev/null
+  if last_body | jq -e '.attempts // [] | length > 0' >/dev/null 2>&1; then
+    materialised="yes"
+    break
+  fi
+  status=$(last_body | jq -r '.status // empty')
+  if [[ "$status" != "" && "$status" != "PENDING" && "$status" != "QUEUED" && "$status" != "DELAYED" ]]; then
+    materialised="yes"
+    break
+  fi
+  sleep 1
+done
+if [[ -z "$materialised" ]]; then
+  fail "winner did not materialise within 60s — drainer not on?"
+  summary
+fi
+pass "winner $WINNER materialised into PG"
+
+# Burst 2 — same key. Should ALL resolve via PG-findFirst (existing
+# IdempotencyKeyConcern behaviour) without ever reaching the claim path.
+info "burst 2 — 20 same-key triggers (post-materialisation)"
+B2=$WORK/burst2
+mkdir -p "$B2"
+for i in $(seq 1 20); do
+  curl -s -o "$B2/$i.json" -X POST \
+    -H "Authorization: Bearer $API_KEY" \
+    -H "Content-Type: application/json" \
+    -d "{\"payload\":{\"i\":$i,\"phase\":2},\"options\":{\"idempotencyKey\":\"$KEY\"}}" \
+    "$API_BASE/api/v1/tasks/$TASK_ID/trigger" &
+done
+wait
+
+declare -a IDS2=()
+for f in "$B2"/*.json; do
+  id=$(jq -r '.id // empty' "$f")
+  if [[ -n "$id" ]]; then IDS2+=( "$id" ); fi
+done
+U2=$(printf "%s\n" "${IDS2[@]}" | sort -u)
+n2=$(echo "$U2" | wc -l | tr -d ' ')
+info "burst 2: ${#IDS2[@]} responses, $n2 unique runId(s)"
+
+if [[ "$n2" == "1" ]]; then
+  pass "burst 2 converged on one runId"
+else
+  fail "burst 2 produced $n2 unique runIds — PG-cache resolution broken"
+fi
+
+SHARED=$(echo "$U2" | head -1)
+if [[ "$SHARED" == "$WINNER" ]]; then
+  pass "burst 2's runId matches burst 1's winner — cross-store dedup intact"
+else
+  fail "burst 2 runId=$SHARED, burst 1 winner=$WINNER — they should match"
+fi
+
+# Burst 2 should be ALL isCached:true (PG-findFirst hit).
+CACHED2=$(jq -s 'map(select(.isCached == true)) | length' "$B2"/*.json)
+if [[ "$CACHED2" == "20" ]]; then
+  pass "all 20 burst-2 responses are isCached:true (PG cache hit, not claim)"
+else
+  fail "burst 2 had $CACHED2/20 isCached:true responses"
+fi
+
+summary
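The materialisation loop in this script uses two signals: the run has at least one attempt, or its status is non-empty and is none of PENDING, QUEUED, or DELAYED. Pulled out as a predicate, the rule looks like the sketch below. The function name `is_materialised` is hypothetical, and the EXECUTING status in the usage lines is an assumed example of a later status, not one the script names.

```shell
#!/usr/bin/env bash
# Hypothetical predicate mirroring the script's two materialisation signals.
# $1 = run status (may be empty), $2 = number of attempts (defaults to 0).
is_materialised() {
  local status=$1 attempts=${2:-0}
  if (( attempts > 0 )); then
    return 0                       # any attempt means the run left the buffer
  fi
  case $status in
    ""|PENDING|QUEUED|DELAYED) return 1 ;;   # still waiting, or no status yet
    *) return 0 ;;                           # any later status counts
  esac
}

is_materialised PENDING 0   && echo "materialised" || echo "not yet"   # prints "not yet"
is_materialised QUEUED 1    && echo "materialised" || echo "not yet"   # prints "materialised"
is_materialised EXECUTING 0 && echo "materialised" || echo "not yet"   # prints "materialised"
```

Keeping the rule in one place would also make it easier to adjust if the set of pre-materialisation statuses changes.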