
Add Phase 3 & 4 resilience tests: chaos, data integrity, and concurrency (tests 58–63)#66

Merged: pinodeca merged 2 commits into pinodeca/breakit from copilot/sub-pr-52 on Mar 14, 2026

Copilot AI commented Mar 14, 2026

Continues executing the resilience testing plan in docs/resilience-testing.md. Phases 1, 2, and 5 were already complete (tests 38–57). This PR adds Phases 3 and 4.

New tests (58–63)

| Test | Plan | What it validates |
|---|---|---|
| 58_kill_worker_mid_execution | D1 | pg_terminate_backend on the background worker (BGW): PostgreSQL restarts it automatically within 5 s (confirmed by a change in the epoch sentinel), and the in-flight instance must still reach a terminal state |
| 59_stuck_instances | E2/E3 | A signal-waiting instance stays running indefinitely: there is no idle timeout, and df.cancel() is the only exit |
| 60_orphaned_nodes | E1 | Deleting a df.instances row leaves its df.nodes rows intact (no FK cascade); df.status() still returns gracefully |
| 61_table_bloat | E4/E5 | Instance and node row counts grow in proportion to the number of runs (measured over 10 runs), confirming there is no automatic GC in the df or duroxide schemas |
| 62_concurrent_sessions | F1 | 10 independent dblink connections each call df.start(); all receive distinct instance IDs and all instances complete |
| 63_shared_variable_race | F4 | Snapshot-at-start semantics for variables verified; cross-session last-writer-wins overwrites on df.vars demonstrated and documented |
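To make the semantics checked by test 63 concrete, here is a minimal in-memory Python model; it is not the extension's implementation. It assumes df.vars behaves like a single key-value map per owner and that an instance copies its owner's variables when it starts. The names VarStore, Instance, set_var, and start_instance are hypothetical stand-ins, not the df SQL API.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for df.vars: one value per (owner, key), shared by
# every session of that owner, with no per-session namespace.
VarStore = dict[tuple[str, str], str]


@dataclass
class Instance:
    """Hypothetical workflow instance that copies its owner's variables at start."""
    owner: str
    snapshot: dict[str, str] = field(default_factory=dict)


def set_var(store: VarStore, owner: str, key: str, value: str) -> None:
    # Plain overwrite: whichever session writes last wins.
    store[(owner, key)] = value


def start_instance(store: VarStore, owner: str) -> Instance:
    # Snapshot-at-start: later writes to the store are not seen by this instance.
    snap = {k: v for (o, k), v in store.items() if o == owner}
    return Instance(owner=owner, snapshot=snap)


if __name__ == "__main__":
    store: VarStore = {}
    set_var(store, "alice", "region", "us-east")  # session A writes
    inst = start_instance(store, "alice")         # instance captures "us-east"
    set_var(store, "alice", "region", "eu-west")  # session B overwrites the same key
    print(inst.snapshot["region"])                # us-east: snapshot unaffected
    print(store[("alice", "region")])             # eu-west: last writer wins
```

The model isolates the split that the test documents: a running instance keeps the value it captured, while the shared store simply reflects whichever session wrote last.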

New key findings

| ID | Finding | Severity |
|---|---|---|
| F8 | No FK between df.instances and df.nodes, so orphaned nodes accumulate silently | Design gap |
| F9 | No GC for completed instances or duroxide history, so tables grow without bound | Design gap |
| F10 | df.vars is global per owner, so concurrent sessions race on the same key (last writer wins) | Design gap |
| F11 | No idle timeout for signal-waiting instances; df.cancel() is the only escape | Design gap |
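The orphaning described in F8 can be reproduced in miniature with Python's built-in sqlite3 module. The two-table schema below is a hypothetical simplification of df.instances and df.nodes (their real columns are not shown in this PR). The anti-join in orphan_nodes uses the same LEFT JOIN ... IS NULL pattern that works in PostgreSQL, and with_fk=True adds an ON DELETE CASCADE foreign key purely for contrast.

```python
import sqlite3


def orphan_nodes(with_fk: bool = False) -> list[tuple[int, int]]:
    """Return (node_id, instance_id) pairs whose parent instance row is gone.

    Hypothetical, simplified schema; not the real df.instances/df.nodes layout.
    """
    conn = sqlite3.connect(":memory:")
    try:
        if with_fk:
            # SQLite enforces foreign keys only when this pragma is enabled.
            conn.execute("PRAGMA foreign_keys = ON")
        fk = " REFERENCES instances(id) ON DELETE CASCADE" if with_fk else ""
        conn.executescript(f"""
            CREATE TABLE instances (id INTEGER PRIMARY KEY);
            CREATE TABLE nodes (
                id INTEGER PRIMARY KEY,
                instance_id INTEGER NOT NULL{fk}
            );
            INSERT INTO instances (id) VALUES (1), (2);
            INSERT INTO nodes (id, instance_id) VALUES (10, 1), (11, 1), (20, 2);
        """)
        # Delete one parent row, analogous to test 60 deleting a df.instances row.
        conn.execute("DELETE FROM instances WHERE id = 1")
        # Anti-join: keep only nodes that have no matching instance.
        return conn.execute("""
            SELECT n.id, n.instance_id
            FROM nodes AS n
            LEFT JOIN instances AS i ON i.id = n.instance_id
            WHERE i.id IS NULL
            ORDER BY n.id
        """).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(orphan_nodes(with_fk=False))  # [(10, 1), (11, 1)]
    print(orphan_nodes(with_fk=True))   # []
```

Without a foreign key, the deleted instance's two nodes survive and the query reports them; with the cascading key, they are removed along with the parent row and nothing is reported.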

Test runner

Tests 58, 60, 62, and 63 are added to the superuser (postgres) override list in scripts/test-e2e-local.sh because they require pg_terminate_backend, direct table access that bypasses RLS, or dblink with postgres credentials.

Remaining gaps (not addressed here)

D2 (PostgreSQL crash recovery), F2/F3 (signal/cancel vs. completion races), and F5 (lock contention under concurrent df.status() calls) require a shell-level or timing-sensitive test harness, which SQL-only E2E tests cannot provide.

…ncurrency"

Co-authored-by: pinodeca <32303022+pinodeca@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add resilience testing plan and 20 E2E tests" to "Add Phase 3 & 4 resilience tests: chaos, data integrity, and concurrency (tests 58–63)" on Mar 14, 2026
Copilot AI requested a review from pinodeca March 14, 2026 21:29
pinodeca merged commit 97c6312 into pinodeca/breakit on Mar 14, 2026
1 check passed
pinodeca deleted the copilot/sub-pr-52 branch on March 14, 2026 21:47