Add resilience testing plan and 26 E2E tests#52
Closed
pinodeca wants to merge 3 commits into
ec00a7c to cecd59e
Contributor
Author
@copilot go ahead and implement the resilience testing plan proposed in this PR.
Contributor
17 tasks
4a5bea0 to 899583c
899583c to 18362dc
Contributor
Author
@copilot #56 implemented some of the resilience test plan but wasn't tested and then was merged into branch pinodeca/breakit, but this PR's description wasn't updated. Can you update it? Regardless, continue implementing the resilience test plan. You may want to check the status of #56 - that's where we left off.
Contributor
4157364 to 5787017
a620f3e to c5507a0
Comprehensive plan to stress-test, chaos-test, and find edge-case bugs in pg_durable. Covers six testing categories:
- Stress & Overload (concurrent instances, deep nesting, large results)
- Bugs & Logical Errors (infinite loops, truthiness edge cases, recursive start)
- Misuse & Unintended Usage (empty SQL, raw JSON, rapid polling)
- Chaos / Fault Injection (kill worker, crash PG, drop+recreate extension)
- Data Integrity & State Corruption (orphaned nodes, stuck instances, bloat)
- Concurrency & Race Conditions (shared vars, concurrent start/cancel/signal)

Includes existing coverage gap analysis and prioritized phased rollout.
c5507a0 to c19436f
Contributor
Author
@copilot continue executing the plan in docs/resilience-testing.md
Contributor
…ncurrency" Co-authored-by: pinodeca <32303022+pinodeca@users.noreply.github.com>
Resilience Testing Plan + 26 E2E Tests for pg_durable
Comprehensive plan to systematically break pg_durable the way a real user might — finding resource limits, edge-case bugs, and failure modes not covered by existing E2E tests.
What's in this PR
Test plan: docs/resilience-testing.md — six testing categories with prioritized phased rollout.
26 new E2E tests (tests 38–63) covering Phases 1–5:
- $var substitution (bug found)
- df.break() outside a loop
- df.start() inside workflow (no guard found)
- df.status() polls
- pg_terminate_backend on BGW → PG auto-restarts in ≤5s; in-flight instance reaches terminal state
- running indefinitely; no idle timeout; df.cancel() is the only exit
- df.instances row leaves df.nodes rows intact (no FK cascade); df.status() returns gracefully
- df.start(); all produce distinct IDs and complete
- df.vars demonstrated

Test runner improvements:
- --keep-going / -k flag for test-e2e-local.sh to continue past failures with summary at end.
- postgres) override list: they require pg_terminate_backend, direct RLS-bypassing table access, or dblink with postgres credentials.

Key findings
- $var substitution of empty/0-row results produces unquoted JSON → syntax error
- df.loop(): infinite loops run forever
- df.start(): can spawn unbounded child instances
- df.break() outside loop returns break sentinel as result (not an error)
- df.instances / df.nodes: orphaned nodes accumulate silently
- df.vars is per-owner-global: concurrent sessions race on the same key (last writer wins)
- df.cancel() is the only escape

Remaining gaps (not addressed here)
df.status() lock contention) require a shell-level or timing-sensitive test harness beyond what SQL-only E2E tests support.
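
For readers unfamiliar with the keep-going behaviour listed under the test runner improvements, the following is a minimal sketch of how such a loop is commonly written in Bash. It is not the actual code in test-e2e-local.sh; the function name `run_all`, its arguments, and the output format are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a keep-going test loop (NOT the real test-e2e-local.sh).

# run_all KEEP_GOING CMD...
#   Runs each command string in order.
#   KEEP_GOING=1: record failures, keep running, print a summary at the end.
#   KEEP_GOING=0: stop at the first failure (the summary is still printed).
#   Returns 0 only if every executed command succeeded.
run_all() {
  local keep_going=$1
  shift
  local failed=()
  local t
  for t in "$@"; do
    # Each test is run in its own shell so one test cannot affect the next.
    if bash -c "$t" >/dev/null 2>&1; then
      echo "PASS: $t"
    else
      echo "FAIL: $t"
      failed+=("$t")
      if [ "$keep_going" -ne 1 ]; then
        break
      fi
    fi
  done
  echo "Summary: ${#failed[@]} failed"
  [ "${#failed[@]}" -eq 0 ]
}
```

Under these assumptions, `run_all 1 "true" "false" "true"` would run all three commands and report one failure in the summary, while `run_all 0 "true" "false" "true"` would stop after the second command.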