Skip to content

chore(tests): stop leaking channels on cleanup failure#232

Closed
nijeesh-stream wants to merge 1 commit into
masterfrom
nijeeshjoshy/chore-fix-channel-leak
Closed

chore(tests): stop leaking channels on cleanup failure#232
nijeesh-stream wants to merge 1 commit into
masterfrom
nijeeshjoshy/chore-fix-channel-leak

Conversation

@nijeesh-stream
Copy link
Copy Markdown
Contributor

Summary

Audit: the CI suite creates ~94 channels per run (~50 sync + ~44 async) across fixtures and explicit test creations. Cleanup historically leaked them three ways:

# Source Channels / run
1 channel fixture (scope="function", used by 37 sync + 31 async tests) 68
2 fellowship_of_the_ring fixture 2
3 test_channel.py explicit creates (lines 32, 43, 322) 3 sync + 3 async = 6
4 test_draft.py:88 / async test_draft.py:88 (channel2, no cleanup on failure) 1 sync + 1 async = 2
5 test_channel_batch_updater.py (4 tests × ch1+ch2, both suites) 16

Three concrete leak vectors:

  1. channel fixture used client.delete_channels([cid], hard_delete=True) — that returns a task_id and the fixture returns immediately. When the task queue is backed up (the same backend slowness that breaks test_delete_channels) the channel is still alive when the next test asserts on a clean app.
  2. Every cleanup path was except Exception: pass — failures invisible. CI logs show nothing when delete fails.
  3. test_query_drafts (sync + async) created channel2 but only cleaned it up on the happy path — any assertion failure mid-test leaked it forever.

Stale channels drift across runs and eventually break unrelated tests with stale-state assertions (e.g. len(...) == 3 got 15).

Fix

Fix Effect
A. Swap client.delete_channels(...)channel.delete(hard=True) in every fixture. Synchronous HTTP DELETE, no task_id race. Channel gone before fixture returns.
C. Log cleanup exceptions to stderr ([cleanup] channel <cid> failed: <kind>: <msg>). Failures visible in CI logs, don't fail the passing test.
D. New _purge_stale_test_channels session-scoped, autouse=True fixture queries channels tagged {"test": True, "language": "python"} at session start and end and hard-deletes them. Tag test_query_drafts's channel2 so the sweep catches it too. Self-healing across runs.

Test plan

  • uv run black — clean
  • CI: green on the matrix (3.8 / 3.9 / 3.10 / 3.11 / 3.12)
  • CI logs: [cleanup] lines only appear when something legitimately failed; on a clean run there should be at most one [cleanup] swept N leaked test channels line at session start (covering whatever the prior run left behind) and possibly one at session end (own-run strays, if any).

The CI test suite creates ~94 channels per run (~50 sync + ~44 async)
across fixtures and explicit cases, but cleanup historically ran
through three escape hatches that silently leaked channels into the
shared test app:

1. channel fixture used client.delete_channels([cid], hard_delete=True),
   which returns a task_id immediately. When the task queue is backed
   up (the same backend slowness that breaks test_delete_channels) the
   fixture says 'done' before the delete actually fires.
2. Every cleanup path swallowed exceptions: except Exception: pass.
   Failures invisible.
3. test_query_drafts (sync + async) created a second channel,
   channel2, and did not delete it on the failure path.

Stale channels drift across runs and eventually break unrelated tests
that query the shared app expecting clean state.

Three fixes:

* Swap client.delete_channels for the synchronous channel.delete(hard=True)
  HTTP DELETE in every fixture. No task_id race; the channel is gone
  before the fixture returns.
* Log cleanup exceptions to stderr ('[cleanup] channel <cid> failed:
  ...'). Visible in CI logs, doesn't fail the test.
* Add a session-scoped, autouse=True sweep that queries channels tagged
  {'test': True, 'language': 'python'} at start and end of the run and
  hard-deletes whatever is still hanging around. Self-healing across
  runs; tag test_query_drafts's channel2 so the sweep catches it too.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant