
schema: add partial index on channel_members.pubkey (#359)

Open
tlongwell-block wants to merge 1 commit into main from tlongwell/add-channel-members-pubkey-index

Conversation

@tlongwell-block (Collaborator)

Problem

Every Nostr REQ (subscription) message calls get_accessible_channel_ids() as its first database operation after auth. This query filters channel_members by WHERE pubkey = $1 AND removed_at IS NULL — but the only index on the table is the primary key (channel_id, pubkey), which leads with channel_id. PostgreSQL cannot use a B-tree index efficiently for an equality lookup on a non-leading column, so every subscription triggers a sequential scan of the entire table.
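This is easy to confirm by asking the planner directly. A sketch with a placeholder pubkey value — the plan text below is illustrative, not captured from staging:

```sql
-- With only the (channel_id, pubkey) PK available, a pubkey-first lookup
-- has no usable index and falls back to a sequential scan:
EXPLAIN
SELECT channel_id
FROM channel_members
WHERE pubkey = 'npub1...' AND removed_at IS NULL;

--  Seq Scan on channel_members
--    Filter: ((removed_at IS NULL) AND (pubkey = 'npub1...'))
```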

On staging right now:

| Metric | Value |
| --- | --- |
| Sequential scans on channel_members | 5.6 million |
| Total rows read by those scans | 7.2 billion |
| Scan rate | ~5/sec steady |
| Rows per scan | ~1,284 (full table) |
| channels table seq scans (same pattern) | 5.8 million |

The table is small today (1,360 rows) so each scan completes in <0.3ms and fits in shared_buffers (99.99% cache hit ratio). But this is O(N) per subscription — as users and channels grow, it will degrade linearly and become a real bottleneck.

Symptoms observed

Users are seeing "Failed to refresh channel history after subscribing" and "Timed out while loading channel history" on staging. While investigating, we also found:

  • A rogue redis-cli MONITOR session that had been running for ~6 hours, accumulating a 73MB output buffer in a single client connection (Redis pod limit is 256Mi). Killed it — memory dropped from 71MB → 1.45MB instantly.
  • Datadog agent unreachable from the Istio sidecar — repeated 503 errors in envoy tracing logs adding latency overhead to every request through the mesh.

The index fix addresses the underlying database inefficiency; the Redis MONITOR issue was the acute trigger.

Solution

Add a partial index on channel_members.pubkey for active members:

CREATE INDEX idx_channel_members_pubkey ON channel_members (pubkey)
    WHERE removed_at IS NULL;

This covers the exact predicate used by all hot-path queries in sprout-db/src/channel.rs:

| Function | Line | Query pattern |
| --- | --- | --- |
| get_accessible_channel_ids | 529 | WHERE cm.pubkey = $1 AND cm.removed_at IS NULL |
| channel_ids_for_pubkey | 531 | WHERE cm.pubkey = $1 AND cm.removed_at IS NULL |
| is_member | 491 | WHERE cm.channel_id = $1 AND cm.pubkey = $2 AND cm.removed_at IS NULL |
| get_member_role | 604 | WHERE channel_id = $1 AND pubkey = $2 AND removed_at IS NULL |
| list_accessible_channels | 722 | LEFT JOIN ... AND cm.pubkey = $1 AND cm.removed_at IS NULL |

Also used by DM lookups in sprout-db/src/dm.rs (lines 244-245, 266-267).

Why partial?

  • No queries in the codebase read removed members (removed_at IS NOT NULL)
  • Partial index is smaller and more cache-friendly
  • Matches the exact WHERE clause PostgreSQL needs to prove index applicability
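On the last point, a sketch of the applicability rule: the planner considers a partial index only when it can prove the query's WHERE clause implies the index predicate.

```sql
-- Eligible: the predicate includes removed_at IS NULL, which implies the
-- index predicate, so the partial index can serve the lookup.
SELECT channel_id FROM channel_members
WHERE pubkey = $1 AND removed_at IS NULL;

-- Not eligible: without removed_at IS NULL the query may need rows for
-- removed members, which the partial index does not contain.
SELECT channel_id FROM channel_members
WHERE pubkey = $1;
```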

Why not composite (pubkey, channel_id)?

A composite index would enable index-only scans for SELECT channel_id WHERE pubkey = $1, but that is an optional future optimization. The single-column partial index is sufficient to eliminate the seq scans and is the minimal correct fix.
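For reference, the composite variant would look like this — hypothetical index name, not part of this PR:

```sql
-- Optional future optimization: adding channel_id as a second key column
-- enables index-only scans for
--   SELECT channel_id ... WHERE pubkey = $1 AND removed_at IS NULL
CREATE INDEX idx_channel_members_pubkey_channel
    ON channel_members (pubkey, channel_id)
    WHERE removed_at IS NULL;
```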

Queries already covered by existing PK

Queries that filter on (channel_id, pubkey) — like get_member, remove_member, get_member_role — are already well-served by the PK index (channel_id, pubkey). No additional index needed for those.

Rollout

  • CREATE INDEX (not CONCURRENTLY) takes a brief write lock on channel_members. At 1,360 rows this is sub-millisecond and safe.
  • Verify with EXPLAIN ANALYZE after deploy that the planner picks the new index for get_accessible_channel_ids().
  • Pairs well with block-coder-tf-stacks#1101 which bumps pod resource limits (more CPU/memory headroom), but that PR addresses capacity while this one addresses efficiency.
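A minimal post-deploy check, with a placeholder pubkey value (expected plan shape, not verified output):

```sql
EXPLAIN ANALYZE
SELECT channel_id
FROM channel_members
WHERE pubkey = 'npub1...' AND removed_at IS NULL;

-- Expect something like:
--  Index Scan using idx_channel_members_pubkey on channel_members
--    Index Cond: (pubkey = 'npub1...')
```

Note that on a table this small the planner may still estimate a seq scan as cheaper for some queries; the index scan should win consistently as the table grows.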

Commit message

The channel_members table only has a PK index on (channel_id, pubkey).
Every query that looks up channels-for-a-user (WHERE pubkey = $1) does a
sequential scan because the PK has channel_id first.

get_accessible_channel_ids() runs on every REQ (subscription) message —
it is the first thing the relay does after auth. On staging this has
accumulated 5.6M seq scans reading 7.2B rows total (~5 scans/sec steady).

Add a partial index on (pubkey) WHERE removed_at IS NULL, which covers
the exact predicate used by the hot-path queries in sprout-db/channel.rs:
  - get_accessible_channel_ids (line 529)
  - channel_ids_for_pubkey (line 531)
  - is_member (line 491)
  - get_member_role (line 604)

The table is small today (1,360 rows) so each scan is <0.3ms, but this
is O(N) per subscription and will degrade linearly as users grow.
