fix: M:N capture_connectors table for connector provenance #85
Merged
Two long-standing bugs shared one root cause: the captures table treated connector ownership as a single FK + a single string in metadata, but connector→capture is fundamentally M:N (one Reddit post can be both saved and upvoted; HN hot/saved overlap; github-stars/notifications overlap on the same repo). Symptom A: every connector capture had source_id=1 (claude) due to a hardcoded workaround. The schema lied about origin. Symptom B: per-connector item counts oscillated across syncs. The single metadata.connectorId field was clobbered on every UPSERT, so whichever connector synced last "won" the shared item, and the loser's count dropped by one until it synced again. Fix: introduce capture_connectors(capture_id, connector_id) M:N table, add a generic 'connector' source row, drop dead idx_captures_source. Migration v3 backfills M:N from existing metadata.connectorId, strips the field, and repoints connector captures to the new source row. Six query sites (sync-engine upsert/delete, main uninstall + count, CLI reset + count, ACP prompt examples) updated to JOIN through M:N.
This was referenced Apr 15, 2026
Problem
Two long-standing bugs shared one root cause: the captures table treats connector ownership as a single FK (source_id) plus a single string in metadata.connectorId, but connector → capture is fundamentally M:N.

Symptom A: schema lies about origin

Every connector capture was inserted with source_id = 1 (claude) via a hardcoded workaround in sync-engine.ts, because the sources table only had four rows (claude, codex, opencli, gemini) and connectors didn't fit. Per-connector identity was shoved into metadata.connectorId. Empirically low-impact (no query JOINs captures to sources for connector data), but a real schema smell and a trap for future maintainers.

Symptom B: per-connector counts oscillate (user-visible)

metadata.connectorId is single-valued. When two connectors legitimately share a platform_id, the UPSERT path on (platform, platform_id) overwrites whichever connector was already there. Last sync wins. Reproduced live with Reddit: connectorId flipped ownership.

The same trap is waiting for HN hot+saved, github-stars+github-notifications, and any future overlapping pair.
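To make the oscillation concrete, here is a minimal in-memory sketch of the last-sync-wins behaviour. It is not the project's code: the names `LegacyCapture`, `upsertSingleOwner`, `countFor`, and the `t3_a` / `t3_b` / `t3_shared` ids are hypothetical, and a `Map` keyed on (platform, platform_id) stands in for the captures table.

```typescript
// Hypothetical model of the pre-fix behaviour: one connectorId per capture.
type LegacyCapture = { platform: string; platformId: string; connectorId: string };

const legacyDb = new Map<string, LegacyCapture>(); // keyed on (platform, platform_id)

// UPSERT: an existing row for the same key is overwritten, connectorId included.
function upsertSingleOwner(connectorId: string, platform: string, platformIds: string[]): void {
  for (const platformId of platformIds) {
    legacyDb.set(`${platform}:${platformId}`, { platform, platformId, connectorId });
  }
}

// Per-connector count derived from the single-valued field.
function countFor(connectorId: string): number {
  let n = 0;
  legacyDb.forEach((capture) => {
    if (capture.connectorId === connectorId) n++;
  });
  return n;
}

upsertSingleOwner("reddit-saved", "reddit", ["t3_a", "t3_shared"]);
upsertSingleOwner("reddit-upvoted", "reddit", ["t3_b", "t3_shared"]);
const afterUpvotedSync = { saved: countFor("reddit-saved"), upvoted: countFor("reddit-upvoted") };

upsertSingleOwner("reddit-saved", "reddit", ["t3_a", "t3_shared"]);
const afterSavedSync = { saved: countFor("reddit-saved"), upvoted: countFor("reddit-upvoted") };

console.log(afterUpvotedSync); // { saved: 1, upvoted: 2 }
console.log(afterSavedSync);   // { saved: 2, upvoted: 1 }
```

Each connector genuinely has two items, yet whichever one synced last reports 2 and the other reports 1.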
Design — why M:N over alternatives
Considered four options, among them:

- sources rows (Option 2)
- capture_connectors (Option 3)
- metadata.connectorIds[]

Option 2 was the most tempting alternative: drop a row per connector into sources. But source_id is a single-valued FK, so the Reddit overlap post still can't carry both reddit-saved and reddit-upvoted simultaneously. Option 2 is a dead end for Symptom B; making it work would require either duplicating rows (breaking (platform, platform_id) dedup and doubling FTS cost) or never overwriting source_id (which biases counts toward whichever connector synced first, so still wrong).

Only M:N can natively represent "this capture belongs to N connectors at once" without breaking dedup or duplicating index entries. Picked Option 3.
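As a rough illustration of why M:N holds up where a single-valued field does not, the sketch below models attribution as a set of (capture, connector) pairs. Everything here is an assumption for illustration: `mnCaptures`, `captureConnectors`, `upsertWithAttribution`, and `attributedCount` are hypothetical names, and in-memory sets stand in for the SQLite tables.

```typescript
// Hypothetical in-memory model; the real tables live in SQLite.
const mnCaptures = new Set<string>();                     // captures, deduped on (platform, platform_id)
const captureConnectors = new Map<string, Set<string>>(); // connector_id -> capture keys (the M:N pairs)

function upsertWithAttribution(connectorId: string, platform: string, platformIds: string[]): void {
  let attributed = captureConnectors.get(connectorId);
  if (attributed === undefined) {
    attributed = new Set<string>();
    captureConnectors.set(connectorId, attributed);
  }
  for (const platformId of platformIds) {
    const captureKey = `${platform}:${platformId}`;
    mnCaptures.add(captureKey); // insert-or-update: still one row per (platform, platform_id)
    attributed.add(captureKey); // Set.add ignores an existing pair, similar to INSERT OR IGNORE
  }
}

// Counterpart of counting capture_connectors rows for one connector_id.
function attributedCount(connectorId: string): number {
  const attributed = captureConnectors.get(connectorId);
  return attributed === undefined ? 0 : attributed.size;
}

upsertWithAttribution("reddit-saved", "reddit", ["t3_a", "t3_shared"]);
upsertWithAttribution("reddit-upvoted", "reddit", ["t3_b", "t3_shared"]);
upsertWithAttribution("reddit-saved", "reddit", ["t3_a", "t3_shared"]); // re-sync changes nothing

const mnCounts = {
  captures: mnCaptures.size,
  saved: attributedCount("reddit-saved"),
  upvoted: attributedCount("reddit-upvoted"),
};
console.log(mnCounts); // { captures: 3, saved: 2, upvoted: 2 }
```

The shared item is stored once but counted for both connectors, and sync order no longer affects either count.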
A pragmatic deviation from the original Option 3 sketch: instead of rebuilding the captures table to make source_id nullable (which entangles FTS triggers, indexes, and FK references), I added a generic 'connector' row to sources and repointed all connector captures at it. Same truth-value (the schema no longer claims a Reddit post came from claude) without touching captures's table definition.

Implementation
Schema
Plus a new ('connector', '<plugin>') row in sources.

Migration v3 (idempotent, transactional)

1. Backfill capture_connectors from metadata.connectorId
2. Strip connectorId from metadata via json_remove
3. Repoint captures.source_id for connector rows from claude → connector
4. Drop idx_captures_source (no query ever used it)

PRAGMA user_version only advances after the transaction commits, so a partial run retries cleanly. All four steps are idempotent (INSERT OR IGNORE, json_remove on already-stripped JSON is a no-op, UPDATE to current value is a no-op, DROP INDEX IF EXISTS).

Sync engine
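The M:N-aware delete rule used by the sync engine (drop one connector's attribution rows, then remove only connector captures left with no attribution) can be sketched in memory as follows. This is an assumption-laden illustration, not the actual deleteConnectorItems: `SketchStore`, `deleteConnectorItemsSketch`, and the sample ids are hypothetical, and the real implementation runs SQL against capture_connectors and captures.

```typescript
// Hypothetical stand-ins for the captures and capture_connectors tables.
type SketchCapture = { source: string };

interface SketchStore {
  captures: Map<string, SketchCapture>;        // capture id -> row
  captureConnectors: Map<string, Set<string>>; // capture id -> attributed connector ids
}

function deleteConnectorItemsSketch(store: SketchStore, connectorId: string): string[] {
  // Step 1: drop this connector's M:N rows.
  store.captureConnectors.forEach((owners) => {
    owners.delete(connectorId);
  });

  // Step 2: collect captures with source 'connector' that have no remaining attribution.
  const orphaned: string[] = [];
  store.captures.forEach((capture, id) => {
    const owners = store.captureConnectors.get(id);
    const attributed = owners !== undefined && owners.size > 0;
    if (capture.source === "connector" && !attributed) orphaned.push(id);
  });

  // Step 3: delete the orphans; shared and non-connector captures are untouched.
  for (const id of orphaned) {
    store.captures.delete(id);
    store.captureConnectors.delete(id);
  }
  return orphaned;
}

const sketchStore: SketchStore = {
  captures: new Map<string, SketchCapture>([
    ["t3_only_saved", { source: "connector" }],
    ["t3_shared", { source: "connector" }],
    ["session_1", { source: "claude" }],
  ]),
  captureConnectors: new Map<string, Set<string>>([
    ["t3_only_saved", new Set(["reddit-saved"])],
    ["t3_shared", new Set(["reddit-saved", "reddit-upvoted"])],
  ]),
};

const removedIds = deleteConnectorItemsSketch(sketchStore, "reddit-saved");
console.log(removedIds); // [ 't3_only_saved' ]
```

Removing reddit-saved deletes only the item it owned exclusively; the shared item stays attributed to reddit-upvoted, and the non-connector capture is never considered.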
- tagConnectorId removed entirely; metadata no longer carries provenance
- upsertItems now takes connectorId and runs INSERT OR IGNORE INTO capture_connectors after both INSERT and UPDATE paths, so a capture re-synced by a second connector picks up an additional M:N row instead of overwriting
- getSourceId now resolves 'connector' instead of 'claude'
- deleteConnectorItems switched to "drop this connector's M:N rows, then delete captures with source='connector' that have no remaining M:N attribution", which preserves shared items legitimately owned by another connector

Six query sites updated
- core/src/connectors/sync-engine.ts: upsertItems writes M:N; deleteConnectorItems is M:N-aware
- app/src/main/index.ts (uninstall): DELETE FROM captures WHERE platform = ? fallback (would nuke shared items in multi-connector world)
- app/src/main/index.ts (count): SELECT COUNT(*) FROM capture_connectors WHERE connector_id = ?
- cli/src/commands/connector-sync.ts (--reset)
- cli/src/commands/connector-sync.ts (final count)
- app/src/main/acp.ts

Test helpers
createTestDB in test-helpers.ts updated to mirror the new schema (extra connector source row + capture_connectors table). All 147 core tests pass.

Verification: run against live DB
Pre-migration baseline (308 captures, all carrying metadata.connectorId):

- user_version = 2, no capture_connectors table

After dev startup (migration v3 ran):

- user_version = 3, connector source row added
- metadata.connectorId residue
- source='connector'
- idx_captures_source dropped, idx_capture_connectors_connector created

Symptom B reproduction (Reddit, baseline matches the bug report exactly):

- t3_1skjbg8 "Dark Fantasy Realms" carries both reddit-saved and reddit-upvoted in M:N

Single-connector uninstall semantics (simulated in transaction, then rolled back):

- reddit-saved M:N rows cleared
- reddit-upvoted
- reddit-upvoted count unchanged

Full npm-package uninstall (UI):

FTS sanity:

- captures / captures_fts / captures_fts_trigram row counts aligned post-migration

Type-check + tests pass on @spool/core, @spool/cli, @spool/app.

Test plan

- source_id = claude captures