* feat(connectors): add 7 knowledge base connectors (Google Forms, Typeform, Azure DevOps, YouTube, JSM, S3, Sentry)
* fix(connectors): tighten listingCapped semantics per review (WIQL cap, batch omissions, cap-vs-exhaustion)
* fix(connectors): google-forms listingCapped must fire on slice regardless of hitLimit (404-null-filter gap)
* fix(connectors): s3 streaming size cap for chunked responses without content-length
* fix(connectors): ado byte-exact file content fetch, google-forms hash-poisoning on listing failure
* fix(connectors): ado auth-failure deletion guard, jsm last-page slice flag, google-forms response cap in hash
* fix(connectors): shared streaming size-cap reader for ado file hydration (promote from s3)
* fix(knowledge): flag incomplete listings at engine level when pagination is truncated
* fix(connectors): ado flags listing incomplete when a non-empty repo has no resolvable branch
* fix(knowledge): engine truncation flag is an absolute deletion block (fullSync cannot override); s3 byte-exact size fallback; ado tsdoc accuracy
* improvement(knowledge): extract shouldReconcileDeletions gate as tested pure function, tighten engine comments
* test(connectors): mapTags coverage for the 7 new connectors
* fix(connectors): ado probes past the wiql 20k cap before flagging; document custom-wiql full-listing behavior
* fix(connectors): ado flags partial repo trees when items listing emits a continuation token
* fix(connectors): ado discards foreign-phase cursors; google-forms scans all response pages for change detection
* fix(connectors): audit fixes across new connectors
- registry: register x connector (was dead code, never wired in)
- google-docs/google-drive/google-forms: gate deletion reconciliation on
Drive incompleteSearch; google-docs also now sets listingCapped on its
maxDocs cap path
- jsm: add read:jira-user scope so reporter resolves on requests
- gong: only set listingCapped on genuine truncation, not exact-cap
source exhaustion
- gitlab: issues phase switched to keyset pagination (removes ~50k
offset ceiling), matching the repo-tree phase
- grain: parallelize recording + transcript fetch in getDocument
- ashby: document updatedAt-based content-hash limitation for
notes/feedback change detection
- tests: mapTags coverage for x, granola, greenhouse, fathom, rootly
If `ExternalDocument.sourceUrl` is set, the sync engine stores it on the document record. Always construct the full URL (not a relative path).

+## Capped or Incomplete Listings: `syncContext.listingCapped` (REQUIRED)
+
+If `listDocuments` can ever return **less than the full source set** on a non-incremental sync (a `maxItems`/`maxDocuments`-style cap, or a transient per-item error that drops a still-existing document from the listing), it MUST set `syncContext.listingCapped = true` when that happens.
+
+The sync engine reconciles deletions by comparing the full listing against stored documents: anything not seen is **hard-deleted** (sync-engine.ts, gated on `!syncContext?.listingCapped`). A truncated listing without this flag deletes every real document beyond the cap. This was the single most common bug found when auditing connectors; do not omit it.
+
+```typescript
+if (hitLimit && syncContext) {
+  syncContext.listingCapped = true
+}
+```
+
+Rules:
+- Set it when a user-configured cap truncates the listing while more documents exist
+- Set it when a thrown error caused a still-present document to be skipped during listing
+- Do NOT set it when the source is genuinely exhausted (deleted documents must still reconcile)
+- Do NOT set it for intentional scope filters (e.g. a date cutoff); out-of-scope documents should be reconciled normally
+
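The cap-versus-exhaustion distinction in the rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the connector API: `collectWithCap`, `DocStub`, and the `SyncContext` shape are hypothetical names, and a real `listDocuments` is paginated and asynchronous.

```typescript
// Hypothetical shapes for illustration; the real connector types may differ.
interface SyncContext {
  listingCapped?: boolean
}

interface DocStub {
  externalId: string
}

/**
 * Collects up to `maxItems` stubs from already-fetched pages. The flag is
 * set only when a document is found beyond the cap, so reaching the cap
 * exactly as the source runs out does not count as truncation.
 */
function collectWithCap(
  pages: DocStub[][],
  maxItems: number,
  syncContext?: SyncContext
): DocStub[] {
  const collected: DocStub[] = []
  for (const page of pages) {
    for (const stub of page) {
      if (collected.length >= maxItems) {
        // At least one more document exists past the cap: genuine truncation.
        if (syncContext) syncContext.listingCapped = true
        return collected
      }
      collected.push(stub)
    }
  }
  // Source exhausted: leave the flag unset so deleted documents still reconcile.
  return collected
}
```

Checking for a further item before setting the flag is one way to avoid blocking deletion reconciliation when the source holds exactly `maxItems` documents.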
## Sync Engine Behavior (Do Not Modify)
The sync engine (`lib/knowledge/connectors/sync-engine.ts`) is connector-agnostic. It:
- `dependsOn` references selector field IDs (not `canonicalParamId`)
- Dependency `canonicalParamId` values exist in `SELECTOR_CONTEXT_FIELDS`
- [ ] `listDocuments` handles pagination with metadata-based content hashes
+- [ ] `syncContext.listingCapped = true` set whenever the listing is truncated (max-items cap or transient per-item error); required to prevent the engine's deletion reconciliation from removing unseen documents
- [ ] `contentDeferred: true` used if content requires per-doc API calls (file download, export, blocks fetch)
- [ ] `contentHash` is metadata-based (not content-based) and identical between stub and `getDocument`
- [ ] `sourceUrl` set on each ExternalDocument (full URL, not relative)
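One way to satisfy the `contentHash` item above is to derive the hash from a single helper that both the listing stub and the hydrated document call. The sketch below is illustrative only: `FileMeta`, `DocumentShape`, `metadataHash`, `toStub`, and `toFullDocument` are hypothetical names, and the joined string stands in for whatever digest a real connector would compute.

```typescript
// Hypothetical metadata returned by a source's listing API.
interface FileMeta {
  id: string
  modifiedTime: string
  size: number
}

// Hypothetical document shape; only the fields named in the checklist are modeled.
interface DocumentShape {
  externalId: string
  content: string
  contentDeferred: boolean
  contentHash: string
}

// Assumption: a joined metadata string stands in for a real digest.
// It depends only on metadata, never on the downloaded content.
function metadataHash(meta: FileMeta): string {
  return [meta.id, meta.modifiedTime, String(meta.size)].join('|')
}

// Listing path: no content yet, so the document is marked as deferred.
function toStub(meta: FileMeta): DocumentShape {
  return {
    externalId: meta.id,
    content: '',
    contentDeferred: true,
    contentHash: metadataHash(meta),
  }
}

// Hydration path: content is present, but the hash comes from the same helper.
function toFullDocument(meta: FileMeta, content: string): DocumentShape {
  return {
    externalId: meta.id,
    content,
    contentDeferred: false,
    contentHash: metadataHash(meta),
  }
}
```

Because both paths share `metadataHash`, the stub and the hydrated document cannot drift apart, and a change to `modifiedTime` or `size` changes the hash without downloading the file.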
The sync engine hard-deletes any stored document absent from a full listing. Audit every path where `listDocuments` can return less than the full source set:
+- [ ] `syncContext.listingCapped = true` is set when a `maxItems`-style cap truncates the listing while more documents exist
+- [ ] `listingCapped` is set when a transient per-item error drops a still-existing document from the listing
+- [ ] `listingCapped` is NOT set when the source is genuinely exhausted (deleted documents must reconcile) or for intentional scope filters (date cutoffs)
+This is the most common connector bug class; verify it explicitly against `sync-engine.ts`'s reconciliation gate.
+
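The reconciliation gate this checklist refers to might look roughly like the sketch below. It assumes only what the section states (stored documents absent from a full listing are deleted unless `listingCapped` is set); `planDeletions` and the `SyncContext` shape are hypothetical, and the actual logic in `sync-engine.ts` may consider more inputs.

```typescript
// Hypothetical shape; the real syncContext carries more fields.
interface SyncContext {
  listingCapped?: boolean
}

/**
 * Returns the stored IDs that should be deleted after a listing.
 * A capped listing blocks deletion entirely, because "not seen" no longer
 * implies "removed at the source".
 */
function planDeletions(
  storedIds: string[],
  seenIds: string[],
  syncContext?: SyncContext
): string[] {
  if (syncContext?.listingCapped) return []
  return storedIds.filter((id) => seenIds.indexOf(id) === -1)
}

// A capped listing that saw only "a" must not delete "b" and "c".
const cappedPlan = planDeletions(['a', 'b', 'c'], ['a'], { listingCapped: true })

// A complete listing that saw "a" and "b" reconciles "c" as deleted.
const completePlan = planDeletions(['a', 'b', 'c'], ['a', 'b'], {})
```

When auditing a connector, it can help to trace each early return or `catch` in `listDocuments` and ask which of these two calls the engine would effectively make afterwards.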
### Pagination State Across Pages
- [ ] `syncContext` is used to cache state across pages (user names, field maps, instance URLs, portal IDs, etc.)
- [ ] Cached state in `syncContext` is correctly initialized on first page and reused on subsequent pages
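The initialize-once, reuse-later pattern these two items check for can be sketched as below. Everything here is an assumption for illustration: `syncContext` is modeled as a plain mutable record, `userNames` is a made-up cache key, and the loader is synchronous where a real connector would await an API call.

```typescript
// Hypothetical model: syncContext as a mutable bag shared by every page of one sync.
type SyncContext = Record<string, unknown>

// Hypothetical cached value: a map from user ID to display name.
interface UserDirectory {
  [userId: string]: string
}

/**
 * Returns the cached directory if an earlier page stored one; otherwise
 * loads it once and stores it on syncContext for later pages.
 */
function getCachedUsers(
  syncContext: SyncContext,
  load: () => UserDirectory
): UserDirectory {
  const cached = syncContext.userNames as UserDirectory | undefined
  if (cached) return cached
  const loaded = load()
  syncContext.userNames = loaded
  return loaded
}
```

Passing the same `syncContext` object to every page is what makes the cache effective; a fresh object per page would silently reload the directory each time.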
apps/docs/content/docs/en/knowledgebase/connectors.mdx: 20 additions & 9 deletions
@@ -14,21 +14,23 @@ Connectors continuously sync documents from external services into your knowledg
<Image src="/static/connectors/connectors-sources.png" alt="Connect Source picker showing a searchable list of available connectors including Airtable, Asana, Confluence, Discord, Dropbox, Evernote, Fireflies, GitHub, and Gmail" width={800} height={500} />

-Sim ships with 30 built-in connectors:
+Sim ships with 49 built-in connectors:

| Category | Connectors |
|----------|-----------|
-| **Productivity** | Notion, Confluence, Asana, Linear, Jira, Google Calendar, Google Sheets |
-| **Cloud Storage** | Google Drive, Dropbox, OneDrive, SharePoint |
-| **Documents** | Google Docs, WordPress, Webflow |
-| **Development** | GitHub |
-| **Communication** | Slack, Discord, Microsoft Teams, Reddit |
+| **Productivity** | Notion, Confluence, Asana, Linear, Jira, Jira Service Management, Monday, Google Calendar, Google Sheets, Google Forms, Typeform |
+| **Cloud Storage** | Google Drive, Dropbox, OneDrive, SharePoint, Amazon S3 |
+| **Documents** | Google Docs, WordPress, Webflow, DocuSign |
@@ -41,13 +43,18 @@ From inside a knowledge base, click **+ New connector** in the top right to open

Most connectors use **OAuth**: select an existing credential from the dropdown or click **Connect new account** to authorize through the service. Tokens are refreshed automatically.

-A few connectors use **API keys** instead:
+Other connectors use **API keys** or **personal access tokens** instead. The setup modal tells you which credential each connector expects, for example:

| Connector | Where to get the key |
|-----------|---------------------|
| **Evernote** | Developer Token (starts with `S=`) from your Evernote account settings |
| **Obsidian** | Install the [Local REST API](https://github.com/coddingtonbear/obsidian-local-rest-api) plugin, then copy the key from its settings |
| **Fireflies** | Generate from the Integrations page in your Fireflies account |
+| **Typeform** | Personal access token from your Typeform account settings |
+| **Azure DevOps** | Personal access token with Wiki (Read), Work Items (Read), and Code (Read) scopes |
+| **YouTube** | YouTube Data API key from the Google Cloud Console |
+| **Amazon S3** | Secret Access Key (the Access Key ID, region, and bucket are entered as config fields) |
+| **Sentry** | Auth token with `project:read` and `event:read` scopes |

<Callout type="info">
If you rotate an API key in the external service, update it in Sim as well. OAuth tokens refresh automatically, but API keys do not.
@@ -63,6 +70,10 @@ Each connector has source-specific fields that control what gets synced. Example
- **Notion**: sync an entire workspace, a specific database, or a single page tree
- **GitHub**: specify a repository, branch, and optional file extension filter
- **Confluence**: enter your Atlassian domain and optionally filter by space key or content type
+- **Azure DevOps**: choose what to sync (wiki pages, work items, repository files, or all), with optional work item type/state filters, a custom WIQL query, and repository/branch/path filters
+- **Amazon S3**: point at a bucket with an optional key prefix and a customizable file extension allowlist; S3-compatible stores (Cloudflare R2, MinIO) are supported via a custom endpoint
+- **YouTube**: sync a channel (by `@handle` or ID) or playlist, with an optional published-after date filter and the option to exclude Shorts
+- **Sentry**: filter issues by search query (e.g. `is:unresolved`), environment, and time window; self-hosted Sentry is supported via a custom host
- **Obsidian**: provide your vault URL (`https://127.0.0.1:27124` by default) and optionally restrict to a folder path
- **Fireflies**: optionally filter by host email or cap the number of transcripts synced
@@ -188,5 +199,5 @@ You can add as many connectors as you need to a single knowledge base. Each mana
{ question: "What happens when I delete a connector?", answer: "The connector is removed and future syncs stop. You're given the option to also delete all documents that were synced by that connector. If you don't check that option, they stay in the knowledge base as-is." },
{ question: "What does the Disabled status mean?", answer: "After 10 consecutive full-sync failures, the connector is automatically disabled to stop retrying. Reconnect the OAuth account or click Resume to re-enable it." },
{ question: "Do metadata tags count against a limit?", answer: "Yes. Tag slots are shared across all documents in a knowledge base, with 17 slots total. Multiple connectors draw from the same pool, so plan accordingly if several connectors each auto-populate tags." },
-{ question: "Do I need to re-authenticate connectors?", answer: "OAuth connectors refresh tokens automatically. API key connectors (Evernote, Obsidian, Fireflies) need manual updates if you rotate the key in the external service." },
+{ question: "Do I need to re-authenticate connectors?", answer: "OAuth connectors refresh tokens automatically. API key and personal access token connectors need manual updates if you rotate the credential in the external service." },
apps/docs/content/docs/en/mothership/knowledge.mdx: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ For knowledge bases that should stay current automatically, connectors sync cont

Connectors are configured through the knowledge base settings, not through Mothership chat. Once connected, all synced content is immediately searchable by Mothership and by any Agent block with the knowledge base attached.

-Sim ships with 30 built-in connectors, including Notion, Google Drive, Slack, GitHub, Confluence, HubSpot, Salesforce, Gmail, and more.
+Sim ships with 49 built-in connectors, including Notion, Google Drive, Slack, GitHub, Confluence, HubSpot, Salesforce, Gmail, and more.