Commit b8f0dd5

feat(connectors): add 7 knowledge base connectors (Google Forms, Typeform, Azure DevOps, YouTube, JSM, S3, Sentry)
1 parent c620fdc commit b8f0dd5

21 files changed

Lines changed: 6515 additions & 10 deletions

.claude/commands/add-connector.md

Lines changed: 19 additions & 0 deletions
```diff
@@ -463,6 +463,24 @@ const response = await fetchWithRetry(url, { ... }, VALIDATE_RETRY_OPTIONS)
 
 If `ExternalDocument.sourceUrl` is set, the sync engine stores it on the document record. Always construct the full URL (not a relative path).
 
+## Capped or Incomplete Listings — `syncContext.listingCapped` (REQUIRED)
+
+If `listDocuments` can ever return **less than the full source set** on a non-incremental sync — a `maxItems`/`maxDocuments`-style cap, or a transient per-item error that drops a still-existing document from the listing — it MUST set `syncContext.listingCapped = true` when that happens.
+
+The sync engine reconciles deletions by comparing the full listing against stored documents: anything not seen is **hard-deleted** (sync-engine.ts, gated on `!syncContext?.listingCapped`). A truncated listing without this flag deletes every real document beyond the cap. This was the single most common bug found when auditing connectors — do not omit it.
+
+```typescript
+if (hitLimit && syncContext) {
+  syncContext.listingCapped = true
+}
+```
+
+Rules:
+- Set it when a user-configured cap truncates the listing while more documents exist
+- Set it when a thrown error caused a still-present document to be skipped during listing
+- Do NOT set it when the source is genuinely exhausted (deleted documents must still reconcile)
+- Do NOT set it for intentional scope filters (e.g. a date cutoff) — out-of-scope documents should be reconciled normally
+
 ## Sync Engine Behavior (Do Not Modify)
 
 The sync engine (`lib/knowledge/connectors/sync-engine.ts`) is connector-agnostic. It:
```
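The `listingCapped` rule added to `add-connector.md` can be exercised end to end with a small model. This is a sketch under stated assumptions, not Sim's connector interface: `SyncContext`, `ExternalDocumentStub`, `fetchPage`, and `listWithCap` are hypothetical names, and an in-memory array stands in for a paginated API. The flag is set only when the cap stops the listing while the source still has items; running out of source items leaves it unset so deletions still reconcile.

```typescript
// Hypothetical shapes for illustration only; Sim's real types are not shown in this diff.
interface SyncContext {
  listingCapped?: boolean
}

interface ExternalDocumentStub {
  externalId: string
}

interface Page {
  items: ExternalDocumentStub[]
  nextCursor: number | null
}

// Hypothetical paginated source backed by an in-memory array.
function fetchPage(all: ExternalDocumentStub[], cursor: number, pageSize: number): Page {
  const items = all.slice(cursor, cursor + pageSize)
  const next = cursor + pageSize
  return { items, nextCursor: next < all.length ? next : null }
}

// Lists up to maxItems documents and flags the listing as capped only when
// the cap, not the end of the source, is what stopped it.
function listWithCap(
  all: ExternalDocumentStub[],
  maxItems: number,
  pageSize: number,
  syncContext?: SyncContext,
): ExternalDocumentStub[] {
  const out: ExternalDocumentStub[] = []
  let cursor: number | null = 0
  let hitLimit = false

  while (cursor !== null && !hitLimit) {
    const page = fetchPage(all, cursor, pageSize)
    for (const item of page.items) {
      if (out.length >= maxItems) {
        // A real item exists beyond the cap: the listing is truncated.
        hitLimit = true
        break
      }
      out.push(item)
    }
    cursor = page.nextCursor
  }

  if (hitLimit && syncContext) {
    syncContext.listingCapped = true
  }
  return out
}
```

For example, with five source documents, `maxItems = 3` sets the flag, while `maxItems = 5` lists everything and leaves it unset, even though the cap is reached exactly on the last item.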
```diff
@@ -515,6 +533,7 @@ export const CONNECTOR_REGISTRY: ConnectorRegistry = {
 - `dependsOn` references selector field IDs (not `canonicalParamId`)
 - Dependency `canonicalParamId` values exist in `SELECTOR_CONTEXT_FIELDS`
 - [ ] `listDocuments` handles pagination with metadata-based content hashes
+- [ ] `syncContext.listingCapped = true` set whenever the listing is truncated (max-items cap or transient per-item error) — required to prevent the engine's deletion reconciliation from removing unseen documents
 - [ ] `contentDeferred: true` used if content requires per-doc API calls (file download, export, blocks fetch)
 - [ ] `contentHash` is metadata-based (not content-based) and identical between stub and `getDocument`
 - [ ] `sourceUrl` set on each ExternalDocument (full URL, not relative)
```
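The `contentHash` checklist item (metadata-based, identical between the listing stub and `getDocument`) can be illustrated with a short sketch. It assumes a hypothetical source item exposing an `id` and a `modifiedTime`; `metadataContentHash`, `toStub`, and `toFullDocument` are illustrative names, and apart from `contentHash` and `contentDeferred` the field names are assumptions. Both paths hash the same metadata with Node's `crypto.createHash`, so no content download is needed to compute the hash.

```typescript
import { createHash } from "node:crypto"

// Hypothetical metadata a source API returns for each item.
interface SourceItemMeta {
  id: string
  modifiedTime: string // e.g. an ISO-8601 timestamp from the API
}

// Hashes only stable metadata, never the body, so listing stubs can compute it cheaply.
function metadataContentHash(meta: SourceItemMeta): string {
  return createHash("sha256").update(`${meta.id}:${meta.modifiedTime}`).digest("hex")
}

// Stub produced while listing: content is deferred, hash comes from metadata.
function toStub(meta: SourceItemMeta) {
  return {
    externalId: meta.id, // hypothetical field name
    content: "",
    contentDeferred: true,
    contentHash: metadataContentHash(meta),
  }
}

// Full document fetched later: content is filled in, hash is computed the same way.
function toFullDocument(meta: SourceItemMeta, body: string) {
  return {
    externalId: meta.id, // hypothetical field name
    content: body,
    contentDeferred: false,
    contentHash: metadataContentHash(meta),
  }
}
```

Routing both paths through one helper is the design point: if the stub and the fetched document hashed different inputs, a hash comparison would presumably treat every deferred document as changed on every sync.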

.claude/commands/validate-connector.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -135,6 +135,13 @@ For each API endpoint the connector calls:
 - [ ] No off-by-one errors in pagination tracking
 - [ ] The connector does NOT hit known API pagination limits silently (e.g., HubSpot search 10k cap)
 
+### Deletion-Reconciliation Safety (`listingCapped`) — CRITICAL
+The sync engine hard-deletes any stored document absent from a full listing. Audit every path where `listDocuments` can return less than the full source set:
+- [ ] `syncContext.listingCapped = true` is set when a `maxItems`-style cap truncates the listing while more documents exist
+- [ ] `listingCapped` is set when a transient per-item error drops a still-existing document from the listing
+- [ ] `listingCapped` is NOT set when the source is genuinely exhausted (deleted documents must reconcile) or for intentional scope filters (date cutoffs)
+This is the most common connector bug class — verify it explicitly against `sync-engine.ts`'s reconciliation gate.
+
 ### Pagination State Across Pages
 - [ ] `syncContext` is used to cache state across pages (user names, field maps, instance URLs, portal IDs, etc.)
 - [ ] Cached state in `syncContext` is correctly initialized on first page and reused on subsequent pages
```
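To see what the audit protects against, here is a simplified model of the reconciliation gate the checklist refers to. It is an assumption-based sketch, not the code in `sync-engine.ts`: `listWithPerItemErrors` and `idsToHardDelete` are hypothetical helpers, and a transient per-item failure is simulated with a set of ids that throw. When a still-existing document is skipped because of an error, the listing sets `listingCapped`, and the gate then deletes nothing for that run.

```typescript
interface SyncContext {
  listingCapped?: boolean
}

// Resolves each source item individually; ids in `failing` simulate a transient error.
function listWithPerItemErrors(
  sourceIds: string[],
  failing: Set<string>,
  syncContext: SyncContext,
): string[] {
  const listed: string[] = []
  for (const id of sourceIds) {
    try {
      if (failing.has(id)) {
        throw new Error(`transient error resolving ${id}`)
      }
      listed.push(id)
    } catch {
      // The document still exists upstream but was not listed: the listing is incomplete.
      syncContext.listingCapped = true
    }
  }
  return listed
}

// Simplified deletion reconciliation: stored documents missing from the listing
// are deleted, but only when the listing is known to be complete.
function idsToHardDelete(storedIds: string[], listedIds: string[], syncContext?: SyncContext): string[] {
  if (syncContext?.listingCapped) {
    return []
  }
  const seen = new Set(listedIds)
  return storedIds.filter((id) => !seen.has(id))
}
```

In this model the gate is all-or-nothing, which is why the flag must stay unset for genuine exhaustion and for intentional scope filters: otherwise documents deleted upstream would never be reconciled.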

apps/docs/content/docs/en/knowledgebase/connectors.mdx

Lines changed: 20 additions & 9 deletions
```diff
@@ -14,21 +14,23 @@ Connectors continuously sync documents from external services into your knowledg
 
 <Image src="/static/connectors/connectors-sources.png" alt="Connect Source picker showing a searchable list of available connectors including Airtable, Asana, Confluence, Discord, Dropbox, Evernote, Fireflies, GitHub, and Gmail" width={800} height={500} />
 
-Sim ships with 30 built-in connectors:
+Sim ships with 49 built-in connectors:
 
 | Category | Connectors |
 |----------|-----------|
-| **Productivity** | Notion, Confluence, Asana, Linear, Jira, Google Calendar, Google Sheets |
-| **Cloud Storage** | Google Drive, Dropbox, OneDrive, SharePoint |
-| **Documents** | Google Docs, WordPress, Webflow |
-| **Development** | GitHub |
-| **Communication** | Slack, Discord, Microsoft Teams, Reddit |
+| **Productivity** | Notion, Confluence, Asana, Linear, Jira, Jira Service Management, Monday, Google Calendar, Google Sheets, Google Forms, Typeform |
+| **Cloud Storage** | Google Drive, Dropbox, OneDrive, SharePoint, Amazon S3 |
+| **Documents** | Google Docs, WordPress, Webflow, DocuSign |
+| **Development** | GitHub, GitLab, Azure DevOps, Sentry |
+| **Communication** | Slack, Discord, Microsoft Teams, Reddit, YouTube |
 | **Email** | Gmail, Outlook |
 | **CRM** | HubSpot, Salesforce |
 | **Support** | Intercom, ServiceNow, Zendesk |
+| **Incident Management** | incident.io, Rootly |
 | **Data** | Airtable |
 | **Note-taking** | Evernote, Obsidian |
-| **Meetings** | Fireflies |
+| **Meetings** | Zoom, Gong, Grain, Granola, Fathom, Fireflies |
+| **Recruiting** | Greenhouse, Ashby |
 
 ## Adding a Connector
 
```
```diff
@@ -41,13 +43,18 @@ From inside a knowledge base, click **+ New connector** in the top right to open
 
 Most connectors use **OAuth** — select an existing credential from the dropdown or click **Connect new account** to authorize through the service. Tokens are refreshed automatically.
 
-A few connectors use **API keys** instead:
+Other connectors use **API keys** or **personal access tokens** instead. The setup modal tells you which credential each connector expects — for example:
 
 | Connector | Where to get the key |
 |-----------|---------------------|
 | **Evernote** | Developer Token (starts with `S=`) from your Evernote account settings |
 | **Obsidian** | Install the [Local REST API](https://github.com/coddingtonbear/obsidian-local-rest-api) plugin, then copy the key from its settings |
 | **Fireflies** | Generate from the Integrations page in your Fireflies account |
+| **Typeform** | Personal access token from your Typeform account settings |
+| **Azure DevOps** | Personal access token with Wiki (Read), Work Items (Read), and Code (Read) scopes |
+| **YouTube** | YouTube Data API key from the Google Cloud Console |
+| **Amazon S3** | Secret Access Key (the Access Key ID, region, and bucket are entered as config fields) |
+| **Sentry** | Auth token with `project:read` and `event:read` scopes |
 
 <Callout type="info">
 If you rotate an API key in the external service, update it in Sim as well — OAuth tokens refresh automatically, but API keys do not.
```
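For orientation, the sketch below shows how the providers' public APIs generally expect the new credentials in this table to be presented on a request: Bearer tokens for Typeform and Sentry, HTTP Basic with an empty username for an Azure DevOps PAT, and a `key` query parameter for the YouTube Data API. It reflects those providers' documented conventions, not Sim's connector code; `authorize` and `TokenConnector` are hypothetical names. Amazon S3 is left out because its requests are signed with AWS Signature Version 4 rather than carrying a static header.

```typescript
import { Buffer } from "node:buffer"

// Hypothetical identifiers for connectors whose credential is a static token or key.
type TokenConnector = "typeform" | "sentry" | "azure-devops" | "youtube"

interface AuthorizedRequest {
  url: string
  headers: Record<string, string>
}

// Attaches a credential the way each provider's public API documents it.
function authorize(connector: TokenConnector, url: string, credential: string): AuthorizedRequest {
  switch (connector) {
    case "typeform":
    case "sentry":
      // Typeform personal access tokens and Sentry auth tokens are sent as Bearer tokens.
      return { url, headers: { Authorization: `Bearer ${credential}` } }
    case "azure-devops": {
      // Azure DevOps PATs use HTTP Basic auth with an empty username.
      const encoded = Buffer.from(`:${credential}`).toString("base64")
      return { url, headers: { Authorization: `Basic ${encoded}` } }
    }
    case "youtube": {
      // The YouTube Data API accepts an API key as the `key` query parameter.
      const withKey = new URL(url)
      withKey.searchParams.set("key", credential)
      return { url: withKey.toString(), headers: {} }
    }
    default:
      throw new Error(`unsupported connector: ${String(connector)}`)
  }
}
```

Because these credentials are static, they match the table's rotation note: nothing here refreshes them, so a rotated token must be re-entered.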
```diff
@@ -63,6 +70,10 @@ Each connector has source-specific fields that control what gets synced. Example
 - **Notion** — sync an entire workspace, a specific database, or a single page tree
 - **GitHub** — specify a repository, branch, and optional file extension filter
 - **Confluence** — enter your Atlassian domain and optionally filter by space key or content type
+- **Azure DevOps** — choose what to sync (wiki pages, work items, repository files, or all), with optional work item type/state filters, a custom WIQL query, and repository/branch/path filters
+- **Amazon S3** — point at a bucket with an optional key prefix and a customizable file extension allowlist; S3-compatible stores (Cloudflare R2, MinIO) are supported via a custom endpoint
+- **YouTube** — sync a channel (by `@handle` or ID) or playlist, with an optional published-after date filter and the option to exclude Shorts
+- **Sentry** — filter issues by search query (e.g. `is:unresolved`), environment, and time window; self-hosted Sentry is supported via a custom host
 - **Obsidian** — provide your vault URL (`https://127.0.0.1:27124` by default) and optionally restrict to a folder path
 - **Fireflies** — optionally filter by host email or cap the number of transcripts synced
 
```
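The Amazon S3 option described in this hunk combines a key prefix with a file extension allowlist. A minimal sketch of that matching, assuming a hypothetical `matchesS3Filters` helper and `S3FilterConfig` shape; the allowlist used in the usage is made up, since the connector's default list does not appear in this diff.

```typescript
// Hypothetical filter config mirroring the S3 fields described above.
interface S3FilterConfig {
  prefix?: string
  allowedExtensions: string[] // e.g. [".md", ".txt"]; compared case-insensitively
}

// Returns true when an object key is in scope for syncing.
function matchesS3Filters(key: string, config: S3FilterConfig): boolean {
  if (config.prefix && !key.startsWith(config.prefix)) {
    return false
  }
  // Folder placeholders (e.g. created by the S3 console) are keys ending in "/".
  if (key.endsWith("/")) {
    return false
  }
  const fileName = key.slice(key.lastIndexOf("/") + 1)
  const dot = fileName.lastIndexOf(".")
  if (dot <= 0) {
    return false // no extension, or a dotfile such as ".env"
  }
  const ext = fileName.slice(dot).toLowerCase()
  return config.allowedExtensions.some((allowed) => allowed.toLowerCase() === ext)
}
```

Keys rejected here are excluded by an intentional scope filter, so under the rules in `add-connector.md` such exclusions alone should not set `syncContext.listingCapped`.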

```diff
@@ -188,5 +199,5 @@ You can add as many connectors as you need to a single knowledge base. Each mana
 { question: "What happens when I delete a connector?", answer: "The connector is removed and future syncs stop. You're given the option to also delete all documents that were synced by that connector. If you don't check that option, they stay in the knowledge base as-is." },
 { question: "What does the Disabled status mean?", answer: "After 10 consecutive full-sync failures, the connector is automatically disabled to stop retrying. Reconnect the OAuth account or click Resume to re-enable it." },
 { question: "Do metadata tags count against a limit?", answer: "Yes. Tag slots are shared across all documents in a knowledge base — 17 slots total. Multiple connectors draw from the same pool, so plan accordingly if several connectors each auto-populate tags." },
-{ question: "Do I need to re-authenticate connectors?", answer: "OAuth connectors refresh tokens automatically. API key connectors (Evernote, Obsidian, Fireflies) need manual updates if you rotate the key in the external service." },
+{ question: "Do I need to re-authenticate connectors?", answer: "OAuth connectors refresh tokens automatically. API key and personal access token connectors need manual updates if you rotate the credential in the external service." },
 ]} />
```
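The FAQ's auto-disable rule (disabled after 10 consecutive full-sync failures, re-enabled with Resume) can be modelled as a small state machine. This is a hypothetical sketch of the behavior as the FAQ describes it, not Sim's implementation; `ConnectorHealth`, `recordFullSync`, `resume`, and `MAX_CONSECUTIVE_FAILURES` are illustrative names.

```typescript
// Threshold taken from the FAQ text; the constant name is hypothetical.
const MAX_CONSECUTIVE_FAILURES = 10

interface ConnectorHealth {
  consecutiveFailures: number
  status: "active" | "disabled"
}

// Records the outcome of one full sync; any success resets the failure streak.
function recordFullSync(health: ConnectorHealth, succeeded: boolean): ConnectorHealth {
  if (health.status === "disabled") {
    return health // a disabled connector stops retrying until it is resumed
  }
  if (succeeded) {
    return { consecutiveFailures: 0, status: "active" }
  }
  const consecutiveFailures = health.consecutiveFailures + 1
  return {
    consecutiveFailures,
    status: consecutiveFailures >= MAX_CONSECUTIVE_FAILURES ? "disabled" : "active",
  }
}

// Models clicking Resume: re-enable the connector and start a fresh streak.
function resume(health: ConnectorHealth): ConnectorHealth {
  return { ...health, consecutiveFailures: 0, status: "active" }
}
```

Counting only consecutive failures means an occasional flaky sync never accumulates toward the threshold; only an unbroken run of ten failures disables the connector.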

apps/docs/content/docs/en/mothership/knowledge.mdx

Lines changed: 1 addition & 1 deletion
```diff
@@ -49,7 +49,7 @@ For knowledge bases that should stay current automatically, connectors sync cont
 
 Connectors are configured through the knowledge base settings, not through Mothership chat. Once connected, all synced content is immediately searchable by Mothership and by any Agent block with the knowledge base attached.
 
-Sim ships with 30 built-in connectors, including Notion, Google Drive, Slack, GitHub, Confluence, HubSpot, Salesforce, Gmail, and more.
+Sim ships with 49 built-in connectors, including Notion, Google Drive, Slack, GitHub, Confluence, HubSpot, Salesforce, Gmail, and more.
 
 Examples of what you can sync:
 
```