feat(specialists): add NotionLibrarian (mirrors LinearLibrarian) #62
khaliqgant merged 2 commits into main
Conversation
Production trigger: cloud has no Notion specialist at all (the agent card advertises only pr_investigation, github.enumerate, linear.enumerate). The Slack DM "do you see any info about our investors in notion?" returned the canned "could not complete" reply because there is no specialist for sage to delegate to.

This is the upstream half. Cloud-side wiring (notion-api-client + notion-api-fallback + /api/v1/notion/query proxy + notion-specialist-agentic) lands in the fix-notion-specialist-apifallback workflow against the cloud repo after this PR is merged and republished.

Mirrors the LinearLibrarian shape: NotionLibrarianAdapter + createNotionLibrarian factory + types + tests. No GitHubLibrarian-style investigator (Notion doesn't have a clear investigation analog yet; add later if needed).

Tests + typecheck green (33/33 specialists tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: df92261785
```ts
const filters = params.filters ?? {};

for (const key of NOTION_FILTER_KEYS) {
  for (const value of filters[key] ?? []) parts.push(`${key}:${value}`);
}
```
Preserve spaces when serializing Notion filter values
enumerateNotion() builds an instruction string by appending raw `key:value` tokens, but the values are not quoted or escaped. Any multi-word filter (for example `database:Product Roadmap` or `title:Launch Plan`) is split on whitespace during parsing, so only the first word is retained as the filter and the rest is treated as free text. This causes common Notion filters to miss valid entries and also sends incorrect filters to apiFallback.
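A minimal sketch of the failure mode. The names here are hypothetical (`parseQueryMini` stands in for the shared parseQuery, which handles more keys); only the split-on-whitespace behavior is the point:

```typescript
// Hypothetical miniature of the shared parseQuery: split the instruction on
// whitespace, treat `key:value` tokens as filters, everything else as text.
type Parsed = { filters: Record<string, string[]>; text: string[] };

function parseQueryMini(instruction: string): Parsed {
  const filters: Record<string, string[]> = {};
  const text: string[] = [];
  for (const token of instruction.split(/\s+/)) {
    const m = /^(database|title):(.+)$/.exec(token);
    if (m) (filters[m[1]] ??= []).push(m[2]);
    else text.push(token);
  }
  return { filters, text };
}

// Unquoted concatenation mangles the multi-word value:
const parsed = parseQueryMini('investor info database:Product Roadmap');
// parsed.filters.database → ['Product']  (truncated)
// parsed.text             → ['investor', 'info', 'Roadmap']  (stray tail)
```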
…seQuery roundtrip

Codex P1 review on PR #62: enumerateNotion built `key:value` tokens by direct concatenation, but the shared parseQuery splits the resulting instruction on /\s+/. Any whitespace-bearing value (e.g. `database:Product Roadmap`) parsed as filter `database:Product` (truncated) plus stray text `Roadmap`, silently mangling the filter so it neither matched the intended Notion content nor reached the apiFallback in the expected shape.

Fix: in buildEnumerationInstruction, append whitespace-bearing values as bare text tokens instead of as `key:value`. The engine's inferEnumerationFilters can then re-extract them from text via its existing pattern matchers, or the apiFallback can use the natural-language text as additional context. Single-word values continue to use `key:value`.

Single regression test asserting enumerateNotion completes cleanly with multi-word `database` and `title` filters and dispatches a VFS call, proving the mangled-token codepath is gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
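The branching this commit describes might look like the following sketch (`buildInstruction` is a hypothetical miniature of buildEnumerationInstruction, restricted to two keys):

```typescript
const NOTION_FILTER_KEYS = ['database', 'title'] as const;
type FilterKey = (typeof NOTION_FILTER_KEYS)[number];

function buildInstruction(
  text: string,
  filters: Partial<Record<FilterKey, string[]>>,
): string {
  const parts: string[] = [text];
  for (const key of NOTION_FILTER_KEYS) {
    for (const value of filters[key] ?? []) {
      // A whitespace-bearing value would be mangled by parseQuery's /\s+/
      // split, so append it as bare text instead of as `key:value`.
      if (/\s/.test(value)) parts.push(value);
      else parts.push(`${key}:${value}`);
    }
  }
  return parts.join(' ');
}

// buildInstruction('investor info', { database: ['Product Roadmap'] })
//   → 'investor info Product Roadmap'
// buildInstruction('investor info', { title: ['Launch'] })
//   → 'investor info title:Launch'
```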
```ts
if (/\s/.test(value)) {
  parts.push(value);
```
🔴 Multi-word filter values silently dropped by buildEnumerationInstruction
When enumerateNotion receives filter values containing whitespace (e.g., `{ database: ['Product Roadmap'], title: ['Launch Plan'] }`), buildEnumerationInstruction pushes them as bare text without a key prefix (line 139: `parts.push(value)`). The instruction becomes "investor info Product Roadmap Launch Plan". Downstream, inferEnumerationFilters can only recover filter values via either `key:value` tokens (handled by inferExplicitFilters) or natural-language cue words like `database "..."` or `title "..."`. Since the bare text has no cue-word prefix, none of the cue patterns at librarian.ts:161-178 fire, and the filters are silently lost: the query degrades to an unfiltered text search.
The comment on lines 134–137 claims inferEnumerationFilters can "pattern-match them downstream," but the cue-based regex patterns require the key name to precede the value (e.g., `/\bdatabase\s+.../`), which bare text lacks.
Suggested approach
Instead of pushing bare text, emit the filter value with its key as a cue-word prefix and quote the value, e.g., database "Product Roadmap". The existing cue patterns in inferEnumerationFilters already support quoted values: /\bdatabase\s+(?:"([^"]+)"|...)/i would capture Product Roadmap correctly from database "Product Roadmap".
```diff
-if (/\s/.test(value)) {
-  parts.push(value);
+if (/\s/.test(value)) {
+  parts.push(`${key} "${value}"`);
```
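A sketch of why the quoted cue-word form is recoverable downstream. `inferDatabaseFilter` and `DATABASE_CUE` are hypothetical, simplified stand-ins for the cue patterns in inferEnumerationFilters:

```typescript
// Simplified cue pattern: `database` followed by either a quoted phrase
// (captures the whole multi-word value) or a single bare token.
const DATABASE_CUE = /\bdatabase\s+(?:"([^"]+)"|(\S+))/i;

function inferDatabaseFilter(text: string): string | undefined {
  const m = DATABASE_CUE.exec(text);
  return m?.[1] ?? m?.[2]; // quoted group wins; bare token is the fallback
}

// inferDatabaseFilter('investor info database "Product Roadmap"')
//   → 'Product Roadmap'   (quoted group preserves the space)
// inferDatabaseFilter('investor info database Roadmap')
//   → 'Roadmap'           (bare-token fallback)
```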
```ts
const match = /\/notion\/(pages|databases|blocks|comments)\/[^/]+(?:\.json)?$/i.exec(path);
if (match?.[1] === 'pages') return 'page';
if (match?.[1] === 'databases') return 'database';
if (match?.[1] === 'blocks') return 'block';
if (match?.[1] === 'comments') return 'comment';
return undefined;
```
🟡 Case-insensitive regex with case-sensitive comparison in collectionItemTypeFromPath
The regex at librarian.ts:337 uses the `i` flag (`/\/notion\/(pages|databases|blocks|comments)\/[^/]+(?:\.json)?$/i`), making it match paths with non-lowercase collection segments (e.g., `/notion/Pages/foo.json`). However, the captured group `match?.[1]` preserves the original case from the input, and the subsequent comparisons use strict lowercase equality (`match?.[1] === 'pages'`). If a path like `/notion/Pages/foo.json` is encountered, the regex matches but the comparison fails, returning `undefined` instead of `'page'`. The Linear (librarian.ts:294) and GitHub (librarian.ts:197) adapters correctly omit the `i` flag, making the Notion adapter inconsistent.
```diff
 const match = /\/notion\/(pages|databases|blocks|comments)\/[^/]+(?:\.json)?$/i.exec(path);
-if (match?.[1] === 'pages') return 'page';
-if (match?.[1] === 'databases') return 'database';
-if (match?.[1] === 'blocks') return 'block';
-if (match?.[1] === 'comments') return 'comment';
-return undefined;
+const collection = match?.[1]?.toLowerCase();
+if (collection === 'pages') return 'page';
+if (collection === 'databases') return 'database';
+if (collection === 'blocks') return 'block';
+if (collection === 'comments') return 'comment';
+return undefined;
```
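A runnable illustration of the mismatch and the lowercasing fix. The function names are hypothetical stand-ins for collectionItemTypeFromPath in librarian.ts; only the flag/comparison interaction is the point:

```typescript
const NOTION_ITEM_RE = /\/notion\/(pages|databases|blocks|comments)\/[^/]+(?:\.json)?$/i;

// Buggy shape: the /i regex matches '/notion/Pages/foo.json', but the
// captured group keeps its original case, so the strict comparison fails.
function itemTypeBuggy(path: string): string | undefined {
  const match = NOTION_ITEM_RE.exec(path);
  if (match?.[1] === 'pages') return 'page';
  return undefined;
}

// Fixed shape: normalize the captured segment before comparing.
function itemTypeFixed(path: string): string | undefined {
  const collection = NOTION_ITEM_RE.exec(path)?.[1]?.toLowerCase();
  if (collection === 'pages') return 'page';
  if (collection === 'databases') return 'database';
  if (collection === 'blocks') return 'block';
  if (collection === 'comments') return 'comment';
  return undefined;
}

// itemTypeBuggy('/notion/Pages/foo.json') → undefined (regex matched, comparison didn't)
// itemTypeFixed('/notion/Pages/foo.json') → 'page'
```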
Summary

Adds NotionLibrarian to @agent-assistant/specialists, mirroring the existing LinearLibrarian shape.

- packages/specialists/src/notion/types.ts
- packages/specialists/src/notion/librarian.ts with NotionLibrarianAdapter + createNotionLibrarian factory
- packages/specialists/src/notion/index.ts
- packages/specialists/src/index.ts

Why
Production: Slack DM asking sage about Notion content returns "I could not complete that request right now." The agent card lists only github/linear/pr_investigation skills. Cloud has no Notion specialist to wire because the upstream librarian/adapter doesn't exist.
Follow-up (separate PR)

After this lands and @agent-assistant/specialists republishes, the cloud-side workflow fix-notion-specialist-apifallback.ts builds notion-api-client, notion-api-fallback, the /api/v1/notion/query proxy, and the notion-specialist-agentic wiring.

Validation
🤖 Generated with Claude Code