feat(graph): enrich context graph with symbol nodes and semantic edges by dubscode · Pull Request #9 · wiseiodev/dubsbot

dubscode · 2026-03-04T05:27:21Z

Summary

This PR adds symbol-level context graph enrichment and semantic relationship edges, then exposes those relationships to retrieval surfaces without breaking existing file-level behavior.

What changed

Added graph enrichment model/types for symbol nodes and normalized edge enums:
- defines
- references
- imports
- calls
Added deterministic canonical symbol ID generation:
- <repo>::<path>::<kind>::<name>::<range-hash>
Added feature/config gate for rollout:
- DUBSBOT_ENABLE_SYMBOL_ENRICHMENT=1
Implemented TS/JS symbol extraction with best-effort diagnostics and partial-failure resilience.
Implemented graph persistence and cleanup for enriched file/symbol nodes and semantic edges.
Integrated enrichment into indexing flow via shared index helpers.
Extended hybrid retrieval to traverse graph hints and include graph_node citations while preserving file/chunk retrieval contract.
Added migration for graph enrichment constraints and indexes:
- src/db/migrations/0003_context_graph_enrichment.sql
Archived OpenSpec change deeper-context-graph-enrichment and synced the capability spec:
- openspec/specs/context-graph-enrichment/spec.md
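
For illustration only, the canonical ID format listed above (`<repo>::<path>::<kind>::<name>::<range-hash>`) could be generated along these lines. The PR description does not say how the range hash is computed, so the FNV-1a-style hash, the `SymbolIdInput` shape, and the function names below are all assumptions rather than the code in src/context/graph/types.ts.

```typescript
// Hypothetical input shape; the PR's real types live in src/context/graph/types.ts.
interface SymbolIdInput {
  repo: string;
  path: string;
  kind: string;
  name: string;
  startOffset: number;
  endOffset: number;
}

// FNV-1a-style 32-bit hash over UTF-16 code units (assumed hash choice).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  // Render as an unsigned, zero-padded 8-character hex string.
  return ('0000000' + (hash >>> 0).toString(16)).slice(-8);
}

// Join the five segments with '::'. Identical input always yields the same ID.
function canonicalSymbolId(input: SymbolIdInput): string {
  const rangeHash = fnv1a(`${input.startOffset}:${input.endOffset}`);
  return [input.repo, input.path, input.kind, input.name, rangeHash].join('::');
}
```

Because the hash depends only on the symbol's range, re-indexing an unchanged file would reproduce the same IDs, which is the stability property the tests in this PR are described as checking.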

Verification

pnpm checks
- pnpm test
- pnpm typecheck
- pnpm lint
- pnpm build
Added dedicated test coverage in tests/graph-enrichment.test.ts for:
- canonical ID stability and extraction determinism
- required edge presence + directionality
- retrieval compatibility and graph-node citation exposure
- enrichment runtime/volume acceptance thresholds

Stack note

This PR is stacked on top of #8 (feat/incremental-indexing-pipeline).

🥞 DubStack

- #8 feat(indexer): implement true incremental indexing pipeline
- #9 feat(graph): enrich context graph with symbol nodes and semantic edges 👈
- #10 feat(mcp): add policy-gated MCP bridge with audit pipeline

Add symbol extraction and canonical symbol IDs for TS/JS files, persist symbol/file graph nodes with normalized edge types (defines, references, imports, calls), and gate rollout via symbol enrichment config. Integrate graph enrichment into indexing, extend retrieval graph-hint traversal with graph_node citations, and add migration + validation tests for stability, directionality, compatibility, and volume thresholds. Archive OpenSpec change deeper-context-graph-enrichment and sync the new context-graph-enrichment spec into main specs.

github-actions · 2026-03-04T05:27:54Z

PR Checks Summary

Status: passed
Command: pnpm checks
Workflow run: https://github.com/wiseiodev/dubsbot/actions/runs/22656436949
Commit: 1ed3734

Copilot

Pull request overview

Adds symbol-level enrichment to the existing context graph so indexing can persist symbol nodes + semantic edges and retrieval can leverage graph hints while keeping file-level retrieval behavior intact.

Changes:

Add graph symbol extraction + persistence during indexing (feature-flagged).
Extend hybrid retrieval to traverse graph hints, boost relevant file paths, and return graph-node citations.
Add DB migration constraints/indexes plus a new Vitest suite and accompanying OpenSpec documentation/archive updates.

Reviewed changes

Copilot reviewed 13 out of 17 changed files in this pull request and generated 8 comments.

Show a summary per file

| File | Description |
| --- | --- |
| tests/graph-enrichment.test.ts | New tests covering extraction stability, edge presence/directionality checks, retrieval behavior, and perf/volume thresholds. |
| src/db/migrations/0003_context_graph_enrichment.sql | Adds check constraints for node/edge enums and indexes for graph tables. |
| src/context/retrieval/hybrid.ts | Adds graph-hint traversal to retrieval, path boosting, and graph-node citations. |
| src/context/indexer/incremental.ts | Adds `symbolEnrichmentEnabled` option to incremental indexing API. |
| src/context/indexer/full-index.ts | Adds `symbolEnrichmentEnabled` option to full indexing API. |
| src/context/indexer/file-index.ts | Runs extraction/persistence + cleanup for graph enrichment during file upsert/delete. |
| src/context/graph/types.ts | Defines node/edge/symbol types and canonical symbol ID format. |
| src/context/graph/persist.ts | Implements persistence + deletion of graph enrichment per file (nodes + edges). |
| src/context/graph/extract.ts | Implements initial TS/JS regex-based symbol extraction and semantic edge generation. |
| src/context/graph/config.ts | Adds env-var gate for enabling symbol enrichment. |
| openspec/specs/context-graph-enrichment/spec.md | New canonical spec describing requirements for graph enrichment. |
| openspec/changes/deeper-context-graph-enrichment/tasks.md | Removes old change tasks doc (archived). |
| openspec/changes/archive/2026-03-04-deeper-context-graph-enrichment/tasks.md | Archived tasks checklist marked completed. |
| openspec/changes/archive/2026-03-04-deeper-context-graph-enrichment/specs/context-graph-enrichment/spec.md | Archived requirements excerpt. |
| openspec/changes/archive/2026-03-04-deeper-context-graph-enrichment/proposal.md | Archived proposal for enrichment work. |
| openspec/changes/archive/2026-03-04-deeper-context-graph-enrichment/design.md | Archived design doc (goals/decisions/risks). |
| openspec/changes/archive/2026-03-04-deeper-context-graph-enrichment/.openspec.yaml | Adds archived change metadata. |
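
As a rough sketch of the env-var gate attributed to src/context/graph/config.ts, the check might look like the following. Only the variable name `DUBSBOT_ENABLE_SYMBOL_ENRICHMENT` and the value `1` come from the PR description; the helper names and the rule that an explicit `symbolEnrichmentEnabled` option overrides the environment are assumptions.

```typescript
// Environment passed in explicitly so the helper is easy to test.
type Env = Record<string, string | undefined>;

// Hypothetical helper: the PR only states that DUBSBOT_ENABLE_SYMBOL_ENRICHMENT=1
// enables the feature, so any other value is treated as disabled here.
function isSymbolEnrichmentEnabled(env: Env): boolean {
  return env.DUBSBOT_ENABLE_SYMBOL_ENRICHMENT === '1';
}

// Assumed precedence: an explicit option wins, otherwise fall back to the env gate.
function resolveSymbolEnrichment(option: boolean | undefined, env: Env): boolean {
  return option ?? isSymbolEnrichmentEnabled(env);
}
```

In a Node process the caller would pass `process.env` as the `env` argument.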


Copilot · 2026-03-04T05:33:00Z

src/context/indexer/incremental.ts

  trigger?: IncrementalTrigger;
  embedProvider?: ProviderAdapter;
  embeddingModel?: string;
+  symbolEnrichmentEnabled?: boolean;


runIncrementalIndex now accepts symbolEnrichmentEnabled, but when it falls back to runFullIndex the flag is not forwarded. This makes behavior depend on whether incremental fallback occurs, which is surprising and can disable enrichment unintentionally. Pass symbolEnrichmentEnabled through to the runFullIndex call in the fallback path.
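
One way to make the forwarding hard to forget is to derive the full-index options from the incremental options in a single place. This is a minimal sketch under assumed option shapes: `IncrementalIndexOptions`, `FullIndexOptions`, and `toFullIndexOptions` are hypothetical names, and the real signatures in src/context/indexer/ may differ.

```typescript
// Hypothetical option shapes; only symbolEnrichmentEnabled and embeddingModel
// appear in the diff above, the rest is assumed for illustration.
interface IncrementalIndexOptions {
  repoRoot: string;
  embeddingModel?: string;
  symbolEnrichmentEnabled?: boolean;
}

type FullIndexOptions = Pick<
  IncrementalIndexOptions,
  'repoRoot' | 'embeddingModel' | 'symbolEnrichmentEnabled'
>;

// Build the options for the full-index fallback so the enrichment flag is
// carried over explicitly instead of being silently dropped.
function toFullIndexOptions(options: IncrementalIndexOptions): FullIndexOptions {
  return {
    repoRoot: options.repoRoot,
    embeddingModel: options.embeddingModel,
    symbolEnrichmentEnabled: options.symbolEnrichmentEnabled,
  };
}
```

The fallback path would then call something like `runFullIndex(toFullIndexOptions(options))`, so enrichment behaves the same whether or not the fallback is taken.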

Copilot · 2026-03-04T05:33:01Z

src/context/indexer/file-index.ts

+        await persistGraphEnrichmentForFile({
+          db: input.db,
+          repoRoot: input.repoRoot,
+          path: relativePath,
+          extraction,
+        });
+        for (const diagnostic of extraction.diagnostics) {
+          console.info(`[indexer:graph] ${diagnostic}`);


When symbol enrichment is enabled, extraction/persistence is attempted for every indexed file and any diagnostics are logged. For non-JS/TS files this will always emit unsupported-language:* and still write a file node, which can create a lot of DB churn and noisy logs on mixed-language repos. Consider short-circuiting (e.g., check canExtractSymbols(relativePath) or extraction.symbols.length > 0) and/or downgrade/aggregate diagnostics logging.

Suggested change

-        await persistGraphEnrichmentForFile({
-          db: input.db,
-          repoRoot: input.repoRoot,
-          path: relativePath,
-          extraction,
-        });
-        for (const diagnostic of extraction.diagnostics) {
-          console.info(`[indexer:graph] ${diagnostic}`);
+        // Short-circuit for files with no extracted symbols to avoid unnecessary
+        // persistence and noisy diagnostics (e.g. unsupported-language:*).
+        if (!extraction.symbols || extraction.symbols.length === 0) {
+          return {
+            status: 'indexed',
+            fileStatus,
+            chunksInserted: chunks.length,
+            chunksDeleted,
+          };
+        }
+        await persistGraphEnrichmentForFile({
+          db: input.db,
+          repoRoot: input.repoRoot,
+          path: relativePath,
+          extraction,
+        });
+        const diagnostics = extraction.diagnostics ?? [];
+        for (const diagnostic of diagnostics) {
+          const message = String(diagnostic);
+          if (message.startsWith('unsupported-language:')) {
+            continue;
+          }
+          console.info(`[indexer:graph] ${message}`);
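
The other option the comment raises, checking `canExtractSymbols(relativePath)` before running extraction at all, could be sketched as below. The body and the extension list are assumptions; the PR only says extraction targets TS/JS files.

```typescript
// Assumed extension list for "TS/JS"; the PR does not enumerate extensions.
const SYMBOL_EXTENSIONS = ['.ts', '.tsx', '.js', '.jsx', '.mjs', '.cjs'];

// Hypothetical body for canExtractSymbols: true when the path has one of the
// assumed TS/JS extensions (case-insensitive).
function canExtractSymbols(relativePath: string): boolean {
  const lower = relativePath.toLowerCase();
  return SYMBOL_EXTENSIONS.some((ext) => lower.endsWith(ext));
}
```

Guarding on the path before extraction would skip unsupported files without running the extractor or emitting `unsupported-language:*` diagnostics for them.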

Copilot · 2026-03-04T05:33:01Z

openspec/specs/context-graph-enrichment/spec.md

+# context-graph-enrichment Specification
+
+## Purpose
+TBD - created by archiving change deeper-context-graph-enrichment. Update Purpose after archive.


The spec Purpose section is left as "TBD". Since this is now the canonical spec under openspec/specs/, it should state the actual purpose of context graph enrichment (at least a 1–2 sentence summary) rather than referencing an archived change.

Suggested change

-TBD - created by archiving change deeper-context-graph-enrichment. Update Purpose after archive.
+This specification defines how the context graph is enriched with symbol-level nodes and normalized semantic relationship edges derived from source code. The goal is to enable precise retrieval, impact analysis, and other graph-based queries while preserving compatibility with existing file-level graph consumers.

Copilot · 2026-03-04T05:33:01Z

tests/graph-enrichment.test.ts

+      expect(edge.sourceKey).toContain('::file');
+      expect(edge.targetKey).not.toBe(edge.sourceKey);


The directionality assertions are very broad: the test currently requires all edges to have sourceKey containing ::file. That doesn't actually validate directionality per edge type, and it also conflicts with the spec language that references/calls edges may originate from a symbol context. Consider tightening this to assert directionality per edge type (e.g., defines should be file -> symbol) rather than hard-coding file as the source for every edge.

Suggested change

-      expect(edge.sourceKey).toContain('::file');
-      expect(edge.targetKey).not.toBe(edge.sourceKey);
+      switch (edge.type) {
+        case 'defines':
+        case 'imports':
+          // File-level edges should originate from a file context.
+          expect(edge.sourceKey).toContain('::file');
+          expect(edge.targetKey).not.toBe(edge.sourceKey);
+          break;
+        case 'references':
+        case 'calls':
+          // Reference/call edges may originate from a symbol context.
+          expect(edge.sourceKey).toContain('::symbol');
+          expect(edge.targetKey).not.toBe(edge.sourceKey);
+          break;
+        default:
+          // At minimum, no edge should be self-referential.
+          expect(edge.targetKey).not.toBe(edge.sourceKey);
+          break;
+      }

Copilot · 2026-03-04T05:33:02Z

src/context/retrieval/hybrid.ts

+     WHERE n.payload->>'repoRoot' = $1
+       AND (
+         n.node_key = ANY($2::text[])
+         OR n.payload->>'name' = ANY($2::text[])
+         OR n.payload->>'path' = ANY($2::text[])
+       )


traverseGraphHints filters context_nodes by n.payload->>'repoRoot' and then by payload->>'name' / payload->>'path'. There are indexes on node_type and node_key, but none on these JSONB fields, so this query will devolve into a full scan as the graph grows (especially with symbol nodes). Consider adding a functional index (e.g., on (payload->>'repoRoot') and/or (payload->>'name'), (payload->>'path')) or promoting these fields to dedicated columns for efficient lookups.

Copilot · 2026-03-04T05:33:02Z

src/context/graph/persist.ts

+      'SELECT id FROM context_nodes WHERE node_key LIKE $1',
+      [`${prefix}%`]


persistGraphEnrichmentForFile scopes nodes using node_key LIKE $1 with a prefix derived from repoRoot/path. In SQL LIKE, % and _ in the prefix are wildcards, so repos/paths containing those characters can unintentionally match and delete other nodes/edges. Use a non-wildcard prefix match (e.g., LEFT(node_key, LENGTH($1)) = $1 with the raw prefix, or escape %/_ and add an explicit ESCAPE clause).

Suggested change

-      'SELECT id FROM context_nodes WHERE node_key LIKE $1',
-      [`${prefix}%`]
+      'SELECT id FROM context_nodes WHERE LEFT(node_key, LENGTH($1)) = $1',
+      [prefix]
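
The alternative mentioned in the comment, escaping `%` and `_` and adding an explicit ESCAPE clause, might look like this minimal sketch. `escapeLikePrefix` and `buildPrefixQuery` are hypothetical helpers, not functions from src/context/graph/persist.ts, and backslash is assumed as the escape character.

```typescript
// Escape the LIKE metacharacters (backslash, %, _) so the prefix matches literally.
function escapeLikePrefix(prefix: string): string {
  return prefix.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// Hypothetical query builder mirroring the statement quoted in the review.
// The trailing % is the only wildcard left after escaping.
function buildPrefixQuery(prefix: string): { text: string; values: string[] } {
  return {
    // The SQL text ends with ESCAPE '\' (a single backslash).
    text: "SELECT id FROM context_nodes WHERE node_key LIKE $1 ESCAPE '\\'",
    values: [`${escapeLikePrefix(prefix)}%`],
  };
}
```

Compared with the `LEFT(node_key, LENGTH($1)) = $1` form, this keeps the predicate as a `LIKE` prefix pattern. Whether either form can use an index on `node_key` depends on the database's collation and index operator class.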

Copilot · 2026-03-04T05:33:02Z

src/context/graph/persist.ts

+    'SELECT id FROM context_nodes WHERE node_key LIKE $1',
+    [`${prefix}%`]


deleteGraphEnrichmentForFile uses node_key LIKE $1 with a prefix built from repoRoot/path. Because _/% are wildcards in LIKE, this can delete graph enrichment for other files if the repo path contains those characters. Prefer a deterministic prefix match (or properly escape the prefix + specify an ESCAPE clause).

Suggested change

-    'SELECT id FROM context_nodes WHERE node_key LIKE $1',
-    [`${prefix}%`]
+    'SELECT id FROM context_nodes WHERE LEFT(node_key, LENGTH($1)) = $1',
+    [prefix]

Copilot · 2026-03-04T05:33:03Z

src/db/migrations/0003_context_graph_enrichment.sql

+CHECK (edge_type IN ('defines', 'references', 'imports', 'calls'));
+
+CREATE INDEX IF NOT EXISTS idx_context_nodes_node_type ON context_nodes(node_type);
+CREATE INDEX IF NOT EXISTS idx_context_nodes_node_key ON context_nodes(node_key);


context_nodes.node_key is already UNIQUE (created in 0001_init.sql), which implicitly creates an index. Creating an additional idx_context_nodes_node_key index is redundant and adds extra write/storage overhead. Consider removing this index (or replacing the UNIQUE constraint/index if you intended a different index shape).

Suggested change

-CREATE INDEX IF NOT EXISTS idx_context_nodes_node_key ON context_nodes(node_key);

Copilot AI review requested due to automatic review settings March 4, 2026 05:27

dubscode mentioned this pull request Mar 4, 2026

feat(indexer): implement true incremental indexing pipeline #8

Open

Copilot started reviewing on behalf of dubscode March 4, 2026 05:27

Copilot AI reviewed Mar 4, 2026

dubscode mentioned this pull request Mar 5, 2026

feat(mcp): add policy-gated MCP bridge with audit pipeline #10

Open

dubscode wants to merge 1 commit into feat/incremental-indexing-pipeline from feat/context-graph-enrichment

