Fix production retrieval correctness#146
Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
📝 WalkthroughWalkthroughThis PR introduces source-domain classification and intent-aware retrieval filtering. It adds a shared utility for classifying file paths into domains and detecting hard-ignored paths, extends contracts with domain metadata and diagnostic kinds, refactors discovery to use shared helpers, implements prompt exclusion detection, enriches retrieval with symbol-reference parsing and domain-aware filtering, and expands diagnostics to flag pollution and incorrect pack shapes. ChangesSource-domain filtering and discovery
Test coverage
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly Related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (6)
src/runtime/context-pack-diagnostics.ts (2)
340-355: 💤 Low value
missingRuntimePipelinemay produce false positives.The function checks for both
pipelineNodeCount === 0andstructuralRelations === 0. However, it already callspipelinePrompt(pack)at line 354, which means it's checking a pipeline prompt. If the pack has pipeline nodes but no structural relations (or vice versa), it will still trigger. Consider whether this OR condition is intentional or if it should be AND.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/runtime/context-pack-diagnostics.ts` around lines 340 - 355, The missingRuntimePipeline function currently uses an OR between pipelineNodeCount === 0 and structuralRelations === 0 which causes false positives when only one of those is missing; update the return to require both to be missing by changing the condition to pipelinePrompt(pack) && (pipelineNodeCount === 0 && structuralRelations === 0) so that missingRuntimePipeline only reports when there are no pipeline nodes AND no structural relations; ensure you modify the expression that references pipelineNodeCount and structuralRelations in missingRuntimePipeline accordingly.
371-382: 💤 Low valueFixed denominator may allow negative scores before clamping.
The fixed denominator of 10 means that if many new warnings trigger (total weight > 10), the raw score would go negative before the
Math.max(0, ...)clamp. This is handled correctly, but consider adding a comment explaining the design choice and the expected behavior when total triggered weight exceeds the denominator.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/runtime/context-pack-diagnostics.ts` around lines 371 - 382, The computeQualityScore function uses a fixed denominator totalWeight = 10 which allows the raw score to go negative when triggeredWeight (accumulated from RULE_WEIGHTS via the warnings loop) exceeds that denominator before the clamp; add a concise comment above totalWeight (or at the top of computeQualityScore) explaining that the denominator is intentionally fixed to preserve historical comparability, that excess triggeredWeight will produce negative raw scores which are then clamped to 0 by Math.max, and document the expected semantics (new warnings reduce the numerator but don’t change the historical baseline), referencing totalWeight, triggeredWeight, and RULE_WEIGHTS so future readers understand the design choice.src/runtime/retrieval-gate.ts (1)
75-76: 💤 Low valueRegex may miss some valid symbol patterns.
The
SYMBOL_EXPLICIT_REpattern requires the class name to start with uppercase ([A-Z]), which won't match lowercase class names or module-style references likeutils.parseDate. Consider whether this is intentional or if the pattern should be more permissive.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/runtime/retrieval-gate.ts` around lines 75 - 76, The SYMBOL_EXPLICIT_RE currently enforces an uppercase first character for the left-hand identifier which will miss valid symbols like lowercase class names or module.function references; update the regex used in SYMBOL_EXPLICIT_RE to allow an initial lowercase letter and to accept dot-separated module paths (e.g., allow [A-Za-z_$][A-Za-z0-9_$]* at the front and permit repeated (?:\.[A-Za-z_$][A-Za-z0-9_$]*)? segments), while preserving existing support for separators (., #, ::) and trailing parentheses; locate and update the SYMBOL_EXPLICIT_RE declaration to be more permissive (no change needed to STACK_TRACE_RE).src/runtime/context-pack.ts (1)
863-866: 💤 Low valueClarify the intent of domain boost for non-production sources.
The +0.25 boost for domains that are neither
productionnorunknownseems intentional—it rewards explicitly-tagged domains (test, benchmark, config, etc.) when they pass the penalty gate (i.e., when the task contract requests them). However, this logic is subtle and could benefit from a brief inline comment explaining the rationale.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/runtime/context-pack.ts` around lines 863 - 866, Add a brief inline comment above the block that adjusts score based on view.source_domain (the if checking view.source_domain !== 'production' && view.source_domain !== 'unknown') explaining the intent: that non-production explicit domains (e.g., test, benchmark, config) receive a +0.25 boost to reward explicitly-tagged non-prod sources when the task contract permits them (after any penalty gate), so future readers understand this subtle scoring choice; reference the variables score, reasons, and the pushUnique call in the comment for clarity.src/runtime/retrieve.ts (1)
741-781: 💤 Low valueSymbol matching score thresholds could benefit from documentation.
The
symbolReferenceMatchScorefunction returns scores in the range 0-4 with specific thresholds (4, 3.5, 3, 2.5) for different match qualities. Consider adding a brief comment or JSDoc explaining the score semantics, as these values are used as gates elsewhere (e.g.,symbolMatch >= 3at line 1673,symbolMatch >= 3at line 1982).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/runtime/retrieve.ts` around lines 741 - 781, The function symbolReferenceMatchScore uses numeric thresholds (0, 2.5, 3, 3.5, 4) to represent match quality but has no documentation; add a concise JSDoc above symbolReferenceMatchScore that documents the meaning of each score (e.g., 4 = exact match, 3.5 = method match with class in source, 3 = method match on reference with class context, 2.5 = bare name match, 0 = no match) and note where callers gate on values (e.g., consumers that compare symbolMatch >= 3 expect strong matches). Ensure the comment mentions the score range (0–4) and intended semantic thresholds so future readers and maintainers understand why callers check for >=3.src/runtime/retrieve/slicing.ts (1)
203-209: 💤 Low valueAnchor reason cascade has a potential logic gap.
The reason assignment at lines 203-209 uses a cascade:
exactLabelMatch→literalPathMatch→sourcePathMatch→ default. However,sourcePathMatchcould be true when both exact label AND source path match. The current order prioritizesexactLabelMatchwhich seems correct, but the intermediateliteralPathMatchcheck (line 205) could mask cases wheresourcePathMatchis more descriptive. Consider whether this is the intended precedence.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/runtime/retrieve/slicing.ts` around lines 203 - 209, The cascade assigning reason can mis-prioritize source path matches because node.literalPathMatch is checked before node.sourcePathMatch; decide the intended precedence and update the ternary chain accordingly (e.g., exactLabelMatch → sourcePathMatch → literalPathMatch → default if source path should be more descriptive), or explicitly document the chosen precedence with a comment next to the reason assignment; update the expression that computes reason (referencing the variable reason and properties node.exactLabelMatch, node.literalPathMatch, node.sourcePathMatch) and add/adjust unit tests to cover combinations where multiple flags are true to prevent regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/runtime/retrieve.ts`:
- Line 1563: The code sets rootPath using graph.graph.root_path with
process.cwd() as a silent fallback (const rootPath = typeof
graph.graph.root_path === 'string' ? graph.graph.root_path : process.cwd()),
which can cause surprising behavior in non-standard working directories; update
the logic to detect when graph.graph.root_path is missing and emit a clear
warning log (e.g., via the existing logger used in this module) that indicates
the fallback to process.cwd() and recommends configuring root_path, and apply
the same change to the other occurrence noted (the line around 2118) so both
places log the fallback rather than silently using process.cwd().
In `@src/shared/source-discovery.ts`:
- Around line 157-174: The isDiscoveryPathIgnored function currently lets a
negated pattern flip a hard-ignore; change the loop logic so hard ignores from
isHardIgnoredPath(relativePath) cannot be overridden by negated patterns:
compute ignored = isHardIgnoredPath(relativePath) as now, but inside the
for-loop (over patterns) if matchesWorkspacePattern(relativePath, path, pattern)
and the matched pattern is a negation (pattern.startsWith('!')), skip applying
the negation when ignored is true due to isHardIgnoredPath; only allow
non-negated patterns to set ignored = true and allow negated patterns to clear
ignored when and only when the current ignored state was set by a non-hard
ignore. Ensure references to isDiscoveryPathIgnored, isHardIgnoredPath,
matchesWorkspacePattern and the pattern/negated logic are updated accordingly.
---
Nitpick comments:
In `@src/runtime/context-pack-diagnostics.ts`:
- Around line 340-355: The missingRuntimePipeline function currently uses an OR
between pipelineNodeCount === 0 and structuralRelations === 0 which causes false
positives when only one of those is missing; update the return to require both
to be missing by changing the condition to pipelinePrompt(pack) &&
(pipelineNodeCount === 0 && structuralRelations === 0) so that
missingRuntimePipeline only reports when there are no pipeline nodes AND no
structural relations; ensure you modify the expression that references
pipelineNodeCount and structuralRelations in missingRuntimePipeline accordingly.
- Around line 371-382: The computeQualityScore function uses a fixed denominator
totalWeight = 10 which allows the raw score to go negative when triggeredWeight
(accumulated from RULE_WEIGHTS via the warnings loop) exceeds that denominator
before the clamp; add a concise comment above totalWeight (or at the top of
computeQualityScore) explaining that the denominator is intentionally fixed to
preserve historical comparability, that excess triggeredWeight will produce
negative raw scores which are then clamped to 0 by Math.max, and document the
expected semantics (new warnings reduce the numerator but don’t change the
historical baseline), referencing totalWeight, triggeredWeight, and RULE_WEIGHTS
so future readers understand the design choice.
In `@src/runtime/context-pack.ts`:
- Around line 863-866: Add a brief inline comment above the block that adjusts
score based on view.source_domain (the if checking view.source_domain !==
'production' && view.source_domain !== 'unknown') explaining the intent: that
non-production explicit domains (e.g., test, benchmark, config) receive a +0.25
boost to reward explicitly-tagged non-prod sources when the task contract
permits them (after any penalty gate), so future readers understand this subtle
scoring choice; reference the variables score, reasons, and the pushUnique call
in the comment for clarity.
In `@src/runtime/retrieval-gate.ts`:
- Around line 75-76: The SYMBOL_EXPLICIT_RE currently enforces an uppercase
first character for the left-hand identifier which will miss valid symbols like
lowercase class names or module.function references; update the regex used in
SYMBOL_EXPLICIT_RE to allow an initial lowercase letter and to accept
dot-separated module paths (e.g., allow [A-Za-z_$][A-Za-z0-9_$]* at the front
and permit repeated (?:\.[A-Za-z_$][A-Za-z0-9_$]*)? segments), while preserving
existing support for separators (., #, ::) and trailing parentheses; locate and
update the SYMBOL_EXPLICIT_RE declaration to be more permissive (no change
needed to STACK_TRACE_RE).
In `@src/runtime/retrieve.ts`:
- Around line 741-781: The function symbolReferenceMatchScore uses numeric
thresholds (0, 2.5, 3, 3.5, 4) to represent match quality but has no
documentation; add a concise JSDoc above symbolReferenceMatchScore that
documents the meaning of each score (e.g., 4 = exact match, 3.5 = method match
with class in source, 3 = method match on reference with class context, 2.5 =
bare name match, 0 = no match) and note where callers gate on values (e.g.,
consumers that compare symbolMatch >= 3 expect strong matches). Ensure the
comment mentions the score range (0–4) and intended semantic thresholds so
future readers and maintainers understand why callers check for >=3.
In `@src/runtime/retrieve/slicing.ts`:
- Around line 203-209: The cascade assigning reason can mis-prioritize source
path matches because node.literalPathMatch is checked before
node.sourcePathMatch; decide the intended precedence and update the ternary
chain accordingly (e.g., exactLabelMatch → sourcePathMatch → literalPathMatch →
default if source path should be more descriptive), or explicitly document the
chosen precedence with a comment next to the reason assignment; update the
expression that computes reason (referencing the variable reason and properties
node.exactLabelMatch, node.literalPathMatch, node.sourcePathMatch) and
add/adjust unit tests to cover combinations where multiple flags are true to
prevent regressions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 6683a487-0d60-4861-9a0f-ffbf42ec8331
📒 Files selected for processing (20)
CHANGELOG.mdREADME.mdsrc/contracts/context-pack-diagnostics.tssrc/contracts/context-pack.tssrc/contracts/retrieval-gate.tssrc/pipeline/detect.tssrc/pipeline/spi/build.tssrc/runtime/context-pack-diagnostics.tssrc/runtime/context-pack.tssrc/runtime/retrieval-gate.tssrc/runtime/retrieve.tssrc/runtime/retrieve/slicing.tssrc/shared/source-discovery.tstests/unit/benchmark-quality.test.tstests/unit/context-pack-diagnostics.test.tstests/unit/detect.test.tstests/unit/generate.test.tstests/unit/retrieval-gate.test.tstests/unit/retrieve-production-correctness.test.tstests/unit/spi-build.test.ts
| const { question, budget } = options | ||
| const questionTokens = tokenizeQuestion(question) | ||
| const rootPath = typeof graph.graph.root_path === 'string' ? graph.graph.root_path : undefined | ||
| const rootPath = typeof graph.graph.root_path === 'string' ? graph.graph.root_path : process.cwd() |
There was a problem hiding this comment.
process.cwd() fallback for rootPath may behave unexpectedly in some environments.
Using process.cwd() as a fallback when graph.graph.root_path is unset could lead to inconsistent behavior if the process working directory differs from the repository root (e.g., when run from a subdirectory or in CI environments with non-standard working directories). Consider documenting this behavior or adding a warning log when the fallback is used.
Also applies to: 2118-2118
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/runtime/retrieve.ts` at line 1563, The code sets rootPath using
graph.graph.root_path with process.cwd() as a silent fallback (const rootPath =
typeof graph.graph.root_path === 'string' ? graph.graph.root_path :
process.cwd()), which can cause surprising behavior in non-standard working
directories; update the logic to detect when graph.graph.root_path is missing
and emit a clear warning log (e.g., via the existing logger used in this module)
that indicates the fallback to process.cwd() and recommends configuring
root_path, and apply the same change to the other occurrence noted (the line
around 2118) so both places log the fallback rather than silently using
process.cwd().
| function globToRegExp(pattern: string): RegExp { | ||
| const wildcardCount = [...pattern].filter((character) => character === '*').length | ||
| if (pattern.length > 512 || wildcardCount > 32) { | ||
| return /^$/ | ||
| } | ||
|
|
||
| const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&') | ||
| const wildcarded = escaped.replace(/\*/g, '.*').replace(/\?/g, '.') | ||
| return new RegExp(`^${wildcarded}$`) |
There was a problem hiding this comment.
Preserve glob segment semantics in .graphifyignore matching.
globToRegExp() currently maps both * and ** to .*. That means patterns like src/*.ts also match src/nested/file.ts, while **/*.ts misses index.ts at the workspace root. The new ignore support will silently include/exclude the wrong files for common gitignore-style patterns.
Possible fix direction
function globToRegExp(pattern: string): RegExp {
const wildcardCount = [...pattern].filter((character) => character === '*').length
if (pattern.length > 512 || wildcardCount > 32) {
return /^$/
}
const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&')
- const wildcarded = escaped.replace(/\*/g, '.*').replace(/\?/g, '.')
+ const wildcarded = escaped
+ .replace(/\*\*\//g, '(?:.*/)?')
+ .replace(/\*\*/g, '.*')
+ .replace(/\*/g, '[^/]*')
+ .replace(/\?/g, '[^/]')
return new RegExp(`^${wildcarded}$`)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function globToRegExp(pattern: string): RegExp { | |
| const wildcardCount = [...pattern].filter((character) => character === '*').length | |
| if (pattern.length > 512 || wildcardCount > 32) { | |
| return /^$/ | |
| } | |
| const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&') | |
| const wildcarded = escaped.replace(/\*/g, '.*').replace(/\?/g, '.') | |
| return new RegExp(`^${wildcarded}$`) | |
| function globToRegExp(pattern: string): RegExp { | |
| const wildcardCount = [...pattern].filter((character) => character === '*').length | |
| if (pattern.length > 512 || wildcardCount > 32) { | |
| return /^$/ | |
| } | |
| const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&') | |
| const wildcarded = escaped | |
| .replace(/\*\*\//g, '(?:.*/)?') | |
| .replace(/\*\*/g, '.*') | |
| .replace(/\*/g, '[^/]*') | |
| .replace(/\?/g, '[^/]') | |
| return new RegExp(`^${wildcarded}$`) | |
| } |
🧰 Tools
🪛 ast-grep (0.42.2)
[warning] 98-98: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(^${wildcarded}$)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
| export function isDiscoveryPathIgnored(path: string, root: string, patterns: readonly string[]): boolean { | ||
| const relativePath = relativeWorkspacePath(path, root) | ||
| if (!relativePath) { | ||
| return false | ||
| } | ||
|
|
||
| let ignored = isHardIgnoredPath(relativePath) | ||
| for (const rawPattern of patterns) { | ||
| const pattern = rawPattern.trim() | ||
| if (!pattern) { | ||
| continue | ||
| } | ||
| const negated = pattern.startsWith('!') | ||
| if (matchesWorkspacePattern(relativePath, path, pattern)) { | ||
| ignored = !negated | ||
| } | ||
| } | ||
| return ignored |
There was a problem hiding this comment.
Keep hard ignores non-overridable.
A negated .graphifyignore entry can currently flip ignored back to false for graphify-out, node_modules, and other hard-ignored paths. That defeats the new hard-ignore contract and reopens generated artifacts to indexing.
Suggested change
export function isDiscoveryPathIgnored(path: string, root: string, patterns: readonly string[]): boolean {
const relativePath = relativeWorkspacePath(path, root)
if (!relativePath) {
return false
}
- let ignored = isHardIgnoredPath(relativePath)
+ if (isHardIgnoredPath(relativePath)) {
+ return true
+ }
+
+ let ignored = false
for (const rawPattern of patterns) {
const pattern = rawPattern.trim()
if (!pattern) {
continue
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/shared/source-discovery.ts` around lines 157 - 174, The
isDiscoveryPathIgnored function currently lets a negated pattern flip a
hard-ignore; change the loop logic so hard ignores from
isHardIgnoredPath(relativePath) cannot be overridden by negated patterns:
compute ignored = isHardIgnoredPath(relativePath) as now, but inside the
for-loop (over patterns) if matchesWorkspacePattern(relativePath, path, pattern)
and the matched pattern is a negation (pattern.startsWith('!')), skip applying
the negation when ignored is true due to isHardIgnoredPath; only allow
non-negated patterns to set ignored = true and allow negated patterns to clear
ignored when and only when the current ignored state was set by a non-hard
ignore. Ensure references to isDiscoveryPathIgnored, isHardIgnoredPath,
matchesWorkspacePattern and the pattern/negated logic are updated accordingly.
Summary
Verification
Summary by CodeRabbit
.graphifyignorepatterns now support negation for fine-grained control