Skip to content

Fix production retrieval correctness#146

Open
mohanagy wants to merge 1 commit into
mainfrom
retrieval-correctness
Open

Fix production retrieval correctness#146
mohanagy wants to merge 1 commit into
mainfrom
retrieval-correctness

Conversation

@mohanagy
Copy link
Copy Markdown
Owner

@mohanagy mohanagy commented May 11, 2026

Summary

  • share default hard-ignore rules across legacy detect and SPI and add source-domain classification
  • make retrieval exclusions operational, fix negated test intent parsing, and prefer exact method anchors with truthful slice reasons
  • deepen slice-v1 runtime traversal, add bad-pack diagnostics, and lock behavior with production-correctness regressions

Verification

  • npm run typecheck
  • npm run build
  • npm run test:run

Summary by CodeRabbit

  • Changed
    • Improved file indexing: VCS/build/cache directories are now excluded by default, while tests and fixtures remain available for context
    • Smarter retrieval: Understands exclusion phrases like "exclude tests" and filters results accordingly
    • Better code anchoring for method-level prompts and framework patterns
    • Enhanced diagnostics with better visibility into context quality and domain coverage
    • .graphifyignore patterns now support negation for fine-grained control

Review Change Stack

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 11, 2026

📝 Walkthrough

Walkthrough

This PR introduces source-domain classification and intent-aware retrieval filtering. It adds a shared utility for classifying file paths into domains and detecting hard-ignored paths, extends contracts with domain metadata and diagnostic kinds, refactors discovery to use shared helpers, implements prompt exclusion detection, enriches retrieval with symbol-reference parsing and domain-aware filtering, and expands diagnostics to flag pollution and incorrect pack shapes.

Changes

Source-domain filtering and discovery

Layer / File(s) Summary
Source Discovery Foundation
src/shared/source-discovery.ts
New module classifying paths into SourceDomain (test/benchmark/fixture/generated/docs/config/build_artifact/production/unknown) and detecting hard-ignored VCS/cache/build directories with .graphifyignore pattern support.
Contract Updates
src/contracts/context-pack.ts, src/contracts/retrieval-gate.ts, src/contracts/context-pack-diagnostics.ts
ContextPackNode adds optional source_domain; new RetrievalExcludedDomain type; ContextPackDiagnosticKind extends with domain/pollution/structural diagnostics; ContextPackQualitySignals adds domain_distribution, excluded_domains, polluted_source_path_count.
Discovery Pipeline Refactoring
src/pipeline/detect.ts, src/pipeline/spi/build.ts
Delegates ignore-pattern loading and path filtering to shared helpers; removes local glob/regex matching and hardcoded SKIP_DIRS; stops scanning graphify-out/memory.
Retrieval Gate Exclusions
src/runtime/retrieval-gate.ts
Implements extractPromptExclusions() to detect and map exclusion terms to domains/path hints; enriches classifyRetrievalLevel to emit structured excluded_domains, excluded_terms, excluded_path_hints signals; enhances detectSymbols to extract non-backtick references.
Symbol Reference Parsing
src/runtime/retrieve.ts (symbol parsing section)
Introduces SymbolReference parsing, symbolReferenceMatchScore() matching, and domain-exclusion helpers to drive seed filtering with symbol-aware anchoring.
Retrieval Seed & Expansion Scoring
src/runtime/retrieve.ts (scoring sections)
Extends seed/expansion structures with sourceDomain, exactLabelMatch, literalPathMatch; wires domain-based penalties and symbol-aware scoring through seed enumeration, expansion, unscored-node addition, and async retrieval using classifySourceDomain and exclusion signals.
Retrieval Slicing Policy
src/runtime/retrieve/slicing.ts
Adds effectivePolicy() adjusting traversal based on method anchors and runtime prompts; updates shouldSuppressNode() to filter by domain/pollution/exclusions; threads SliceOptions through traversal; updates sliceCandidatesForRetrieve() signature to accept options.
Context Pack Candidate Scoring
src/runtime/context-pack.ts
Classifies source_domain for each candidate; introduces sourceDomainPenalty() helper; adds domain-based score boost and penalty during value computation; replaces test-detection regex with domain classification.
Context Pack Diagnostics
src/runtime/context-pack-diagnostics.ts
Extends computeContextPackDiagnostics with domain/pollution rule weights; computes domain_distribution and polluted_source_path_count in signals; emits warnings for excluded-domain selection, polluted paths, test-dominated packs, controller-only pipelines, missing anchors/pipelines, and missing structural evidence.
Documentation
CHANGELOG.md, README.md
Documents stricter default discovery, soft-penalty behavior for tests/benchmarks, intent-aware exclusions, improved anchoring for Class.method prompts, and enhanced diagnostics.

Test coverage

Layer / File(s) Summary
Discovery Tests
tests/unit/detect.test.ts, tests/unit/spi-build.test.ts
Validates hard-ignore of worktrees/graphify-out/build outputs while indexing tests/benchmarks/fixtures; confirms test config files (*.test.ts, jest.config.js, etc.) are now included as code.
Retrieval Gate Tests
tests/unit/retrieval-gate.test.ts
Adds tests for explicit Class.method symbol detection; validates exclusion detection mapping test/benchmark/fixture/reporter domains; confirms excluded terms suppress test/benchmark intent.
Context Pack Diagnostics Tests
tests/unit/context-pack-diagnostics.test.ts
Validates excluded-domain-selected, polluted-source-path-selected, controller-only pipeline pack, missing method anchor, missing runtime pipeline, test-dominated pack, and missing structural evidence warnings.
Production Retrieval Correctness
tests/unit/retrieve-production-correctness.test.ts
New end-to-end test file validating domain/term suppression, Class.method anchor promotion, anchor-reason distinctions, and absence of incorrect diagnostic warnings for healthy production slices.
Benchmark & Generate Tests
tests/unit/benchmark-quality.test.ts, tests/unit/generate.test.ts
Benchmark uses isolated OS temp directories; generate.test.ts verifies saved memory notes under graphify-out/memory are not re-indexed.

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly Related PRs

  • mohanagy/graphify-ts#103: Extends retrieval-gate integration with new exclusion signals and deeper propagation through same retrieval plumbing.
  • mohanagy/graphify-ts#121: Both modify context-pack diagnostics and runtime areas that this PR builds upon for enriched signals and quality checks.
  • mohanagy/graphify-ts#144: Modifies same retrieval slicing and retrieve.ts plumbing for foundational slice recovery.

🐰 Domains dance through the pipeline's spine,
Exclusions whisper what to leave behind,
Test and prod paths now correctly aligned,
Context packs freed from pollution's bind!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 1.39% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title directly describes the main objective: fixing production retrieval correctness, which aligns with the primary focus on making retrieval exclusions operational, fixing intent parsing, and improving slice anchoring.
Description check ✅ Passed The PR description covers all required template sections with specific details: summary explains the three main changes, verification lists the testing commands performed, and checklist items are implicitly addressed through the comprehensive testing and no secrets disclosure.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch retrieval-correctness

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (6)
src/runtime/context-pack-diagnostics.ts (2)

340-355: 💤 Low value

missingRuntimePipeline may produce false positives.

The function checks for both pipelineNodeCount === 0 and structuralRelations === 0. However, it already calls pipelinePrompt(pack) at line 354, which means it's checking a pipeline prompt. If the pack has pipeline nodes but no structural relations (or vice versa), it will still trigger. Consider whether this OR condition is intentional or if it should be AND.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/context-pack-diagnostics.ts` around lines 340 - 355, The
missingRuntimePipeline function currently uses an OR between pipelineNodeCount
=== 0 and structuralRelations === 0 which causes false positives when only one
of those is missing; update the return to require both to be missing by changing
the condition to pipelinePrompt(pack) && (pipelineNodeCount === 0 &&
structuralRelations === 0) so that missingRuntimePipeline only reports when
there are no pipeline nodes AND no structural relations; ensure you modify the
expression that references pipelineNodeCount and structuralRelations in
missingRuntimePipeline accordingly.

371-382: 💤 Low value

Fixed denominator may allow negative scores before clamping.

The fixed denominator of 10 means that if many new warnings trigger (total weight > 10), the raw score would go negative before the Math.max(0, ...) clamp. This is handled correctly, but consider adding a comment explaining the design choice and the expected behavior when total triggered weight exceeds the denominator.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/context-pack-diagnostics.ts` around lines 371 - 382, The
computeQualityScore function uses a fixed denominator totalWeight = 10 which
allows the raw score to go negative when triggeredWeight (accumulated from
RULE_WEIGHTS via the warnings loop) exceeds that denominator before the clamp;
add a concise comment above totalWeight (or at the top of computeQualityScore)
explaining that the denominator is intentionally fixed to preserve historical
comparability, that excess triggeredWeight will produce negative raw scores
which are then clamped to 0 by Math.max, and document the expected semantics
(new warnings reduce the numerator but don’t change the historical baseline),
referencing totalWeight, triggeredWeight, and RULE_WEIGHTS so future readers
understand the design choice.
src/runtime/retrieval-gate.ts (1)

75-76: 💤 Low value

Regex may miss some valid symbol patterns.

The SYMBOL_EXPLICIT_RE pattern requires the class name to start with uppercase ([A-Z]), which won't match lowercase class names or module-style references like utils.parseDate. Consider whether this is intentional or if the pattern should be more permissive.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/retrieval-gate.ts` around lines 75 - 76, The SYMBOL_EXPLICIT_RE
currently enforces an uppercase first character for the left-hand identifier
which will miss valid symbols like lowercase class names or module.function
references; update the regex used in SYMBOL_EXPLICIT_RE to allow an initial
lowercase letter and to accept dot-separated module paths (e.g., allow
[A-Za-z_$][A-Za-z0-9_$]* at the front and permit repeated
(?:\.[A-Za-z_$][A-Za-z0-9_$]*)? segments), while preserving existing support for
separators (., #, ::) and trailing parentheses; locate and update the
SYMBOL_EXPLICIT_RE declaration to be more permissive (no change needed to
STACK_TRACE_RE).
src/runtime/context-pack.ts (1)

863-866: 💤 Low value

Clarify the intent of domain boost for non-production sources.

The +0.25 boost for domains that are neither production nor unknown seems intentional—it rewards explicitly-tagged domains (test, benchmark, config, etc.) when they pass the penalty gate (i.e., when the task contract requests them). However, this logic is subtle and could benefit from a brief inline comment explaining the rationale.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/context-pack.ts` around lines 863 - 866, Add a brief inline
comment above the block that adjusts score based on view.source_domain (the if
checking view.source_domain !== 'production' && view.source_domain !==
'unknown') explaining the intent: that non-production explicit domains (e.g.,
test, benchmark, config) receive a +0.25 boost to reward explicitly-tagged
non-prod sources when the task contract permits them (after any penalty gate),
so future readers understand this subtle scoring choice; reference the variables
score, reasons, and the pushUnique call in the comment for clarity.
src/runtime/retrieve.ts (1)

741-781: 💤 Low value

Symbol matching score thresholds could benefit from documentation.

The symbolReferenceMatchScore function returns scores in the range 0-4 with specific thresholds (4, 3.5, 3, 2.5) for different match qualities. Consider adding a brief comment or JSDoc explaining the score semantics, as these values are used as gates elsewhere (e.g., symbolMatch >= 3 at line 1673, symbolMatch >= 3 at line 1982).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/retrieve.ts` around lines 741 - 781, The function
symbolReferenceMatchScore uses numeric thresholds (0, 2.5, 3, 3.5, 4) to
represent match quality but has no documentation; add a concise JSDoc above
symbolReferenceMatchScore that documents the meaning of each score (e.g., 4 =
exact match, 3.5 = method match with class in source, 3 = method match on
reference with class context, 2.5 = bare name match, 0 = no match) and note
where callers gate on values (e.g., consumers that compare symbolMatch >= 3
expect strong matches). Ensure the comment mentions the score range (0–4) and
intended semantic thresholds so future readers and maintainers understand why
callers check for >=3.
src/runtime/retrieve/slicing.ts (1)

203-209: 💤 Low value

Anchor reason cascade has a potential logic gap.

The reason assignment at lines 203-209 uses a cascade: exactLabelMatchliteralPathMatchsourcePathMatch → default. However, sourcePathMatch could be true when both exact label AND source path match. The current order prioritizes exactLabelMatch which seems correct, but the intermediate literalPathMatch check (line 205) could mask cases where sourcePathMatch is more descriptive. Consider whether this is the intended precedence.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/retrieve/slicing.ts` around lines 203 - 209, The cascade
assigning reason can mis-prioritize source path matches because
node.literalPathMatch is checked before node.sourcePathMatch; decide the
intended precedence and update the ternary chain accordingly (e.g.,
exactLabelMatch → sourcePathMatch → literalPathMatch → default if source path
should be more descriptive), or explicitly document the chosen precedence with a
comment next to the reason assignment; update the expression that computes
reason (referencing the variable reason and properties node.exactLabelMatch,
node.literalPathMatch, node.sourcePathMatch) and add/adjust unit tests to cover
combinations where multiple flags are true to prevent regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/runtime/retrieve.ts`:
- Line 1563: The code sets rootPath using graph.graph.root_path with
process.cwd() as a silent fallback (const rootPath = typeof
graph.graph.root_path === 'string' ? graph.graph.root_path : process.cwd()),
which can cause surprising behavior in non-standard working directories; update
the logic to detect when graph.graph.root_path is missing and emit a clear
warning log (e.g., via the existing logger used in this module) that indicates
the fallback to process.cwd() and recommends configuring root_path, and apply
the same change to the other occurrence noted (the line around 2118) so both
places log the fallback rather than silently using process.cwd().

In `@src/shared/source-discovery.ts`:
- Around line 157-174: The isDiscoveryPathIgnored function currently lets a
negated pattern flip a hard-ignore; change the loop logic so hard ignores from
isHardIgnoredPath(relativePath) cannot be overridden by negated patterns:
compute ignored = isHardIgnoredPath(relativePath) as now, but inside the
for-loop (over patterns) if matchesWorkspacePattern(relativePath, path, pattern)
and the matched pattern is a negation (pattern.startsWith('!')), skip applying
the negation when ignored is true due to isHardIgnoredPath; only allow
non-negated patterns to set ignored = true and allow negated patterns to clear
ignored when and only when the current ignored state was set by a non-hard
ignore. Ensure references to isDiscoveryPathIgnored, isHardIgnoredPath,
matchesWorkspacePattern and the pattern/negated logic are updated accordingly.

---

Nitpick comments:
In `@src/runtime/context-pack-diagnostics.ts`:
- Around line 340-355: The missingRuntimePipeline function currently uses an OR
between pipelineNodeCount === 0 and structuralRelations === 0 which causes false
positives when only one of those is missing; update the return to require both
to be missing by changing the condition to pipelinePrompt(pack) &&
(pipelineNodeCount === 0 && structuralRelations === 0) so that
missingRuntimePipeline only reports when there are no pipeline nodes AND no
structural relations; ensure you modify the expression that references
pipelineNodeCount and structuralRelations in missingRuntimePipeline accordingly.
- Around line 371-382: The computeQualityScore function uses a fixed denominator
totalWeight = 10 which allows the raw score to go negative when triggeredWeight
(accumulated from RULE_WEIGHTS via the warnings loop) exceeds that denominator
before the clamp; add a concise comment above totalWeight (or at the top of
computeQualityScore) explaining that the denominator is intentionally fixed to
preserve historical comparability, that excess triggeredWeight will produce
negative raw scores which are then clamped to 0 by Math.max, and document the
expected semantics (new warnings reduce the numerator but don’t change the
historical baseline), referencing totalWeight, triggeredWeight, and RULE_WEIGHTS
so future readers understand the design choice.

In `@src/runtime/context-pack.ts`:
- Around line 863-866: Add a brief inline comment above the block that adjusts
score based on view.source_domain (the if checking view.source_domain !==
'production' && view.source_domain !== 'unknown') explaining the intent: that
non-production explicit domains (e.g., test, benchmark, config) receive a +0.25
boost to reward explicitly-tagged non-prod sources when the task contract
permits them (after any penalty gate), so future readers understand this subtle
scoring choice; reference the variables score, reasons, and the pushUnique call
in the comment for clarity.

In `@src/runtime/retrieval-gate.ts`:
- Around line 75-76: The SYMBOL_EXPLICIT_RE currently enforces an uppercase
first character for the left-hand identifier which will miss valid symbols like
lowercase class names or module.function references; update the regex used in
SYMBOL_EXPLICIT_RE to allow an initial lowercase letter and to accept
dot-separated module paths (e.g., allow [A-Za-z_$][A-Za-z0-9_$]* at the front
and permit repeated (?:\.[A-Za-z_$][A-Za-z0-9_$]*)? segments), while preserving
existing support for separators (., #, ::) and trailing parentheses; locate and
update the SYMBOL_EXPLICIT_RE declaration to be more permissive (no change
needed to STACK_TRACE_RE).

In `@src/runtime/retrieve.ts`:
- Around line 741-781: The function symbolReferenceMatchScore uses numeric
thresholds (0, 2.5, 3, 3.5, 4) to represent match quality but has no
documentation; add a concise JSDoc above symbolReferenceMatchScore that
documents the meaning of each score (e.g., 4 = exact match, 3.5 = method match
with class in source, 3 = method match on reference with class context, 2.5 =
bare name match, 0 = no match) and note where callers gate on values (e.g.,
consumers that compare symbolMatch >= 3 expect strong matches). Ensure the
comment mentions the score range (0–4) and intended semantic thresholds so
future readers and maintainers understand why callers check for >=3.

In `@src/runtime/retrieve/slicing.ts`:
- Around line 203-209: The cascade assigning reason can mis-prioritize source
path matches because node.literalPathMatch is checked before
node.sourcePathMatch; decide the intended precedence and update the ternary
chain accordingly (e.g., exactLabelMatch → sourcePathMatch → literalPathMatch →
default if source path should be more descriptive), or explicitly document the
chosen precedence with a comment next to the reason assignment; update the
expression that computes reason (referencing the variable reason and properties
node.exactLabelMatch, node.literalPathMatch, node.sourcePathMatch) and
add/adjust unit tests to cover combinations where multiple flags are true to
prevent regressions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 6683a487-0d60-4861-9a0f-ffbf42ec8331

📥 Commits

Reviewing files that changed from the base of the PR and between 7d4e486 and 64b1ee2.

📒 Files selected for processing (20)
  • CHANGELOG.md
  • README.md
  • src/contracts/context-pack-diagnostics.ts
  • src/contracts/context-pack.ts
  • src/contracts/retrieval-gate.ts
  • src/pipeline/detect.ts
  • src/pipeline/spi/build.ts
  • src/runtime/context-pack-diagnostics.ts
  • src/runtime/context-pack.ts
  • src/runtime/retrieval-gate.ts
  • src/runtime/retrieve.ts
  • src/runtime/retrieve/slicing.ts
  • src/shared/source-discovery.ts
  • tests/unit/benchmark-quality.test.ts
  • tests/unit/context-pack-diagnostics.test.ts
  • tests/unit/detect.test.ts
  • tests/unit/generate.test.ts
  • tests/unit/retrieval-gate.test.ts
  • tests/unit/retrieve-production-correctness.test.ts
  • tests/unit/spi-build.test.ts

Comment thread src/runtime/retrieve.ts
const { question, budget } = options
const questionTokens = tokenizeQuestion(question)
const rootPath = typeof graph.graph.root_path === 'string' ? graph.graph.root_path : undefined
const rootPath = typeof graph.graph.root_path === 'string' ? graph.graph.root_path : process.cwd()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

process.cwd() fallback for rootPath may behave unexpectedly in some environments.

Using process.cwd() as a fallback when graph.graph.root_path is unset could lead to inconsistent behavior if the process working directory differs from the repository root (e.g., when run from a subdirectory or in CI environments with non-standard working directories). Consider documenting this behavior or adding a warning log when the fallback is used.

Also applies to: 2118-2118

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/retrieve.ts` at line 1563, The code sets rootPath using
graph.graph.root_path with process.cwd() as a silent fallback (const rootPath =
typeof graph.graph.root_path === 'string' ? graph.graph.root_path :
process.cwd()), which can cause surprising behavior in non-standard working
directories; update the logic to detect when graph.graph.root_path is missing
and emit a clear warning log (e.g., via the existing logger used in this module)
that indicates the fallback to process.cwd() and recommends configuring
root_path, and apply the same change to the other occurrence noted (the line
around 2118) so both places log the fallback rather than silently using
process.cwd().

Comment on lines +91 to +99
function globToRegExp(pattern: string): RegExp {
const wildcardCount = [...pattern].filter((character) => character === '*').length
if (pattern.length > 512 || wildcardCount > 32) {
return /^$/
}

const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&')
const wildcarded = escaped.replace(/\*/g, '.*').replace(/\?/g, '.')
return new RegExp(`^${wildcarded}$`)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve glob segment semantics in .graphifyignore matching.

globToRegExp() currently maps both * and ** to .*. That means patterns like src/*.ts also match src/nested/file.ts, while **/*.ts misses index.ts at the workspace root. The new ignore support will silently include/exclude the wrong files for common gitignore-style patterns.

Possible fix direction
 function globToRegExp(pattern: string): RegExp {
   const wildcardCount = [...pattern].filter((character) => character === '*').length
   if (pattern.length > 512 || wildcardCount > 32) {
     return /^$/
   }

   const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&')
-  const wildcarded = escaped.replace(/\*/g, '.*').replace(/\?/g, '.')
+  const wildcarded = escaped
+    .replace(/\*\*\//g, '(?:.*/)?')
+    .replace(/\*\*/g, '.*')
+    .replace(/\*/g, '[^/]*')
+    .replace(/\?/g, '[^/]')
   return new RegExp(`^${wildcarded}$`)
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function globToRegExp(pattern: string): RegExp {
const wildcardCount = [...pattern].filter((character) => character === '*').length
if (pattern.length > 512 || wildcardCount > 32) {
return /^$/
}
const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&')
const wildcarded = escaped.replace(/\*/g, '.*').replace(/\?/g, '.')
return new RegExp(`^${wildcarded}$`)
function globToRegExp(pattern: string): RegExp {
const wildcardCount = [...pattern].filter((character) => character === '*').length
if (pattern.length > 512 || wildcardCount > 32) {
return /^$/
}
const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&')
const wildcarded = escaped
.replace(/\*\*\//g, '(?:.*/)?')
.replace(/\*\*/g, '.*')
.replace(/\*/g, '[^/]*')
.replace(/\?/g, '[^/]')
return new RegExp(`^${wildcarded}$`)
}
🧰 Tools
🪛 ast-grep (0.42.2)

[warning] 98-98: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(^${wildcarded}$)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

Comment on lines +157 to +174
export function isDiscoveryPathIgnored(path: string, root: string, patterns: readonly string[]): boolean {
const relativePath = relativeWorkspacePath(path, root)
if (!relativePath) {
return false
}

let ignored = isHardIgnoredPath(relativePath)
for (const rawPattern of patterns) {
const pattern = rawPattern.trim()
if (!pattern) {
continue
}
const negated = pattern.startsWith('!')
if (matchesWorkspacePattern(relativePath, path, pattern)) {
ignored = !negated
}
}
return ignored
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep hard ignores non-overridable.

A negated .graphifyignore entry can currently flip ignored back to false for graphify-out, node_modules, and other hard-ignored paths. That defeats the new hard-ignore contract and reopens generated artifacts to indexing.

Suggested change
 export function isDiscoveryPathIgnored(path: string, root: string, patterns: readonly string[]): boolean {
   const relativePath = relativeWorkspacePath(path, root)
   if (!relativePath) {
     return false
   }

-  let ignored = isHardIgnoredPath(relativePath)
+  if (isHardIgnoredPath(relativePath)) {
+    return true
+  }
+
+  let ignored = false
   for (const rawPattern of patterns) {
     const pattern = rawPattern.trim()
     if (!pattern) {
       continue
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/shared/source-discovery.ts` around lines 157 - 174, The
isDiscoveryPathIgnored function currently lets a negated pattern flip a
hard-ignore; change the loop logic so hard ignores from
isHardIgnoredPath(relativePath) cannot be overridden by negated patterns:
compute ignored = isHardIgnoredPath(relativePath) as now, but inside the
for-loop (over patterns) if matchesWorkspacePattern(relativePath, path, pattern)
and the matched pattern is a negation (pattern.startsWith('!')), skip applying
the negation when ignored is true due to isHardIgnoredPath; only allow
non-negated patterns to set ignored = true and allow negated patterns to clear
ignored when and only when the current ignored state was set by a non-hard
ignore. Ensure references to isDiscoveryPathIgnored, isHardIgnoredPath,
matchesWorkspacePattern and the pattern/negated logic are updated accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant