fix(scanner): reduce false positives in docstrings and inline code #26

Merged: linisme merged 2 commits into main from fix/scanner-false-positives on Apr 19, 2026

@linisme (Contributor) commented Apr 19, 2026

Summary

Cut two well-understood scanner false-positive classes without weakening BLOCK-level detection.

  • Python docstring mask (normalize::python_docstring_mask): script_analyzer now skips WARN/DANGER findings whose original line sits fully inside a """...""" / '''...''' block in .py files and python-shebang scripts. BLOCK rules still fire. Removes common SC-003 / SC-007 noise from module docstrings describing tool behaviour (e.g. "behaves like rm -rf", "written to <workdir>/out.json").
  • Markdown inline-code stripping (strip_inline_code): WARN-level MD rules now match against a variant of each line with backtick spans blanked to spaces; DANGER/BLOCK still match original text so backtick wrapping is not an evasion. Removes MD-004 firing on prose that references command names inline.
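
A minimal sketch of how a per-line docstring mask along these lines could work. The signature, the `Vec<bool>` return type, and the reading of "fully inside" (no code before the opening delimiter or after the closing one on the same line) are assumptions for illustration, not the code merged in this PR. Like the heuristic described here, it ignores `#` comments and escape sequences.

```rust
/// Hypothetical sketch: returns one bool per line of `source`, true when the
/// line lies wholly inside a `"""..."""` or `'''...'''` block.
fn python_docstring_mask(source: &str) -> Vec<bool> {
    const DELIMS: [&str; 2] = ["\"\"\"", "'''"];
    let mut mask = Vec::new();
    // Delimiter of the block that is currently open, if any.
    let mut open: Option<&'static str> = None;

    for line in source.lines() {
        let trimmed = line.trim_start();
        // The line starts inside a block if one is already open, or if the
        // first non-blank text on the line is an opening delimiter.
        let starts_inside =
            open.is_some() || DELIMS.iter().any(|d| trimmed.starts_with(*d));

        // Walk the delimiters on this line, toggling the state and noting
        // any non-blank text that sits outside a block.
        let mut code_outside = false;
        let mut rest = line;
        loop {
            match open {
                Some(delim) => match rest.find(delim) {
                    Some(i) => {
                        rest = &rest[i + delim.len()..];
                        open = None;
                    }
                    None => break, // block continues onto the next line
                },
                None => {
                    // Earliest opening delimiter of either kind.
                    let next = DELIMS
                        .iter()
                        .filter_map(|d| rest.find(*d).map(|i| (i, *d)))
                        .min_by_key(|(i, _)| *i);
                    match next {
                        Some((i, d)) => {
                            if !rest[..i].trim().is_empty() {
                                code_outside = true;
                            }
                            rest = &rest[i + d.len()..];
                            open = Some(d);
                        }
                        None => {
                            if !rest.trim().is_empty() {
                                code_outside = true;
                            }
                            break;
                        }
                    }
                }
            }
        }
        mask.push(starts_inside && !code_outside);
    }
    mask
}
```

Under these assumptions, a line such as `x = """text` is not masked because code precedes the delimiter, while the lines that follow it inside the block are masked.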

Scope is intentionally narrow: shell heredocs remain scanned (their body is executable), SC-002 hits like subprocess.run / re.compile still fire (they are real API calls), and the gate still mediates the user's decision.
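
The severity rules described above can be sketched as two small decision functions. The `Severity` enum and both function names are hypothetical and model only the three levels named in this PR; the scanner's real types may differ.

```rust
/// Hypothetical severity levels, limited to the three named in this PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Severity {
    Warn,
    Danger,
    Block,
}

/// Script findings: WARN/DANGER are dropped when the line is inside a
/// docstring; BLOCK is always kept.
fn keep_script_finding(severity: Severity, in_docstring: bool) -> bool {
    severity == Severity::Block || !in_docstring
}

/// Markdown rules: only WARN matches the variant with inline code blanked;
/// DANGER and BLOCK match the original text, so backticks cannot hide them.
fn markdown_match_text<'a>(
    severity: Severity,
    original: &'a str,
    stripped: &'a str,
) -> &'a str {
    match severity {
        Severity::Warn => stripped,
        Severity::Danger | Severity::Block => original,
    }
}
```

Keeping the decision keyed on severity alone means neither filter can downgrade or hide a BLOCK-level hit.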

Impact

Measured against the public skillx-run/mac-space-cleanup skill: 18 → 14 findings, with exactly the four docstring / inline-code false positives eliminated; the remaining 14 DANGER hits are real API usages left for the gate.

CLAUDE.md updated with the two new conventions.

Test plan

  • cargo test --workspace (266 + 124 + … all green)
  • Scanner module tests: 82 passed (11 new)
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo fmt --check
  • Manual scan of mac-space-cleanup before/after (18 → 14 findings, right four removed)
  • Reviewer sanity-check the docstring mask heuristic; known limitation is that it does not implement Python # comment semantics, so """ appearing inside a # comment can still flip the state machine. Tracked as follow-up — happy to add a #-stop in a follow-up commit if we want to close that gap here.

🤖 Generated with Claude Code

lin added 2 commits April 19, 2026 15:23
Add `normalize::python_docstring_mask()` that returns, for each original
line of a source file, whether that line sits fully inside a `"""..."""`
or `'''...'''` block. `script_analyzer` consults it on `.py` files and
files with a python shebang, and suppresses WARN/DANGER findings whose
line falls inside a docstring. BLOCK-level rules continue to fire in
docstrings.

Eliminates false positives like SC-003 / SC-007 triggering on mentions
of `rm -rf` or `>/path` inside module docstrings that describe the
script's behaviour. Shell heredocs are intentionally not masked --
their body is real executable code.

For WARN-level MD rules, match against a variant of each line where
backtick-delimited inline code spans have been blanked to spaces.
This eliminates the common false positive where SKILL.md prose
references a dangerous command name in backticks (e.g. "behaves
differently than `rm -rf` would") and trips MD-004.

DANGER/BLOCK rules still match the original text, so wrapping a
prompt-injection phrase in backticks cannot be used as an evasion.
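
A minimal sketch of blanking inline code spans to spaces, assuming the function takes one line and returns a copy with the same character count. The signature and the handling of multi-backtick runs (an opening run closes only on a run of equal length, as in CommonMark) are assumptions, not the merged `strip_inline_code`.

```rust
/// Hypothetical sketch: replaces every backtick-delimited span, delimiters
/// included, with spaces. Unmatched backticks are left untouched.
fn strip_inline_code(line: &str) -> String {
    let chars: Vec<char> = line.chars().collect();
    let mut out = String::with_capacity(line.len());
    let mut i = 0;
    while i < chars.len() {
        if chars[i] != '`' {
            out.push(chars[i]);
            i += 1;
            continue;
        }
        // Length of the opening backtick run.
        let run = chars[i..].iter().take_while(|&&c| c == '`').count();
        // Search for a closing run of exactly the same length.
        let mut j = i + run;
        let mut close = None;
        while j < chars.len() {
            if chars[j] == '`' {
                let n = chars[j..].iter().take_while(|&&c| c == '`').count();
                if n == run {
                    close = Some(j + n);
                    break;
                }
                j += n;
            } else {
                j += 1;
            }
        }
        match close {
            Some(end) => {
                // Blank the whole span so the character count is unchanged.
                out.extend(std::iter::repeat(' ').take(end - i));
                i = end;
            }
            None => {
                // No closing run: keep the backticks as literal text.
                out.extend(&chars[i..i + run]);
                i += run;
            }
        }
    }
    out
}
```

Blanking rather than deleting keeps character positions aligned with the original line, which would matter if findings report column positions (an assumption about how the scanner reports hits).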

Also document both scanner false-positive filters (Python docstrings
and markdown inline code) in CLAUDE.md conventions.
@linisme linisme merged commit e14ed73 into main Apr 19, 2026
8 checks passed