fix(scanner): reduce false positives in docstrings and inline code by linisme · Pull Request #26 · skillx-run/skillx

linisme · 2026-04-19T11:01:17Z

Summary

Cut two well-understood scanner false-positive classes without weakening BLOCK-level detection.

Python docstring mask (normalize::python_docstring_mask): script_analyzer now skips WARN/DANGER findings whose original line sits fully inside a """...""" / '''...''' block in .py files and python-shebang scripts. BLOCK rules still fire. Removes common SC-003 / SC-007 noise from module docstrings describing tool behaviour (e.g. "behaves like rm -rf", "written to <workdir>/out.json").
Markdown inline-code stripping (strip_inline_code): WARN-level MD rules now match against a variant of each line with backtick spans blanked to spaces; DANGER/BLOCK still match original text so backtick wrapping is not an evasion. Removes MD-004 firing on prose that references command names inline.

Scope is intentionally narrow: shell heredocs remain scanned (their body is executable), SC-002 hits like subprocess.run / re.compile still fire (they are real API calls), gate still mediates user decision.
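The severity rules described above can be sketched as two small predicates. This is an illustrative sketch, not the code in this PR: the `Severity` enum, `keep_script_finding`, and `markdown_match_text` are hypothetical names, and only the three levels mentioned in this description are modeled.

```rust
/// Hypothetical severity levels; only the three named in this PR are modeled.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Severity {
    Warn,
    Danger,
    Block,
}

/// Script findings: a WARN/DANGER finding is dropped when its line sits
/// inside a Python docstring; BLOCK findings are always kept.
pub fn keep_script_finding(severity: Severity, in_docstring: bool) -> bool {
    severity == Severity::Block || !in_docstring
}

/// Markdown rules: only WARN-level rules see the variant with inline code
/// blanked out; DANGER and BLOCK rules match the original text, so backtick
/// wrapping cannot hide anything from them.
pub fn markdown_match_text<'a>(
    severity: Severity,
    original: &'a str,
    stripped: &'a str,
) -> &'a str {
    match severity {
        Severity::Warn => stripped,
        Severity::Danger | Severity::Block => original,
    }
}
```

Keeping both decisions as pure functions of the severity makes the "BLOCK is never weakened" property easy to unit-test in isolation.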

Impact

Measured against the public skillx-run/mac-space-cleanup skill: 18 → 14 findings, exactly the four docstring / inline-code false positives eliminated; the remaining 14 DANGER hits are real API usages left for the gate.

CLAUDE.md updated with the two new conventions.

Test plan

cargo test --workspace (266 + 124 + … all green)
Scanner module tests: 82 passed (11 new)
cargo clippy --workspace --all-targets -- -D warnings
cargo fmt --check
Manual scan of mac-space-cleanup before/after (18 → 14 findings, right four removed)
Reviewer: please sanity-check the docstring mask heuristic. Known limitation: it does not implement Python # comment semantics, so a """ appearing inside a # comment can still flip the state machine. Tracked as a follow-up; happy to add a #-stop in a follow-up commit if we want to close that gap here.

🤖 Generated with Claude Code

Add `normalize::python_docstring_mask()` that returns, for each original line of a source file, whether that line sits fully inside a `"""..."""` or `'''...'''` block. `script_analyzer` consults it on `.py` files and files with a python shebang, and suppresses WARN/DANGER findings whose line falls inside a docstring. BLOCK-level rules continue to fire in docstrings. Eliminates false positives like SC-003 / SC-007 triggering on mentions of `rm -rf` or `>/path` inside module docstrings that describe the script's behaviour. Shell heredocs are intentionally not masked -- their body is real executable code.
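A minimal sketch of what such a mask could look like, written as a per-line state machine. It is not the implementation in this commit: the signature is an assumption, and so is the reading of "fully inside" used here (a line counts as masked when it touches a triple-quoted block and has no non-whitespace text outside it, which includes the delimiter lines themselves).

```rust
/// Sketch of a docstring mask: one bool per line of `source`.
/// Assumption: a line is masked when it touches a triple-quoted block and
/// carries no non-whitespace text outside that block.
pub fn python_docstring_mask(source: &str) -> Vec<bool> {
    // Delimiter of the currently open block, if any.
    let mut open: Option<&'static str> = None;
    let mut mask = Vec::new();

    for line in source.lines() {
        let mut rest = line;
        let mut touched_string = open.is_some();
        let mut code_outside = false;

        loop {
            match open {
                // Inside a block: look for the matching closing delimiter.
                Some(delim) => match rest.find(delim) {
                    Some(idx) => {
                        rest = &rest[idx + 3..];
                        open = None;
                    }
                    None => break, // block continues on the next line
                },
                // Outside a block: find the earliest opening delimiter.
                None => {
                    let next = match (rest.find("\"\"\""), rest.find("'''")) {
                        (Some(d), Some(s)) if d <= s => Some((d, "\"\"\"")),
                        (Some(d), None) => Some((d, "\"\"\"")),
                        (_, Some(s)) => Some((s, "'''")),
                        (None, None) => None,
                    };
                    match next {
                        Some((idx, delim)) => {
                            if !rest[..idx].trim().is_empty() {
                                code_outside = true;
                            }
                            touched_string = true;
                            open = Some(delim);
                            rest = &rest[idx + 3..];
                        }
                        None => {
                            if !rest.trim().is_empty() {
                                code_outside = true;
                            }
                            break;
                        }
                    }
                }
            }
        }
        mask.push(touched_string && !code_outside);
    }
    mask
}
```

Like the heuristic described in this PR, the sketch has no notion of `#` comments or single-quoted strings, so a `"""` inside a comment opens a block and masks the lines that follow.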

For WARN-level MD rules, match against a variant of each line where backtick-delimited inline code spans have been blanked to spaces. This eliminates the common false positive where SKILL.md prose references a dangerous command name in backticks (e.g. "behaves differently than `rm -rf` would") and trips MD-004. DANGER/BLOCK rules still match the original text, so wrapping a prompt-injection phrase in backticks cannot be used as an evasion. Also document both scanner false-positive filters (Python docstrings and markdown inline code) in CLAUDE.md conventions.
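The blanking step might look roughly like the following. This is a sketch under assumptions rather than the code from this commit: it handles only single-backtick spans, blanks the backticks along with their contents, and leaves an unpaired backtick untouched; the signature is also assumed.

```rust
/// Sketch: replace each `...` span (delimiters included) with spaces so the
/// line keeps its character length and column positions.
/// Assumption: only single-backtick spans are recognised.
pub fn strip_inline_code(line: &str) -> String {
    let chars: Vec<char> = line.chars().collect();
    let mut out = String::with_capacity(line.len());
    let mut i = 0;

    while i < chars.len() {
        if chars[i] == '`' {
            // Distance from the char after the opening backtick to the
            // closing backtick, if there is one on this line.
            if let Some(len) = chars[i + 1..].iter().position(|&c| c == '`') {
                for _ in 0..len + 2 {
                    out.push(' ');
                }
                i += len + 2;
                continue;
            }
        }
        out.push(chars[i]);
        i += 1;
    }
    out
}
```

Blanking to spaces instead of deleting the span keeps match offsets aligned with the original line, which is convenient if a finding reports a column.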

lin added 2 commits April 19, 2026 15:23

linisme merged commit e14ed73 into main Apr 19, 2026
8 checks passed

linisme merged 2 commits into main from fix/scanner-false-positives
