Merged
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -175,7 +175,7 @@ Status legend: **ACTIVE** (current spec/contract, AI agents should read first) /
| `docs/exit_codes.md` | ACTIVE | Stable CLI exit code policy (0 / 1 / 2 / 3 / 4) for CI integration, including `--strict-repair`, `severity: info` Advisor channel, and per-subcommand notes |
| `docs/json_schema.md` | ACTIVE | CLI JSON envelopes — verdict / compile at `schema_version="6"`, compile-repair at independent `schema_version="1"`, validate-plan at independent `schema_version="2"`. Includes compatibility policy and v2→v3 through v5→v6 diffs (plus validate-plan v1→v2) |
| `docs/cli_test_inventory.md` | ACTIVE | CLI test coverage inventory, runtime notes, and conservative reduction candidates |
-| `docs/target_yaml_guide.md` | ACTIVE | `target.yaml` authoring guide. Practical companion to `design.md §4` / `cli_usage.md`. Centralises authoring hazards D1 (`--package-root` scope vs `tests/` visibility), D3 (template / user constraint duplication), D4 (vacuous PASS on config-only PRs); originates from the 2026-05-07 Session 4 dogfooding |
+| `docs/target_yaml_guide.md` | ACTIVE | `target.yaml` authoring guide. Practical companion to `design.md §4` / `cli_usage.md`. Centralises authoring hazards D1 (`--package-root` scope vs `tests/` visibility), D3 (template / user constraint duplication), D4 (vacuous PASS on config-only PRs), D6 (nested-function complexity blind spot); originates from the 2026-05-07 Session 4 / 2026-05-28 real-PR dogfooding |
| `docs/target_authoring_surface.md` | ACTIVE | Authoring surface design contract (Brief 8 / CSCI-41). `target.yaml` need not be hand-written / three generation paths (recipe + sources / catalog reference / hand-written) / the LLM path is split out into Brief 8b / every path is fixed as declared intent before the verdict / the Authoring, Advisor, and Provenance surfaces must not be referenced by the evaluator / `candidate_code_used: false` is fixed. Implementation-side catch-up to §23.3.1 |
| `docs/ssp_protocol.md` | ACTIVE | SSP v0.1 normative spec: SensorOutput / Finding / SSPDelta / SSPVerdict definitions, 5-element SAST + 3-element SCA fingerprint, Python profile AST normalization, delta computation, verdict precedence (`unknown > fail > pass`), JSON Schema artifact, Sensor Provenance Invariant (§23.1 mirror), determinism requirements, core isolation contract |
| `docs/ssp_usage_guide.md` | ACTIVE | SSP practical usage guide: quick start (Semgrep SAST / pip-audit SCA / fixture mode), output formats (JSON / human / SARIF), CI integration (GitHub Actions workflow / exit code routing / fixture-based CI), hand-built SensorOutput examples, delta computation overview, relationship to core Semantic CI |
3 changes: 2 additions & 1 deletion docs/cli_usage.md
@@ -738,7 +738,7 @@ semantic-ci target-doctor [--target <yaml>] [--package-root <dir>]
[--format {human,json}] [--output <file>]
```

-Audits a `target.yaml` for seven authoring hazards and renders them as
+Audits a `target.yaml` for eight authoring hazards and renders them as
advisories. Advisor surface — advisory presence does not change the
verdict and does not change the exit code (`docs/exit_codes.md`).

@@ -747,6 +747,7 @@ verdict and does not change the exit code (`docs/exit_codes.md`).
| `ADVISORY-D1` | `test_surface_delta.*` constraint exists, but no test files (`test_*.py` / `*_test.py` / `tests/`) are visible under `--package-root`. |
| `ADVISORY-D3` | A user constraint duplicates a template-expanded constraint (same kind/target/operator/expected). |
| `ADVISORY-D4` | The target is lock-only and the candidate diff (`--baseline-rev` ↔ `--candidate-rev`) touches no Python files; the verdict would be a vacuous PASS. Skipped silently when neither rev is given and git is unavailable. |
+| `ADVISORY-D6` | The target declares a `complexity_delta` constraint and the candidate diff (`--baseline-rev` ↔ `--candidate-rev`) grows the nested-function count in an in-scope Python file. The complexity extractor does not descend into nested defs, so a reported cyclomatic/cognitive decrease may be displacement, not simplification. Skipped silently when neither rev is given and git is unavailable. |
| `ADVISORY-I1` | `intent` is the empty string. Repair adapters and `validate-plan` produce better guidance when intent describes the change purpose; use `init --intent` or edit `target.yaml`. |
| `ADVISORY-P1` | `primary_kind: feature` has no positive addition constraint. |
| `ADVISORY-P2` | `primary_kind: bugfix` has no `test_surface_delta.new_cases` expectation. |
8 changes: 4 additions & 4 deletions docs/dogfooding_findings_tracker.md
@@ -22,9 +22,9 @@ Status taxonomy:
| D3 | 2026-05-07 Session 4 dogfood | template / user constraint duplication | user constraint shadows template default with identical semantics, doubles evidence | **Resolved** | `docs/target_yaml_guide.md` Hazard 2 + `ADVISORY-D3` detector |
| D4 | 2026-05-07 Session 4 dogfood | vacuous PASS (out-of-scope diff) | config-only PR produces empty Python `CodeStateDelta`; lock-only template constraints pass without exercising the actual change | **Resolved** | `docs/target_yaml_guide.md` Hazard 3 + `ADVISORY-D4` detector |
| D5 | 2026-05-07 Session 5 dogfood (FINDING-1) | set operator partial-match semantics | partial-dict `expected` items canonicalised as different elements from full extractor records: false positive on `includes_*` / `subset_of` / `superset_of`, **false negative (CI bypass) on `excludes_all`** | **Resolved** | PR #65 (CSCI-35c): Match Schema partial-record matching + flat-projection aliases + `evidence.matched`; schema_version v4→v5 |
-| D6 | 2026-05-28 real-PR complexity dogfood (FINDING-F1) | vacuous PASS (extractor coverage gap); **duplicate / related: sibling of D4** | nested function bodies are excluded from `ComplexityEntry` by `python_complexity_extractor` spec (`api_surface` parity); a refactor that nests an outer function's body into nested helpers reports a large CC drop while real complexity is unchanged | **Unresolved** | Candidate paths: (a) `docs/target_yaml_guide.md` new Hazard 4 + `ADVISORY-D6` detector mirroring D4; (b) extractor spec change to emit nested-function entries (long-term, schema-impacting). Reproduction: langgraph PR #3700 (8/1 vacuous PASS in real-PR pass) |
+| D6 | 2026-05-28 real-PR complexity dogfood (FINDING-F1) | vacuous PASS (extractor coverage gap); **duplicate / related: sibling of D4** | nested function bodies are excluded from `ComplexityEntry` by `python_complexity_extractor` spec (`api_surface` parity); a refactor that nests an outer function's body into nested helpers reports a large CC drop while real complexity is unchanged | **Resolved** | `docs/target_yaml_guide.md` Hazard 4 + `ADVISORY-D6` detector (`authoring/hazards.py::detect_d6`, candidate path (a), mirrors D4's diff-aware contract). Fires when a verdict-participating `complexity_delta` constraint meets nested-def growth in the in-scope candidate diff (`--baseline-rev` ↔ `--candidate-rev`); growth signal computed per file via `authoring/nested_defs.py`, syntax-error sides skipped fail-silent. Path (b) (extractor emits nested-function entries) remains a long-term, schema-impacting option, not pursued. Reproduction: langgraph PR #3700 (8/1 vacuous PASS in real-PR pass) |
| D7 | 2026-05-28 real-PR complexity dogfood (FINDING-F2) | authoring mismatch (operator / constraint pairing) | `extract-method` refactor is mathematically guaranteed to **micro-increase cyclomatic** (each extracted function adds base 1), even with `_` prefix discipline and api_surface preserved. Cognitive is the metric that drops. Authors declaring `complexity_delta.cyclomatic ≤ 0` for extract-method refactors hit a structural false-FAIL | **Unresolved** | Candidate paths: (a) authoring guide section "Choosing complexity metric per refactor pattern" recommending `cognitive_delta` for extract-method; (b) future `ADVISORY-D7` detector emitted when a `change.primary_kind=refactor` target uses `cyclomatic_delta ≤ 0` and the diff matches extract-method shape. Low priority: this is authoring advice, not a CI integrity hazard |
-| D8 | 2026-06-07 scale + security dogfood (SCA gap) | SCA sensor dependency-source discovery gap | SSP SCA auto-discovery (`_requirements_file` in `src/semantic_ci_code/cli/commands/ssp.py`) only found `requirements.txt` at repo root; the `--locked` fallback only accepted `pylock.toml` / requirements lockfiles. PEP 621 pyproject-only projects (litellm) and `pdm.lock` projects (pdm) declared deps in unrecognised formats → `pip-audit --locked .` errors "no lockfiles found" → empty JSON → adapter degraded to `unknown` (exit 3). Correct graceful degradation (no silent false PASS, honours `unknown > fail > pass`) but a real usability gap that blocked SCA on most modern Python projects | **Resolved** | CSCI-55: dependency source discovery now recognises `requirements.txt`, `pylock.toml` / `pylock.*.toml`, `uv.lock`, `pdm.lock`, `poetry.lock`, and static PEP 621 `[project].dependencies`; lock sources are converted deterministically to pinned temp requirements, optional/non-default-group/marker-inactive packages are filtered, and malformed recognized sources fail closed to SSP `unknown` |
+| D8 | 2026-06-07 scale + security dogfood (SCA gap) | SCA sensor dependency-source discovery gap | SSP SCA auto-discovery (`_requirements_file` in `src/semantic_ci_code/cli/commands/ssp.py`) only found `requirements.txt` at repo root; the `--locked` fallback only accepted `pylock.toml` / requirements lockfiles. PEP 621 pyproject-only projects (litellm) and `pdm.lock` projects (pdm) declared deps in unrecognised formats → `pip-audit --locked .` errors "no lockfiles found" → empty JSON → adapter degraded to `unknown` (exit 3). Correct graceful degradation (no silent false PASS, honours `unknown > fail > pass`) but a real usability gap that blocked SCA on most modern Python projects | **Resolved** | CSCI-55 / PR #151: dependency source discovery now recognises `requirements.txt`, `pylock.toml` / `pylock.*.toml`, `uv.lock`, `pdm.lock`, `poetry.lock`, and static PEP 621 `[project].dependencies`; lock sources are converted deterministically to pinned temp requirements, optional/non-default-group/marker-inactive packages are filtered, and malformed recognized sources fail closed to SSP `unknown` |
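
The D7 row's claim that an extract-method refactor raises summed cyclomatic complexity by the base 1 of each extracted function can be checked with a toy counter. The sketch below is an illustration only: it assumes the textbook approximation (1 per function plus 1 per branching node), which is not the project's owned formula, and `cyclomatic_by_function`, `BEFORE`, and `AFTER` are hypothetical names.

```python
from __future__ import annotations

import ast

# Assumption: a simplified branching-node set, not the extractor's real rules.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_by_function(source: str) -> dict[str, int]:
    """Map each module-level function to 1 + its number of branching nodes."""
    tree = ast.parse(source)
    result = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            branches = sum(isinstance(n, BRANCH_NODES) for n in ast.walk(node))
            result[node.name] = 1 + branches
    return result


BEFORE = '''
def handle(items):
    total = 0
    for item in items:
        if item > 0:
            total += item
    if total > 100:
        total = 100
    return total
'''

# Same behaviour after extracting the loop into a module-level helper.
AFTER = '''
def _sum_positive(items):
    total = 0
    for item in items:
        if item > 0:
            total += item
    return total

def handle(items):
    total = _sum_positive(items)
    if total > 100:
        total = 100
    return total
'''

before = cyclomatic_by_function(BEFORE)
after = cyclomatic_by_function(AFTER)
# The branch count is unchanged (3), but there is one more function base.
print(sum(before.values()), sum(after.values()))
```

Under this approximation the sum moves from 4 to 5, so a `complexity_delta.cyclomatic ≤ 0` lock would fail even though no branching logic was added.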

## Reading order

@@ -37,8 +37,8 @@ Status taxonomy:
## Classification at a glance

- **Duplicate / related pairs**: D4 ↔ D6 (both are "vacuous PASS" via extractor coverage gap, with distinct mechanisms: D4 is "diff outside Python scope", D6 is "diff inside scope but inside nested function")
-- **Resolved (6 of 8)**: D1, D2, D3, D4, D5, D8
-- **Unresolved (2 of 8)**: D6 (mitigation path open), D7 (authoring advice, low priority)
+- **Resolved (7 of 8)**: D1, D2, D3, D4, D5, D6, D8
+- **Unresolved (1 of 8)**: D7 (authoring advice, low priority)
- **observation-only (not a D#)**: F6 (pattern-SAST logic-vuln blindspot) — **UNTESTED HYPOTHESIS, not a demonstrated observation in the 2026-06-07 pass**: the Semgrep registry rulesets returned HTTP 403, so Semgrep ran with 0 loaded rules over 0 paths and produced no valid SAST measurement. F6 records the *a-priori* expectation that deterministic SAST misses semantic / business-logic vulns, cross-linked to Phase H (`docs/llm_sensor_adapter_planning.md`) as **motivation** — it is **not** empirically validated by this pass. Recorded in `docs/dogfooding_scale_and_security.md` (which now carries a validity warning + repro note for redoing the SAST sub-pass under a network policy allowing `semgrep.dev`). Distinct from the demonstrated observations of the same pass: real vulns merged-then-fixed (git evidence) and SCA clean-on-litellm (pip-audit positive-controlled with `jinja2==2.11.2` → 5 CVEs)

## Source pass index
4 changes: 2 additions & 2 deletions docs/exit_codes.md
@@ -78,8 +78,8 @@ git errors (`CompileError` on `target.yaml`, git revision resolution
failure when `--baseline-rev` / `--candidate-rev` is given, git
unavailable when explicitly required) exit 3. Internal bugs exit 4. When
neither `--baseline-rev` nor `--candidate-rev` is given and git is
-unavailable or no baseline can be resolved, ADVISORY-D4 is silently
-skipped rather than failing. There is no `--strict-advice` flag; CI that
+unavailable or no baseline can be resolved, ADVISORY-D4 and ADVISORY-D6
+are silently skipped rather than failing. There is no `--strict-advice` flag; CI that
wants to gate on advisory presence should consume `--format json` and
apply a workflow-level policy. Silent success on bad input is forbidden —
the advisor surface only suppresses the verdict step, not the input
4 changes: 2 additions & 2 deletions docs/json_schema.md
@@ -318,12 +318,12 @@ compile-repair, or validate-plan envelopes. The shape is pinned by
|---|---|
| `schema_version` | Always `"advisory-1"`. |
| `subcommand` | Always `"target-doctor"`. |
-| `advisories[].code` | One of `ADVISORY-D1`, `ADVISORY-D3`, `ADVISORY-D4`, `ADVISORY-I1`, `ADVISORY-P1`, `ADVISORY-P2`, `ADVISORY-S1`. |
+| `advisories[].code` | One of `ADVISORY-D1`, `ADVISORY-D3`, `ADVISORY-D4`, `ADVISORY-D6`, `ADVISORY-I1`, `ADVISORY-P1`, `ADVISORY-P2`, `ADVISORY-S1`. |
| `advisories[].severity` | Always `"info"` — the Advisor surface never participates in the verdict (`docs/code_semantic_ci_design.md §23.3.1`). |
| `advisories[].message` | Human-readable explanation of the hazard. |
| `advisories[].evidence` | Per-advisory diagnostic fields (e.g. `constraint_id`, `target`, `package_root`, `files_touched_count`). |

-Advisories are emitted in canonical order (D1 → D3 → D4 → I1 → P1 → P2 → S1)
+Advisories are emitted in canonical order (D1 → D3 → D4 → D6 → I1 → P1 → P2 → S1)
with `constraint_id` as the within-code tiebreak so output is byte-identical
across runs. Advisory presence does not change the exit code — see
`docs/exit_codes.md`.
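
Because advisory presence never changes the exit code, a CI job that wants to gate on advisories has to read this envelope itself. The following is a minimal sketch of such a workflow-level policy, assuming only the envelope fields listed in the table above; the gated set, the function name `gated_advisories`, and the sample messages and evidence values are hypothetical.

```python
import json

# Advisory codes whose presence should fail the CI job. This set is a
# hypothetical workflow-level policy, not something semantic-ci defines.
GATED_CODES = frozenset({"ADVISORY-D4", "ADVISORY-D6"})


def gated_advisories(envelope_text, gated_codes=GATED_CODES):
    """Return the advisories in a target-doctor envelope that the policy gates on."""
    envelope = json.loads(envelope_text)
    # Refuse envelopes we do not understand instead of passing silently.
    if envelope.get("schema_version") != "advisory-1":
        raise ValueError(
            "unexpected schema_version: %r" % envelope.get("schema_version")
        )
    return [a for a in envelope.get("advisories", []) if a.get("code") in gated_codes]


# Illustrative envelope; the messages and evidence values are made up.
SAMPLE = json.dumps({
    "schema_version": "advisory-1",
    "subcommand": "target-doctor",
    "advisories": [
        {"code": "ADVISORY-D6", "severity": "info",
         "message": "nested-def growth under a complexity_delta constraint",
         "evidence": {"constraint_id": "c1"}},
        {"code": "ADVISORY-I1", "severity": "info",
         "message": "intent is empty", "evidence": {}},
    ],
})

hits = gated_advisories(SAMPLE)
for advisory in hits:
    print(advisory["code"], advisory["message"])
```

In a workflow, the envelope text would come from a file written by `semantic-ci target-doctor --format json --output <file>`, and the job would exit non-zero when `hits` is non-empty; which codes to gate on is a per-repository decision.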
35 changes: 35 additions & 0 deletions docs/target_yaml_guide.md
@@ -281,6 +281,41 @@ non-Python artifacts (see `design.md §13` on out-of-scope items). It is
`semantic-ci target-doctor` (Brief 8 / CSCI-43) detects this hazard as
`ADVISORY-D4` once it lands.

+## Hazard 4: Nested functions are invisible to complexity numbers (D6)
+
+`python_complexity_extractor` only emits entries for the `api_surface`
+parity subset: module-level functions and direct methods of module-level
+classes. When it walks a function body it **stops descent at nested
+`def` boundaries**: a nested helper contributes 0 to the enclosing
+function's cyclomatic/cognitive number, and the helper itself is never
+emitted as its own entry.
+
+The consequence for `complexity_delta` constraints is the D6 sibling of
+Hazard 3's vacuous PASS (observed on a real PR in
+`docs/dogfooding_real_pr_complexity.md` FINDING-F1): a refactor that moves
+an outer function's body into function-nested helpers reports a large
+complexity *drop* while the real complexity is unchanged; it has only
+been displaced below the extractor's horizon. A lock like
+`complexity_delta.cyclomatic less_than_or_equal 0` passes, and the verdict
+silently endorses the displacement.
+
+**What this means for authors:**
+
+- A green complexity verdict on a refactor that introduced nested helpers
+  is weaker evidence than the number suggests. Check whether the helpers
+  should be **module-level functions** (possibly `_`-prefixed); those are
+  extracted, complexity-counted, and constrained.
+- This is the extractor behaving as specified (`api_surface` emission
+  parity, `design.md` CSCI-8), not a counting bug. The blind spot is the
+  declared trade-off for a deterministic, owned formula.
+
+`semantic-ci target-doctor` detects this hazard as `ADVISORY-D6` when
+`--baseline-rev` / `--candidate-rev` are given: it fires when the target
+declares a verdict-participating `complexity_delta` constraint and the
+candidate diff grows the nested-def count in an in-scope Python file.
+Growth is a heuristic displacement signal, not proof; review the diff
+rather than treating the advisory as a verdict.
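
The nested-def growth signal described above can be approximated with the standard `ast` module. This is a minimal sketch under stated assumptions, not the contents of `authoring/nested_defs.py`: the names `nested_def_count` and `nested_defs_grew` are hypothetical, and the only behaviour taken from this PR is that a side with a syntax error is skipped silently.

```python
from __future__ import annotations

import ast


class _NestedDefCounter(ast.NodeVisitor):
    """Counts function definitions that sit inside another function."""

    def __init__(self) -> None:
        self.depth = 0  # number of enclosing function definitions
        self.count = 0  # nested definitions seen so far

    def _visit_function(self, node: ast.AST) -> None:
        if self.depth > 0:
            self.count += 1
        self.depth += 1
        self.generic_visit(node)
        self.depth -= 1

    visit_FunctionDef = _visit_function
    visit_AsyncFunctionDef = _visit_function


def nested_def_count(source: str) -> int | None:
    """Return the nested-def count, or None when the source does not parse."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return None
    counter = _NestedDefCounter()
    counter.visit(tree)
    return counter.count


def nested_defs_grew(baseline_src: str, candidate_src: str) -> bool:
    """True when the candidate has more nested defs than the baseline."""
    before = nested_def_count(baseline_src)
    after = nested_def_count(candidate_src)
    if before is None or after is None:
        return False  # fail-silent: an unparsable side yields no signal
    return after > before


BASELINE = "def outer(x):\n    if x:\n        return 1\n    return 0\n"
CANDIDATE = (
    "def outer(x):\n"
    "    def helper(y):\n"
    "        if y:\n"
    "            return 1\n"
    "        return 0\n"
    "    return helper(x)\n"
)
print(nested_defs_grew(BASELINE, CANDIDATE))
```

Here the branch has moved into `helper`, so an extractor that stops at nested `def` boundaries would report a lower number for `outer` even though the logic is the same; the growth check flags exactly that shape for review.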

## Constraint Authoring Tips

### Pick `kind` deliberately
1 change: 1 addition & 0 deletions src/semantic_ci_code/authoring/advisory.py
@@ -7,6 +7,7 @@
"ADVISORY-D1",
"ADVISORY-D3",
"ADVISORY-D4",
+"ADVISORY-D6",
"ADVISORY-I1",
"ADVISORY-P1",
"ADVISORY-P2",