Skip to content

fix: warn when apm.yml is missing but APM artifacts exist (closes #1056)#1255

Open
sergio-sisternes-epam wants to merge 4 commits into
mainfrom
fix/1056-missing-apm-yml-policy-bypass
Open

fix: warn when apm.yml is missing but APM artifacts exist (closes #1056)#1255
sergio-sisternes-epam wants to merge 4 commits into
mainfrom
fix/1056-missing-apm-yml-policy-bypass

Conversation

@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator

Fixes #1056

Problem

When apm.yml is absent, apm audit --ci exits 0 with no warnings. An attacker with write access could delete apm.yml to bypass all policy and baseline checks.

Approach

Warn (but don't fail) when apm.yml is absent but evidence of prior APM usage exists (.apm/ directory or apm.lock.yaml). This preserves non-APM project compatibility while detecting deletion-based bypass.


Draft PR -- implementation in progress.

sergio-sisternes-epam pushed a commit that referenced this pull request May 10, 2026
Closes #1056

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sergio-sisternes-epam sergio-sisternes-epam added the panel-review Trigger the apm-review-panel gh-aw workflow label May 10, 2026
@sergio-sisternes-epam sergio-sisternes-epam marked this pull request as ready for review May 10, 2026 22:32
Copilot AI review requested due to automatic review settings May 10, 2026 22:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens apm audit --ci against a deletion-bypass scenario by emitting a passing-but-attention-worthy manifest-missing check when apm.yml is absent while prior-APM artifacts are present (e.g., .apm/ or a lockfile). This preserves compatibility for repos that never adopted APM while surfacing suspicious “used-to-be-an-APM-project” states.

Changes:

  • Add a manifest-missing check to run_baseline_checks() when apm.yml is missing but APM artifacts exist.
  • Add unit tests covering the warning emission conditions (no artifacts / .apm present / lockfile present / manifest present).
  • Add a changelog entry for the new warning behavior and register the check name in the CI/policy models map.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
src/apm_cli/policy/ci_checks.py Adds the manifest-missing warning check on early-return paths when apm.yml is missing but artifacts exist.
tests/unit/policy/test_ci_checks.py Adds unit tests validating when the warning is (and isn’t) emitted.
src/apm_cli/policy/models.py Registers manifest-missing in the check-to-artifact map.
CHANGELOG.md Documents the new manifest-missing warning behavior under Unreleased.

Comment thread src/apm_cli/policy/ci_checks.py
Comment thread CHANGELOG.md Outdated
@github-actions
Copy link
Copy Markdown

APM Review Panel: needs_rework

fix: detect manifest-bypass condition (apm.yml missing, artifacts present) -- currently warns but exits 0 in --ci mode, leaving the #1056 bypass unblocked in CI pipelines

cc @sergio-sisternes-epam @danielmeppiel -- a fresh advisory pass is ready for your review.

PR #1255 correctly identifies and surfaces a meaningful security condition: APM artifacts exist on disk but apm.yml is absent, a pattern that can indicate deliberate manifest suppression to bypass policy enforcement. The detection logic is clean, the CHANGELOG entry is accurate, and four unit tests confirm the internal function behaves correctly. The PR moves the project in the right direction and closes the visibility gap that #1056 described. However, the panel converges on a single structural defect that prevents the PR from actually closing the vulnerability it targets: passed=True on the new check means CIAuditResult.passed remains True (computed as all(c.passed for c in checks)), so apm audit --ci exits 0 when the bypass condition is detected. The security goal is detection-plus-blocking in CI, not detection-only; the current implementation delivers detection-only in all modes.

The oss-growth-hacker argues that warn-not-fail is adoption-correct, and that argument has merit -- but only for interactive (non-CI) usage. The devx-ux-expert makes the decisive point: every analogous anomaly check in the codebase uses passed=False, and the industry convention (npm audit, pip-audit) is exit-nonzero on findings in CI mode. The PR body's "warn but don't fail" framing appears to have been applied uniformly when the correct design is mode-sensitive: warn (passed=True) in interactive mode, fail (passed=False) in --ci mode. This is not a philosophical disagreement between panelists -- it is a gap between the PR's stated intent and its implementation in the one context where the fix is load-bearing. Until passed=False is set for --ci invocations (or a --allow-missing-manifest escape valve is provided), the bypass described in #1056 remains exploitable in any CI pipeline running apm audit --ci.

Secondary signals are advisory but important for the next sprint: SARIF output silently omits the new check (confirmed by both cli-logging-expert and python-architect), four documentation sites still enumerate "7 baseline checks" (doc-writer), the complete-artifact-wipe vector (delete all three artifacts) is still fully silent (supply-chain-security-expert), and no CLI-level integration test certifies the user-visible command surface (test-coverage-expert). None of these block a corrected version of this PR from shipping, but the SARIF gap shares the same root cause as the exit-code defect and should be resolved in the same pass.

Dissent. The only substantive disagreement is between the oss-growth-hacker (warn-not-fail is adoption-correct, ship as-is with docs) and supply-chain-security-expert + devx-ux-expert + cli-logging-expert (passed=True in --ci mode is a defect, not a design choice). The resolution is mode-sensitive behavior, not a binary choice: retain the warning rendering outside --ci, fail inside --ci. No other panelist disagreement of substance.

Aligned with: Secure by default -- violated in --ci mode; passed=True exits 0 on the bypass condition; fix requires mode-sensitive logic. Governed by policy -- detection must also enforce the gate in --ci, not only surface the condition. Pragmatic as npm -- npm audit and pip-audit exit 1 on findings in CI; APM should match this convention. Portable by manifest -- the check reinforces manifest-as-source-of-truth by flagging its absence. OSS community driven -- four documentation sites enumerate "7 baseline checks" and will contradict shipped behavior on merge.

Growth signal. Once the exit-code defect is corrected, PR #1255 becomes a clean "secure by default" trust beat suitable for amplification in release notes and the enterprise adoption playbook. The CHANGELOG entry is already well-voiced; leading with the threat scenario rather than the flag name is worth a one-line edit. The warn-not-fail behavior in interactive mode should be explicitly documented in the audit command reference as intentional policy -- enterprise evaluators will ask why it does not always fail, and a one-sentence rationale prevents that from becoming a trust objection.

Panel summary

Persona B R N Takeaway
Python Architect 0 0 2 Minimal, well-scoped change; inline detection logic is appropriate; one redundant Path variable worth a nit.
CLI Logging Expert 0 2 1 Warning renders as green [+] (indistinguishable from a passing check) and is silently dropped from SARIF -- defeating the security goal for both human and CI-agent audiences.
DevX UX Expert 0 4 1 passed=True on a security-relevant warning is semantically broken; CI mode must exit non-zero; message gives no remediation action.
Supply Chain Security Expert 1 1 1 passed=True keeps exit 0 in --ci mode, preserving the exact bypass #1056 describes; complete-wipe vector still silent.
OSS Growth Hacker 0 2 2 Strong security-trust signal with good CHANGELOG copy; warn-not-fail is adoption-correct; minor missed amplification opportunity in release narrative.
Doc Writer 0 3 2 CHANGELOG entry accurate and well-voiced; 4 docs lag the new check -- cli-commands.md, ci-cd.md (x2), and governance.md each enumerate "7 baseline checks" without the new one.
Test Coverage Expert 0 1 1 Four unit tests cover the internal function; no integration-tier test exercises apm audit --ci via the CLI with apm.yml absent -- the user-visible command surface is uncertified.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 5 follow-ups

  1. [Supply Chain Security Expert] (blocking-severity) Set passed=False (or mode-sensitive logic) for manifest-missing when --ci is active, so apm audit --ci exits non-zero on bypass detection. -- This is the load-bearing fix for Policy: missing apm.yml silently skips all policy and baseline checks #1056. Without it the bypass is detected but not blocked; the CI gate still passes. Highest-signal defect in the panel.
  2. [CLI Logging Expert] Fix SARIF serialisation to surface the manifest-missing finding -- either set passed=False or emit level: warning for the new check name. -- Shares the root cause with the exit-code defect; SARIF is the primary integration point for GitHub Advanced Security and enterprise CI dashboards. Should ship in the same PR.
  3. [Test Coverage Expert] Add an integration-with-fixtures test invoking apm audit --ci via CLI argv with apm.yml absent and artifacts present, asserting exit code 1. -- Unit tests certify the internal function; no test certifies the user-visible command surface. A CLI-level test is the regression trap that prevents the exit-code fix from silently reverting.
  4. [Doc Writer] Update cli-commands.md (line 543), ci-cd.md (lines 71 and 151), the exit-code table, and packages/apm-guide governance.md to enumerate the new check and note that exit 0 may include advisory warnings. -- Four documentation sites will contradict shipped behavior on merge; enterprise evaluators will distrust the docs or the tool.
  5. [Supply Chain Security Expert] File a tracked issue for the complete-artifact-wipe vector (delete apm.yml + .apm/ + apm.lock.yaml) which remains fully silent after this PR. -- Not a blocker for this PR, but the issue framing ("bypass all policy and baseline checks") implies this gap should be tracked before Policy: missing apm.yml silently skips all policy and baseline checks #1056 is fully closed.

Architecture

classDiagram
    direction LR
    class run_baseline_checks {
        <<Pure>>
        +project_root: Path
        +fail_fast: bool
        +run() CIAuditResult
    }
    class CheckResult {
        <<ValueObject>>
        +name: str
        +passed: bool
        +message: str
        +details: list[str]
    }
    class CIAuditResult {
        <<ValueObject>>
        +checks: list[CheckResult]
        +passed: bool
        +to_json() dict
        +to_sarif() dict
    }
    class _CHECK_ARTIFACT_MAP {
        <<Registry>>
        +manifest-missing: apm.yml
    }
    run_baseline_checks ..> CheckResult : creates
    run_baseline_checks ..> CIAuditResult : returns
    CIAuditResult *-- CheckResult : aggregates
    CIAuditResult ..> _CHECK_ARTIFACT_MAP : reads for SARIF
    class run_baseline_checks:::touched
    class _CHECK_ARTIFACT_MAP:::touched
    classDef touched fill:#fff3b0,stroke:#d47600
Loading
flowchart TD
    A([apm audit --ci]) --> B["run_baseline_checks\nci_checks.py:502"]
    B --> C{apm.yml exists?}
    C -- yes --> D["parse APMPackage\nci_checks.py:526"]
    D --> E["_check_lockfile_exists\nci_checks.py:539"]
    C -- no --> F["FS: check .apm/ dir\nci_checks.py:552"]
    F --> G{".apm/ or apm.lock.yaml\nexists?"}
    G -- yes --> H["append manifest-missing\nCheckResult passed=True\nci_checks.py:555"]
    G -- no --> I["return result\nno warning"]
    H --> I
    E --> J{lockfile check passed?}
    J -- no --> I
    J -- yes --> K["LockFile.read\nci_checks.py:568"]
    K --> L["check ref-consistency /\ndeployed-files / no-orphans / ..."]
    L --> I
Loading

Recommendation

Do not merge in current form. The single required change is mechanical and low-risk: passed=True must become passed=False (or a mode-sensitive equivalent) for the manifest-missing check when --ci is active. Without this, the PR detects the #1056 bypass condition but does not block it -- apm audit --ci still exits 0, and any CI pipeline relying on APM for manifest-enforcement remains exposed. The growth-hacker's warn-not-fail argument is valid for interactive mode and should be preserved: retain the warning rendering outside --ci, fail inside --ci. Once that change is made, a CLI-level integration test is added (follow-up 3), the SARIF gap is resolved (follow-up 2), and the four documentation sites are updated (follow-up 4), this PR is clean to merge and will genuinely close #1056. Estimated rework scope: small -- one boolean, one test fixture, four doc line-edits.


Full per-persona findings

Python Architect

  • [nit] lock_file variable in the new block shadows the conceptually equivalent lockfile_path (assigned two lines earlier in the same function) at src/apm_cli/policy/ci_checks.py:553
    lockfile_path is already computed via get_lockfile_path() a few lines above the guard. The new block re-derives the same path as project_root / 'apm.lock.yaml' under a different name. This is harmless but creates a confusing dual identity for the same file path concept and will bite the next person who changes the lockfile location.
    Suggested: Use lockfile_path (already in scope) instead of introducing lock_file. The condition becomes if apm_dir.exists() or lockfile_path.exists():.

  • [nit] Warn-only CheckResult uses passed=True, making SARIF output silently skip it at src/apm_cli/policy/ci_checks.py:558
    to_sarif() only emits results for checks where passed is False. A manifest-missing warning with passed=True will never appear in GitHub Code Scanning output. If intentional, document it; if SARIF visibility is desired, set passed=False.
    Suggested: Add a comment # passed=True: advisory only; not surfaced in SARIF if intentional, or set passed=False if SARIF visibility of this warning is desired.

CLI Logging Expert

  • [recommended] passed=True renders green [+] -- security-relevant condition is visually silent at src/apm_cli/commands/audit.py
    _render_ci_results colours every CheckResult by a single boolean: green [+] if passed, red [x] if not. There is no yellow/warning path. The new manifest-missing entry with passed=True will appear in the compliance table as a green check mark alongside all other healthy checks -- indistinguishable from a passing result.
    Suggested: Emit this condition as a standalone _rich_warning() line before the CheckResult table using STATUS_SYMBOLS['warning'], or surface it only when --verbose is set.

  • [recommended] SARIF output silently omits this finding -- CI/Code Scanning pipelines are blind to it at src/apm_cli/policy/models.py
    to_sarif() filters on if not check.passed. Because the new CheckResult has passed=True, it is never serialised into SARIF results or rules. Teams using apm audit --ci --format sarif and uploading to GitHub Code Scanning will receive zero signal about a possibly deleted apm.yml.
    Suggested: In to_sarif(), also emit entries with level: 'warning' for checks that passed but carry a notable message, or change passed=False and set level='warning' in SARIF serialisation for this check name.

  • [nit] Message missing actionable remediation hint at src/apm_cli/policy/ci_checks.py
    The message ends with "-- this may indicate a deleted manifest" but gives the user no next step. APM console convention requires a concrete follow-up.
    Suggested: "apm.yml is missing but APM artifacts (.apm/ or apm.lock.yaml) were found -- possible deleted manifest. Run 'apm init' to recreate or restore from source control."

DevX UX Expert

  • [recommended] passed=True on a warning check is a semantic contradiction that breaks CI parsing at src/apm_cli/policy/ci_checks.py:558
    Every analogous check in the codebase (manifest-parse, unmanaged-files, etc.) uses passed=False to mean "something is wrong." Introducing a new convention -- passed-but-actually-a-warning -- without a dedicated status field breaks the contract silently. The test suite even asserts check.passed is True, baking the wrong semantic in.

  • [recommended] apm audit --ci should exit non-zero when a manifest-bypass condition is detected at src/apm_cli/policy/ci_checks.py:548
    In CI, exit 0 = green build. A warning that surfaces in terminal output but doesn't flip the exit code will be invisible in most CI log viewers. npm audit and pip-audit both exit 1 on findings. The correct UX in --ci mode is to exit non-zero by default and provide an --allow-missing-manifest escape hatch for teams that legitimately have no apm.yml.

  • [recommended] Warning message gives no next action -- violates the "failure mode is the product" principle at src/apm_cli/policy/ci_checks.py:560
    The message names the problem but stops there. manifest-parse already sets the gold standard: "Cannot parse apm.yml -- fix the YAML syntax error in apm.yml and re-run." This check should match that pattern.
    Suggested: "apm.yml is missing but APM artifacts (.apm/ or apm.lock.yaml) were found. Restore it with 'apm init', or commit the missing file. If intentional, re-run with --allow-missing-manifest."

  • [recommended] No --strict or --allow-missing-manifest flag documented or implemented
    Security-adjacent warnings in CI tooling need a two-sided escape valve. Teams that want zero-tolerance get no mechanism; teams with legitimate missing-manifest repos get no suppression path. cargo audit has --deny warnings; npm audit has --audit-level.

  • [nit] "this may indicate" is unnecessarily hedge-y for a deterministic condition at src/apm_cli/policy/ci_checks.py:563
    The condition is deterministic: APM artifacts exist and apm.yml does not. Prefer: "apm.yml is missing. APM artifacts were found (.apm/, apm.lock.yaml), which suggests it was deleted or not committed."

Supply Chain Security Expert

  • [blocking] manifest-missing fires with passed=True, so --ci still exits 0 on bypass at src/apm_cli/policy/ci_checks.py:557
    CIAuditResult.passed is all(c.passed for c in self.checks) (models.py:62), so the aggregate result remains passing and apm audit --ci exits 0. Policy: missing apm.yml silently skips all policy and baseline checks #1056 states the threat: "An attacker with write access could delete apm.yml to bypass all policy and baseline checks." This PR detects the condition but does not block it -- the CI gate still passes.
    Suggested: Change passed=True to passed=False for the manifest-missing check (or gate it on a ci_mode flag). A warning-only path could exist for non-CI invocations if ergonomics require it, but under --ci this must be a hard failure.

  • [recommended] Complete-artifact-wipe vector remains fully silent at src/apm_cli/policy/ci_checks.py:550-566
    The detection heuristic requires that .apm/ OR apm.lock.yaml is still present. If an attacker deletes apm.yml AND .apm/ AND apm.lock.yaml, the early-return fires with zero new checks appended -- no warning, no failure, exit 0.
    Suggested: Document the known gap explicitly in CHANGELOG and SECURITY.md, or extend detection to cover the all-deleted case.

  • [nit] lock_file variable duplicates lockfile_path at src/apm_cli/policy/ci_checks.py:553
    lockfile_path is already in scope via get_lockfile_path(). Reuse it to avoid fragile duplication.

OSS Growth Hacker

  • [recommended] CHANGELOG entry is solid but undersells the trust story at CHANGELOG.md
    The entry accurately describes the mechanics but buries the user-facing value. For release notes and social copy, the hook should lead with the threat scenario: "APM now detects and warns when apm.yml was removed while a lockfile or .apm/ directory still exists -- closing a silent CI bypass."

  • [recommended] Warn-not-fail stance is adoption-correct but needs explicit documentation
    The "warn but don't fail" choice is the right call for adoption in interactive mode. This is a positioning asset matching the "secure by default, governed by policy" tagline. Recommend calling this out explicitly in the docs so users understand the intentional stance.

  • [nit] Lead with the user scenario in CHANGELOG before naming the flag name.

  • [nit] Missed story angle: this is a supply-chain trust beat suitable for a release post.

Auth Expert -- inactive

No auth surface touched; changed files (CHANGELOG.md, src/apm_cli/policy/ci_checks.py, src/apm_cli/policy/models.py, tests/unit/policy/test_ci_checks.py) cover only policy/audit CI-check logic and do not affect authentication, token management, credential resolution, or remote-host authorization.

Doc Writer

  • [recommended] docs/src/content/docs/reference/cli-commands.md:543 enumerates "the 7 baseline checks" -- manifest-missing absent
    Readers relying on this reference to understand what apm audit --ci checks will not know the new check exists. Also consider replacing the hardcoded count with "the baseline checks" to avoid incrementing it again next time.

  • [recommended] docs/src/content/docs/integrations/ci-cd.md (lines 71 and 151) hardcodes "seven baseline lockfile checks" / "7 baseline checks" -- manifest-missing absent
    After this PR the count is 8 (or 7+1 advisory). Add a note that manifest-missing is emitted as a warning (exit 0) when apm.yml is absent but APM artifacts are present.

  • [recommended] docs/src/content/docs/reference/cli-commands.md exit-code table collapses "clean pass" and "pass with advisory warnings" into exit 0
    The table has two rows: exit 0 = "All checks passed", exit 1 = "One or more checks failed." Add a note: "Exit 0 may include advisory warnings (e.g., manifest-missing). Always inspect stderr output even when the exit code is 0."

  • [nit] CHANGELOG phrasing slightly long but acceptable; optional tighten: "...closes a silent-bypass vector for all baseline checks."

  • [nit] packages/apm-guide/.apm/skills/apm-usage/governance.md:107 baseline check list omits manifest-missing.
    Suggested: Add manifest-missing with description: "advisory warning when apm.yml is absent but .apm/ or apm.lock.yaml exist."

Test Coverage Expert

  • [recommended] Unit tests call run_baseline_checks() directly; no CLI-level integration test for apm audit --ci with absent apm.yml but present artifacts at tests/unit/policy/test_ci_checks.py
    The four new tests never invoke the apm audit --ci argv surface. Grepped tests/integration/ for manifest.missing, manifest_absent, no_apm_yml, absent.*apm -- zero matches. The unit tests are valid and useful for defending the internal logic; they are necessary but not sufficient for the CLI promise.
    Proof (missing at integration-with-fixtures): tests/integration/test_drift_check.py::test_c3_no_manifest_but_apm_dir_ci_audit_warns_and_exits_zero -- proves: apm audit --ci run via CLI argv exits 0 (or 1 after fix) and surfaces a user-readable warning when .apm/ exists but apm.yml is absent. [devx, governed-by-policy]

  • [nit] test_apm_dir_triggers_warning does not assert the message contains a human-actionable hint (e.g., "run apm init")
    The test asserts .apm/ is in the message but not that a next-action hint is present. One additional assertion would harden the user-promise contract.
    assert ".apm/" in check.message or "apm.lock.yaml" in check.message

This panel is advisory. It does not block merge. Re-apply the
panel-review label after addressing feedback to re-run.

Generated by PR Review Panel for issue #1255 · ● 2.7M ·

@github-actions github-actions Bot removed the panel-review Trigger the apm-review-panel gh-aw workflow label May 10, 2026
@danielmeppiel danielmeppiel added the panel-review Trigger the apm-review-panel gh-aw workflow label May 11, 2026
@github-actions github-actions Bot removed the panel-review Trigger the apm-review-panel gh-aw workflow label May 11, 2026
@github-actions
Copy link
Copy Markdown

APM Review Panel: ship_with_followups

fix: apm audit --ci now exits non-zero when apm.yml is deleted but APM artifacts remain, closing a silent policy-bypass gap (closes #1056)

cc @sergio-sisternes-epam @danielmeppiel -- a fresh advisory pass is ready for your review.

All five active panelists agree the fix is real and valuable: deleting apm.yml to silently disable policy checks in CI is a meaningful threat surface, and this PR closes it. The security property is correctly scoped -- hard failure in --ci mode, advisory warn in interactive -- and the implementation is minimal and consistent with existing advisory-check precedent. The fix ships value as written.

Two findings need attention before this is fully trustworthy. First, the devx-ux-expert and test-coverage-expert converge on the same structural gap: the interactive apm audit (no --ci) path does not call run_baseline_checks, so the warn-but-pass promise stated in the PR body is undelivered. This is not a nit -- it is a broken contract between the PR description and the shipped behavior. Second, the test-coverage-expert found that the CLI exit-code contract for apm audit --ci with no apm.yml has no integration-with-fixtures test. For a security-hardening fix, the user-observable promise (exit_code == 1) must be certified at the integration tier, not just unit-tested at the policy layer. Both gaps are follow-up candidates, but the interactive-path gap is close to a blocker -- if the PR body is wrong, either the code or the description must be corrected in this PR.

The remaining signals are clean follow-ups: the python-architect's ci_mode layering concern is a legitimate design debt but not a blocker; the doc-writer identified stale '7 baseline checks' counts in two docs and a missing policy-reference row, which are concrete and should land with or shortly after this PR; the cli-logging-expert and devx-ux-expert both flag the missing actionable remediation hint in the warning message, a low-cost high-trust-signal fix that should be bundled; the supply-chain finding that wiping all four artifacts bypasses the check is a scope expansion, not a regression -- it belongs in a follow-up issue.

Dissent. No meaningful panelist disagreement. The supply-chain finding (wipe all artifacts) and the python-architect's ci_mode layering concern point in opposite directions on severity but are not in conflict -- one is a scope boundary, the other is design debt. Both are correctly classified as recommended. The only unresolved question is whether the interactive-path gap is a blocker or follow-up; CEO weights it as near-blocker because it is a promise-code mismatch that must be resolved in this PR, even if the resolution is a documentation correction rather than a code change.

Aligned with: Secure by Default (CI mode now fails hard on manifest-missing; the security property is opt-out via the --ci flag rather than opt-in, which is the correct default for a governed pipeline check). Governed by Policy (manifest-missing is a named check in the policy layer, consistent with how all other baseline checks are registered and reported, preserving auditability). Pragmatic as npm (the fix adds zero new required configuration; existing repos get the protection automatically on the next apm audit --ci run).

Growth signal. PR #1255 is the first explicit security-hardening story in the APM changelog. Recommend opening a dedicated 'Security' section in CHANGELOG.md so these fixes accumulate into a visible trust track record. The next release post should headline 'policy bypass gap closed' -- this framing is shareable in DevSecOps communities and positions APM as a tool that takes supply-chain governance seriously, not just package resolution.

Panel summary

Persona B R N Takeaway
Python Architect 0 1 1 Solid minimal fix; advisory-warning pattern is already established; one import inconsistency worth a nit.
CLI Logging Expert 0 2 2 Message text is functional but missing actionable fix hint; verify renderer applies [!] correctly.
DevX UX Expert 0 2 2 Check fires correctly, but the warning message lacks a concrete next-action; users see a diagnosis with no prescription.
Supply Chain Security Expert 0 2 1 Fix closes the apm.yml deletion bypass for --ci but residual gap exists: deleting all 4 artifacts still passes silently.
OSS Growth Hacker 0 1 1 Security fix is real and trust-building; CHANGELOG entry buries the headline -- leading with 'silent bypass' weakens the story.
Doc Writer 0 3 1 CHANGELOG entry is accurate; CLI reference and ci-cd.md hard-code '7 baseline checks' that undercounts by 1; policy-reference check table is missing manifest-missing.
Test Coverage Expert 0 1 2 9 unit tests cover the policy layer well; the CLI exit-code contract for apm audit --ci with no apm.yml is uncertified at the integration-with-fixtures floor.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 5 follow-ups

  1. [DevX UX Expert] (blocking-severity) Interactive apm audit (no --ci) never emits the manifest-missing warning -- promise-code mismatch must be resolved in this PR: either wire the check into the interactive path or explicitly scope the feature to --ci only and update the PR description and --help text accordingly.
  2. [Test Coverage Expert] Add integration-with-fixtures test: apm audit --ci exits non-zero with .apm/ present and apm.yml absent -- the user-visible security promise (exit_code == 1 + 'manifest-missing' in output) is only certified at the unit/policy layer; a refactor dropping ci_mode=True from audit.py would silently break the guarantee.
  3. [CLI Logging Expert + DevX UX Expert] Append actionable remediation hint to warning message: '-- restore apm.yml from version control or run "apm init" to recreate it' -- every comparable check in ci_checks.py ends with a concrete command; this one gives a diagnosis with no prescription.
  4. [Doc Writer] Update cli-commands.md and ci-cd.md '7 baseline checks' count to 8; add manifest-missing row to policy-reference.md check table -- three doc locations now undercount the baseline checks, breaking the single-source-of-truth contract for operators.
  5. [Supply Chain Security Expert] Open a follow-up issue: an attacker who deletes apm.yml AND all lock artifacts bypasses the new check entirely -- this is a scope expansion, not a regression from this PR, and belongs in a tracked issue.

Architecture

classDiagram
    direction LR

    class run_baseline_checks {
        <<PolicyGateway>>
        +project_root Path
        +fail_fast bool
        +ci_mode bool
        +returns CIAuditResult
    }

    class CheckResult {
        <<ValueObject>>
        +name str
        +passed bool
        +message str
        +details list[str]
    }

    class CIAuditResult {
        <<ValueObject>>
        +checks list[CheckResult]
        +passed bool
        +failed_checks list[CheckResult]
        +to_json() dict
    }

    class _audit_ci_gate {
        <<CommandLayer>>
        +cfg AppConfig
        +no_fail_fast bool
        +calls run_baseline_checks
    }

    class _check_includes_consent {
        <<AdvisoryCheck>>
        +always passed=True
        +message str
    }

    class manifest_missing_check {
        <<AdvisoryCheck>>
        +passed not ci_mode
        +name manifest-missing
    }

    _audit_ci_gate ..> run_baseline_checks : calls ci_mode=True
    run_baseline_checks *-- manifest_missing_check : appends
    run_baseline_checks *-- _check_includes_consent : appends
    run_baseline_checks ..> CIAuditResult : returns
    CIAuditResult *-- CheckResult : aggregates
    manifest_missing_check ..> CheckResult : is-a
    _check_includes_consent ..> CheckResult : is-a

    class run_baseline_checks:::touched
    class manifest_missing_check:::touched
    class _audit_ci_gate:::touched
    classDef touched fill:#fff3b0,stroke:#d47600
Loading
flowchart TD
    A(["apm audit --ci / _audit_ci_gate()"])
    B["run_baseline_checks(project_root, fail_fast=True, ci_mode=True)"]
    C{"apm.yml exists?"}
    D["_check_lockfile_exists() / check: lockfile-exists"]
    E{"lockfile exists?"}
    F{"apm/ exists OR apm.lock.yaml exists OR apm.lock exists?"}
    G["append CheckResult / name=manifest-missing / passed=False in CI"]
    H["return CIAuditResult with manifest-missing check"]
    I["continue remaining checks (lockfile-hash, drift, etc.)"]
    J["return CIAuditResult (all checks complete)"]
    K["no APM artifacts found / return result (empty checks)"]

    A --> B
    B --> C
    C -- "no" --> F
    C -- "yes" --> D
    D --> E
    E -- "no" --> K
    E -- "yes" --> I
    I --> J
    F -- "yes: artifacts found" --> G
    F -- "no: clean project" --> K
    G --> H
Loading
sequenceDiagram
    actor User
    participant audit as apm audit --ci
    participant ci as run_baseline_checks()
    participant fs as Filesystem

    User->>audit: apm audit --ci
    audit->>ci: run_baseline_checks(ci_mode=True)
    ci->>fs: apm_yml_path.exists()
    fs-->>ci: False
    ci->>fs: .apm/.exists(), apm.lock.yaml.exists(), apm.lock.exists()
    fs-->>ci: True (artifact found)
    ci-->>audit: CIAuditResult[manifest-missing passed=False]
    audit-->>User: exit non-zero + manifest-missing warning
Loading

Recommendation

Ship this PR after resolving the interactive-path promise mismatch: the PR body says 'warn (but don't fail) when apm.yml is absent' but the interactive apm audit code path does not call run_baseline_checks, so the warning is never emitted outside --ci. The maintainer must either wire the check into the interactive path or explicitly scope the feature to --ci only and update the PR description and --help text accordingly. That is a one-paragraph code or documentation change and should not block the merge long. All other findings are clean follow-ups: the integration exit-code test, the remediation hint in the warning message, the three stale doc locations, and the all-artifacts-wiped bypass scenario. The fix closes a real security gap and sets the foundation for APM's first visible security-hardening narrative -- it deserves to land cleanly.


Full per-persona findings

Python Architect

  • [recommended] ci_mode leaks a caller-context flag into a policy-layer function, coupling audit command intent to check semantics at src/apm_cli/policy/ci_checks.py:506
    run_baseline_checks lives in the policy domain layer. Accepting ci_mode=True from the command layer to flip passed=False vs passed=True means the policy function knows about its caller's execution context. The existing advisory precedent (_check_includes_consent) always emits passed=True and lets the caller decide how to surface it. A cleaner shape: manifest-missing always emits passed=True, and _audit_ci_gate promotes any check named 'manifest-missing' to a hard failure after the fact.
    Suggested: Remove ci_mode from run_baseline_checks. Always emit passed=True for manifest-missing. In _audit_ci_gate, after calling run_baseline_checks, check if any check.name == 'manifest-missing' and treat it as a hard failure there.

  • [nit] LOCKFILE_NAME/LEGACY_LOCKFILE_NAME promoted to module-level import while LockFile/get_lockfile_path stay lazy at src/apm_cli/policy/ci_checks.py:19
    Inconsistent import strategy within the same module and from-path. If deps.lockfile ever has an import-time side-effect or a circular-import risk, this inconsistency will surface as a hard-to-debug failure.
    Suggested: Either move LOCKFILE_NAME/LEGACY_LOCKFILE_NAME into the lazy import block inside run_baseline_checks, or promote all four to module-level and document why the lazy block exists.

CLI Logging Expert

  • [recommended] Message string may contain non-ASCII character, violating ASCII-only output rule at src/apm_cli/policy/ci_checks.py:563
    APM enforces ASCII-safe output. Terminals without UTF-8 may misrender non-ASCII characters.
    Suggested: Confirm the final rendered byte sequence is pure ASCII; replace any Unicode dash with a double ASCII hyphen ' -- '.

  • [recommended] Warning message omits actionable remediation hint (no how-to-fix guidance) at src/apm_cli/policy/ci_checks.py:563
    APM message writing convention: include the fix after every skip warning. The manifest-missing message tells the user what is wrong but not what to do.
    Suggested: Append: ' -- restore apm.yml or run apm init to reinitialise' to the message string.

  • [nit] Message leads with file name rather than outcome, inverting newspaper-test order at src/apm_cli/policy/ci_checks.py:558
    Lead with the outcome. 'apm.yml is missing' buries the security implication.
    Suggested: Consider: 'Missing manifest: apm.yml not found but APM artifacts (.apm/, apm.lock.yaml) exist - possible deleted manifest'

  • [nit] models.py remediation hint value 'apm.yml' is too terse to be actionable at src/apm_cli/policy/models.py
    Remediation hints should complete 'To fix this, ...'. The value 'apm.yml' alone does not tell the user what action to take.
    Suggested: Change value to: 'restore or recreate apm.yml (run apm init to reinitialise)'

DevX UX Expert

  • [recommended] Warning message gives no actionable next step for the user at src/apm_cli/policy/ci_checks.py
    Every comparable check in ci_checks.py ends with a concrete command. The manifest-missing message ends with '-- this may indicate a deleted manifest', which is a diagnosis, not a prescription. A developer hitting this in CI has no idea whether to run 'apm init', restore from git, or ignore it.
    Suggested: Append a next-action clause: '-- restore apm.yml from version control or run "apm init" to recreate it'

  • [recommended] Interactive apm audit (no --ci) never emits the manifest-missing warning -- PR body promise is undelivered at src/apm_cli/commands/audit.py
    The PR body says 'Warn (but don't fail) when apm.yml is absent'. But run_baseline_checks is only called with ci_mode=True in _audit_ci_gate. The interactive audit path never triggers the manifest-missing check, so a developer running 'apm audit' locally after deleting apm.yml gets zero signal.
    Suggested: Either call run_baseline_checks (without ci_mode=True) for the interactive path so it emits a warning, or document explicitly in --help that manifest-missing is only checked under --ci.

  • [nit] 'this may indicate' is hedging language -- be direct at src/apm_cli/policy/ci_checks.py
    The condition is unambiguous: lockfile or .apm/ exists but apm.yml does not. Hedging trains users to dismiss the warning.
    Suggested: Replace 'this may indicate a deleted manifest' with 'apm.yml appears to have been deleted or was never committed'

  • [nit] Remediation hint maps to a filename, not an action -- circular for a missing-file check at src/apm_cli/policy/models.py
    For manifest-missing the file IS the problem -- it is absent -- so pointing at 'apm.yml' as the remediation target is circular.
    Suggested: Change to an action string such as 'restore apm.yml from version control or run apm init'

Supply Chain Security Expert

  • [recommended] Attacker who deletes apm.yml AND all lock artifacts bypasses the new check entirely at src/apm_cli/policy/ci_checks.py:558
    The heuristic fires only when at least one of {.apm/, apm.lock.yaml, apm.lock} exists alongside a missing apm.yml. An attacker with write access who wipes all four artifacts gets a clean pass -- same bypass as before the fix, just more work.
    Suggested: Document the residual gap. Consider checking for a sentinel file (e.g. .apm/.initialized) written at first install. At minimum, the warning should note 'If all APM artifacts were removed, this check cannot detect the deletion.'

  • [recommended] ci_mode defaults to False; any future caller of run_baseline_checks outside _audit_ci_gate gets advisory-only behavior at src/apm_cli/policy/ci_checks.py:506
    run_baseline_checks is a public function. If a future integration calls it without ci_mode=True, the manifest-missing check silently passes. The security property is opt-in rather than opt-out, inverting the fail-closed principle for new callers.
    Suggested: Consider renaming to advisory_mode: bool = False and inverting the logic so callers must opt OUT of hard failure, or add a docstring note that CI callers MUST pass ci_mode=True for enforcement semantics.

  • [nit] Empty .apm/ directory triggers manifest-missing warning -- possible false positive at src/apm_cli/policy/ci_checks.py:555
    apm_dir.is_dir() is True for any directory named .apm, even an empty one created by an unrelated tool.
    Suggested: Check any(apm_dir.iterdir()) instead of just apm_dir.is_dir() to require at least one file inside .apm/ before concluding APM was previously used.

OSS Growth Hacker

  • [recommended] CHANGELOG entry leads with implementation detail, not the user-facing safety guarantee
    OSS evaluators scanning CHANGELOG to decide whether to trust a tool in CI will skim the first 8 words. The compelling claim is 'deleting apm.yml can no longer silently disable all policy checks'.
    Suggested: Reorder: 'Prevent silent audit bypass: deleting apm.yml in a repo with APM artifacts now triggers a hard failure in --ci mode (manifest-missing) and a warning in interactive mode, closing a policy-bypass gap. (Policy: missing apm.yml silently skips all policy and baseline checks #1056)'

  • [nit] This fix is worth a short 'security hardening' call-out in the next release post
    Supply-chain and policy-bypass stories resonate strongly with the security-conscious evaluators who influence enterprise OSS adoption decisions.
    Suggested: Add a 'Security' subsection in CHANGELOG (separate from 'Fixed') for findings like this -- mirrors the pattern used by pip, cargo, and npm.

Auth Expert -- inactive

No auth-surface files changed; PR only modifies CHANGELOG.md, src/apm_cli/commands/audit.py, src/apm_cli/policy/ci_checks.py, src/apm_cli/policy/models.py, and tests/unit/policy/test_ci_checks.py, none of which touch token management, credential resolution, host classification, or remote-host auth flows.

Doc Writer

  • [recommended] CLI reference hard-codes '7 baseline checks' -- now stale after this PR adds manifest-missing at docs/src/content/docs/reference/cli-commands.md
    Line 543 enumerates exactly 7 named checks. This PR adds an 8th. Readers relying on that count for scripting or audits will get a wrong number.
    Suggested: Change '7 baseline checks' to '8 baseline checks' and append 'manifest-missing (warn interactive / fail CI when apm.yml is absent but APM artifacts exist)' to the inline list.

  • [recommended] ci-cd.md also says '7 baseline checks' -- same stale count at docs/src/content/docs/integrations/ci-cd.md
    Line 151 repeats the same count verbatim. It is the entry point for CI setup readers.
    Suggested: Update '7 baseline checks' to '8 baseline checks' on line 151.

  • [recommended] policy-reference.md check table omits manifest-missing at docs/src/content/docs/enterprise/policy-reference.md
    Line 836 lists manifest-parse as a local audit check. manifest-missing is also a local baseline check with a defined CI/interactive split. Omitting it breaks the single-source-of-truth contract.
    Suggested: Add a row adjacent to manifest-parse: | manifest-missing | 1 in CI, 0 interactive | error (CI) / warn (interactive) | apm.yml not found but APM artifacts detected | Restore apm.yml or remove the APM artifacts. |

  • [nit] CHANGELOG entry is accurate and complete; no change needed at CHANGELOG.md
    Entry correctly names artifacts, describes dual-mode behavior, and links the issue.

Test Coverage Expert

  • [recommended] apm audit --ci manifest-missing exit-code contract has no integration-with-fixtures test at tests/integration/test_audit_silent_skip_e2e.py
    The user-visible promise -- apm audit --ci exits non-zero when apm.yml is absent but APM artifacts exist -- is a CLI command surface. The floor tier for CLI exit-code contracts is integration-with-fixtures. If the ci_mode=True wiring in audit.py drifts (refactor drops the kwarg), no test catches it. Grep of tests/integration/ for 'manifest_missing' returned zero hits.
    Suggested: Add a CliRunner test: invoke audit ['--ci', '--no-drift'] with .apm/ but no apm.yml, assert exit_code==1 and 'manifest-missing' in output. Mirror for interactive mode (no --ci): assert exit_code==0 and warning present.
    Proof (missing at integration-with-fixtures): tests/integration/test_audit_silent_skip_e2e.py::test_manifest_missing_ci_exits_nonzero -- proves: apm audit --ci exits non-zero when apm.yml is absent but APM artifacts exist [secure-by-default,governed-by-policy]
    assert result.exit_code == 1 and 'manifest-missing' in result.output

  • [nit] Interactive (no --ci) warn-but-pass path not fully asserted at tests/unit/policy/test_ci_checks.py
    test_apm_dir_triggers_warning asserts check.passed is True but does not assert that the overall result still passes. The scenario 'user runs apm audit without --ci and sees a warning but is not blocked' is a distinct user promise.
    Suggested: In test_apm_dir_triggers_warning, add: assert result.passed to confirm the overall baseline still passes in interactive mode.

  • [nit] Combined-artifacts scenario (.apm/ AND apm.lock.yaml both present) not tested at tests/unit/policy/test_ci_checks.py
    Each of the 9 tests exercises exactly one artifact at a time. The detection logic may emit a duplicate check when both are present simultaneously.
    Suggested: Add a case where both .apm/ and apm.lock.yaml are set up; assert exactly one 'manifest-missing' check appears in result.checks.
    assert len([c for c in result.checks if c.name == 'manifest-missing']) == 1

This panel is advisory. It does not block merge. Re-apply the
panel-review label after addressing feedback to re-run.

Generated by PR Review Panel for issue #1255 · ● 2.4M ·

Sergio Sisternes and others added 3 commits May 11, 2026 09:16
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes #1056

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…stants

- Make manifest-missing check passed=False in ci_mode, passed=True otherwise
- Import LOCKFILE_NAME / LEGACY_LOCKFILE_NAME from deps.lockfile
- Check legacy apm.lock alongside apm.lock.yaml
- Use .is_dir() for .apm/ directory check
- Move CHANGELOG entry from Added to Fixed
- Add CI-mode and legacy-lockfile test coverage

Closes #1056
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Policy: missing apm.yml silently skips all policy and baseline checks

3 participants