
feat(policy): Phase 1 — org-level policy engine for apm audit --ci#365

Draft
danielmeppiel wants to merge 5 commits into main from feature/apm-policies-phase1


@danielmeppiel
Collaborator

Summary

Introduces the APM policy engine (Phase 1) — org-level governance for agent packages via apm-policy.yml files discovered from .github repos.

What's new

Policy module (src/apm_cli/policy/)

  • Schema — 10 frozen dataclasses modeling apm-policy.yml (dependencies allow/deny/require, MCP server policies, compilation settings, unmanaged file controls)
  • Parser — YAML loading with validation, auto-coercion of YAML 1.1 boolean quirks (off → string)
  • Matcher — Glob pattern matching (* single-segment, ** recursive) with @lru_cache for perf
  • Discovery — Auto-discovers org policy from git remote → {org}/.github/apm-policy.yml via GitHub Contents API; supports explicit --policy <file> override; cache with configurable TTL
  • Inheritance — Three-level chain (extends: hub → org → repo) with tighten-only merge semantics (deny=union, allow=intersection, require=union, enforcement escalates)
  • CI checks — 6 baseline checks (lockfile, drift, integrity, version pinning, resolution, required fields) + 16 policy-enhanced checks (allow/deny lists, MCP server validation, compilation enforcement, unmanaged file detection)
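The tighten-only merge listed under Inheritance can be pictured with a minimal sketch. The `Policy` dataclass, `merge_tighten_only`, and the enforcement ordering (`off` < `warn` < `block`) are hypothetical names and assumptions chosen for illustration, not the schema from this PR. The sketch also assumes that a missing allow list means "no restriction", and it intersects patterns as literal strings, which is a simplification.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Assumed ordering of enforcement levels, weakest first (hypothetical).
_LEVELS = ("off", "warn", "block")


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified stand-in for one level of the chain."""
    deny: FrozenSet[str] = frozenset()
    allow: Optional[FrozenSet[str]] = None  # None = no allow-list restriction
    require: FrozenSet[str] = frozenset()
    enforcement: str = "warn"


def merge_tighten_only(parent: Policy, child: Policy) -> Policy:
    """Merge child into parent so the result is never looser than parent."""
    if parent.allow is None:
        allow = child.allow
    elif child.allow is None:
        allow = parent.allow
    else:
        allow = parent.allow & child.allow  # allow = intersection
    stricter = max(parent.enforcement, child.enforcement, key=_LEVELS.index)
    return Policy(
        deny=parent.deny | child.deny,           # deny = union
        allow=allow,
        require=parent.require | child.require,  # require = union
        enforcement=stricter,                    # enforcement escalates
    )


org = Policy(deny=frozenset({"evil/*"}), allow=frozenset({"acme/*", "oss/*"}))
repo = Policy(allow=frozenset({"acme/*"}), enforcement="block")
merged = merge_tighten_only(org, repo)
```

In this sketch the repo level can narrow the org allow list and raise enforcement, but it cannot drop the org deny entry or lower enforcement, whichever side is treated as the parent.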

CLI changes (apm audit)

  • --ci flag — runs baseline CI checks, outputs structured JSON/SARIF
  • --ci --policy <file|url> — adds policy-enhanced checks
  • --no-cache — bypasses policy cache for tighter CI environments

Test fixtures (live on GitHub)

Test coverage

| Suite | Count | Status |
|---|---|---|
| Unit tests | 2,334 | |
| Integration tests | 127 | |
| Policy unit tests | 298 | ✅ (included in unit total) |
| Live API E2E tests | 6 | ✅ (against DevExpGbb org) |
| Release validation | Hero scenarios 1 & 2 | |
| Integration script | 10 suites | |

Documentation

  • docs/.../enterprise/policy-reference.md — full schema reference
  • docs/.../guides/ci-policy-setup.md — CI setup guide
  • templates/policy-ci-workflow.yml — GitHub Actions workflow template
  • Updated governance, CI/CD, rulesets, CLI reference, lockfile spec docs

Design doc

See WIP/apm-policies/apm-policies.md for the full design including Phase 2 (runtime hooks, IDE feedback) and the strategic amendments addressing side-loading, fail-open/fail-closed, cache staleness, and conflict resolution.


Closes the Phase 1 scope from the APM Policies design. Phase 2 (runtime hooks, pre-commit integration) is planned separately.

danielmeppiel and others added 3 commits March 19, 2026 00:00
…gs (P1.4)

Add 16 policy checks layered on top of baseline CI gate:
- Dependency allow/deny lists, required packages, version pins, depth limits
- MCP allow/deny, transport restrictions, self-defined server policy
- Compilation target/strategy enforcement, source attribution
- Required manifest fields, scripts policy, unmanaged files detection

Wire --policy and --no-cache flags into apm audit --ci command.
Update CLI reference docs with new options and examples.
Add 75 unit tests covering all checks and CLI integration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s, docs

APM Policies Phase 1 implementation:

P1.1 - apm audit --ci baseline: 6 lockfile consistency checks (ref,
       deployed files, orphans, config, content integrity)
P1.2 - Policy schema & parser: ApmPolicy dataclass, YAML validation,
       apm-policy.yml full schema with schema_version support
P1.3 - Policy discovery: auto-discover from <org>/.github via GitHub API,
       cache with configurable TTL, --policy override, token resolution
P1.4 - apm audit --ci --policy: 16 policy checks (allow/deny, require,
       MCP, compilation, manifest, unmanaged files) [prev commit]
P1.5 - Policy inheritance chain: extends: supports org, <owner>/<repo>,
       URL. Tighten-only merge (deny=union, allow=intersection).
       Circular reference protection (max depth 5)
P1.6 - Docs: governance.md, policy-reference.md, ci-policy-setup.md,
       CI workflow template, updated 6 doc pages

Test coverage: 360 policy tests + 2334 total unit tests passing.
Integration test fixtures for DevExpGBB org.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add 6 live API tests in TestPolicyDiscoveryLiveAPI:
  - Auto-discover policy from cloned DevExpGbb repo
  - Fetch org policy from DevExpGbb/.github
  - Fetch repo-level override from apm-policy-test-fixture
  - Verify cache behavior (API then cache)
  - Merge org + repo policies live
  - Nonexistent policy returns not-found
- Fix org casing: DevExpGBB → DevExpGbb in fixtures and tests
- All 2334 unit tests, 127 integration tests, release validation pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 18, 2026 23:27
Contributor

Copilot AI left a comment

Pull request overview

Introduces Phase 1 of an org-level policy engine that extends apm audit --ci with optional policy discovery (--policy org|<file>|<url>), plus a new baseline CI gate mode for lockfile consistency reporting (JSON/SARIF/text).

Changes:

  • Adds src/apm_cli/policy/ (schema, parser/validation, glob matcher, inheritance merge, org policy discovery+cache, baseline+policy CI checks).
  • Extends apm audit with --ci, --policy, and --no-cache, including JSON/SARIF reporting for CI.
  • Adds extensive unit/integration/E2E tests, fixtures, workflow template, and documentation updates for policy/CI setup.

Reviewed changes

Copilot reviewed 32 out of 33 changed files in this pull request and generated 9 comments.

| File | Description |
|---|---|
| uv.lock | Bumps local editable apm-cli version. |
| tests/unit/test_audit_policy_command.py | CLI-level tests for apm audit --ci --policy flows and --no-cache. |
| tests/unit/test_audit_ci_command.py | CLI-level tests for baseline apm audit --ci behavior, outputs, and exit codes. |
| tests/unit/policy/test_schema.py | Validates default/frozen behavior of policy schema dataclasses. |
| tests/unit/policy/test_policy_checks.py | Unit tests for individual policy checks and run_policy_checks. |
| tests/unit/policy/test_parser.py | Tests YAML parsing + validation behavior for policy files. |
| tests/unit/policy/test_matcher.py | Tests glob matching semantics and allow/deny evaluation. |
| tests/unit/policy/test_inheritance.py | Tests tighten-only inheritance merge semantics + chain validation. |
| tests/unit/policy/test_fixtures.py | Smoke tests that fixture policies parse and contain expected fields. |
| tests/unit/policy/test_discovery.py | Tests policy discovery from file/url/repo/org plus caching and GitHub API fetch handling. |
| tests/unit/policy/test_ci_checks.py | Tests baseline CI checks engine + serialization (JSON/SARIF). |
| tests/unit/policy/__init__.py | Declares the unit-test policy package namespace. |
| tests/integration/test_policy_discovery_e2e.py | Optional gated E2E tests for discovery + (optionally) live GitHub API calls. |
| tests/fixtures/policy/repo-override-policy.yml | Repo-level override fixture (extends: org). |
| tests/fixtures/policy/org-policy.yml | Org policy fixture used across unit/integration/E2E tests. |
| tests/fixtures/policy/minimal-policy.yml | Minimal policy fixture to validate defaults. |
| tests/fixtures/policy/enterprise-hub-policy.yml | Enterprise hub fixture to validate stricter inheritance behavior. |
| templates/policy-ci-workflow.yml | GitHub Actions workflow template for baseline + policy CI checks with SARIF upload. |
| src/apm_cli/policy/schema.py | Frozen dataclasses representing apm-policy.yml schema. |
| src/apm_cli/policy/parser.py | YAML loading + validation/coercion into schema objects. |
| src/apm_cli/policy/matcher.py | Cached glob-pattern matcher for allow/deny policies. |
| src/apm_cli/policy/inheritance.py | Tighten-only merge + chain validation helpers. |
| src/apm_cli/policy/discovery.py | Org policy discovery from git remote + GitHub Contents API + TTL cache. |
| src/apm_cli/policy/ci_checks.py | Baseline and policy CI checks + JSON/SARIF serialization. |
| src/apm_cli/policy/__init__.py | Public policy module exports. |
| src/apm_cli/commands/audit.py | Adds --ci, --policy, --no-cache, and CI result rendering/output. |
| docs/src/content/docs/reference/lockfile-spec.md | Updates CI guidance to reflect apm audit --ci and --policy org. |
| docs/src/content/docs/reference/cli-commands.md | Documents new CLI flags and CI-mode exit codes/examples. |
| docs/src/content/docs/integrations/github-rulesets.md | Updates rulesets guidance to include apm audit --ci + policy enforcement. |
| docs/src/content/docs/integrations/ci-cd.md | Adds lockfile consistency and policy enforcement section for CI/CD. |
| docs/src/content/docs/guides/ci-policy-setup.md | New guide: how to set up org policy + CI enforcement. |
| docs/src/content/docs/enterprise/policy-reference.md | New policy reference doc (schema, checks, inheritance, patterns). |
| docs/src/content/docs/enterprise/making-the-case.md | Updates enterprise narrative to reflect CI policy enforcement availability. |
| docs/src/content/docs/enterprise/governance.md | Adds org policy governance and apm audit --ci baseline details. |

Comment on lines +946 to +962
# Build set of deployed files from lockfile
deployed: set = set()
if lock:
    for _key, dep in lock.dependencies.items():
        for f in dep.deployed_files:
            deployed.add(f.rstrip("/"))

unmanaged: List[str] = []
for gov_dir in dirs:
    dir_path = project_root / gov_dir
    if not dir_path.exists() or not dir_path.is_dir():
        continue
    for file_path in dir_path.rglob("*"):
        if file_path.is_file():
            rel = file_path.relative_to(project_root).as_posix()
            if rel not in deployed:
                unmanaged.append(rel)
Comment on lines +72 to +78
@dataclass(frozen=True)
class ManifestPolicy:
    """Rules governing apm-manifest.yml content."""

    required_fields: List[str] = field(default_factory=list)
    scripts: str = "allow"  # allow | deny
    content_types: Optional[Dict] = None  # {"allow": [...]}
Comment on lines +13 to +17
from apm_cli.commands.audit import audit
from apm_cli.models.apm_package import clear_apm_yml_cache
from apm_cli.policy.discovery import PolicyFetchResult
from apm_cli.policy.schema import ApmPolicy, DependencyPolicy

Comment on lines +589 to +613
# Resolve effective format
effective_format = output_format
if output_path and effective_format == "text":
    from ..security.audit_report import detect_format_from_extension

    effective_format = detect_format_from_extension(Path(output_path))

if effective_format in ("json", "sarif"):
    import json as _json

    payload = (
        ci_result.to_sarif()
        if effective_format == "sarif"
        else ci_result.to_json()
    )
    output = _json.dumps(payload, indent=2)
    if output_path:
        Path(output_path).parent.mkdir(parents=True, exist_ok=True)
        Path(output_path).write_text(output, encoding="utf-8")
        _rich_success(f"CI audit report written to {output_path}")
    else:
        click.echo(output)
else:
    _render_ci_results(ci_result)

Comment on lines +106 to +115
org_and_host = _extract_org_from_git_remote(project_root)
if org_and_host is None:
    return PolicyFetchResult(error="Could not determine org from git remote")

org, host = org_and_host
repo_ref = f"{org}/.github"
if host and host != "github.com":
    repo_ref = f"{host}/{repo_ref}"

return _fetch_from_repo(repo_ref, project_root, no_cache=no_cache)
Comment on lines +374 to +379
policy = load_policy(policy_file)
return PolicyFetchResult(
    policy=policy,
    source=f"org:{repo_ref}",
    cached=True,
)
Comment on lines +569 to +588
# Optionally run policy checks
if policy_source:
    from ..policy.discovery import discover_policy

    fetch_result = discover_policy(
        project_root,
        policy_override=policy_source,
        no_cache=no_cache,
    )

    if fetch_result.error:
        _rich_error(f"Policy fetch failed: {fetch_result.error}")
        sys.exit(1)

    if fetch_result.found:
        policy_result = run_policy_checks(
            project_root, fetch_result.policy
        )
        ci_result.checks.extend(policy_result.checks)

Comment on lines +69 to +73
# Unknown top-level keys (warn, don't fail)
unknown = set(data.keys()) - _KNOWN_TOP_LEVEL_KEYS
for key in sorted(unknown):
    logger.warning("Unknown top-level policy key: %s", key)

missing: List[str] = []
for req in policy.require:
    pkg_name = req.split("#")[0]
    found = any(ref.startswith(pkg_name) for ref in dep_refs)
@danielmeppiel danielmeppiel marked this pull request as draft March 19, 2026 00:00
@sergio-sisternes-epam
Collaborator

sergio-sisternes-epam commented Mar 19, 2026

Performance & Architecture Review — PR #365

Scale model for analysis

A large-org monorepo:

  • D = 200 direct + transitive APM dependencies
  • F = 1,500 deployed files (instructions, prompts, skills, rules across ~200 packages)
  • P = 40 patterns in policy allow/deny lists
  • R = 15 required packages
  • G = 500 files in governance directories (.github/agents/, .cursor/rules/, etc.)

P1 — _check_content_integrity does full-disk I/O scan on every --ci run

Complexity: O(F x C) where F = deployed files and C = characters per file

def _check_content_integrity(project_root, lock):
    from ..commands.audit import _scan_lockfile_packages
    findings_by_file, _files_scanned = _scan_lockfile_packages(project_root)

This calls _scan_lockfile_packages -> iterates all lockfile deps -> reads every deployed file from disk -> runs the character-level Unicode scanner on each. The scanner itself iterates every character in every file (ContentScanner.scan_text loops over lines x chars).

At monorepo scale: 1,500 file reads + 1,500 full-text scans, every CI run. The scanner's isascii() fast-path helps for pure-ASCII files (~1 µs skip), but prompt files containing emoji, non-Latin characters, or any other non-ASCII content fall through to the O(C) character loop.

Additional concern — circular import: policy/ci_checks.py imports _scan_lockfile_packages from commands/audit.py. This makes the policy module depend on its own consumer — an inversion that also prevents the function from being independently benchmarked or optimized.
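The fast path discussed above can be pictured with a small sketch. `scan_text` here is a hypothetical stand-in, not the real `ContentScanner.scan_text`, and the set of flagged code points is an arbitrary illustrative subset chosen for the example.

```python
from typing import List, Tuple

# Illustrative subset of invisible/bidi code points (assumption for the sketch).
_SUSPICIOUS = {0x200B, 0x200E, 0x200F, 0x202E, 0x2066, 0x2069}


def scan_text(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, codepoint) for each suspicious character."""
    if text.isascii():
        # Fast path: a pure-ASCII file cannot contain any flagged code point.
        return []
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if line.isascii():
            continue  # skip clean lines without visiting each character
        for col, ch in enumerate(line, start=1):
            if ord(ch) in _SUSPICIOUS:
                findings.append((line_no, col, f"U+{ord(ch):04X}"))
    return findings
```

Only files with at least one non-ASCII character pay the per-character cost, and within those files the same test skips clean lines.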


P2 — _check_unmanaged_files does recursive rglob("*") on governance directories

Complexity: O(G) where G = total files across all governance directories

for gov_dir in dirs:
    dir_path = project_root / gov_dir
    for file_path in dir_path.rglob("*"):
        if file_path.is_file():
            rel = file_path.relative_to(project_root).as_posix()
            if rel not in deployed:   # <- set lookup, O(1) — good
                unmanaged.append(rel)

The rglob("*") is an unconstrained filesystem walk with no depth limit or file-count cap.

At monorepo scale: 500+ stat() calls + 500+ relative_to() path computations per CI run across .github/agents/, .github/instructions/, .cursor/rules/, .claude/, .opencode/.


P3 — O(R x D x L) in required-packages checks

Three functions iterate required packages x manifest deps x lockfile deps:

_check_required_packages — O(R x D), _check_required_packages_deployed — O(R x D) + O(R x L), _check_required_package_version — O(R x L).

At monorepo scale: 15 required x 200 deps x 200 lockfile entries = 600,000 iterations across the three functions. The startswith comparison is also semantically fragile — org/package-v2 would match a requirement for org/package.
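A minimal sketch of the prefix collision and of an exact-match index that avoids it. `canonical_name` and `missing_required` are hypothetical helpers, and the `name#version` form for dependency refs is an assumption inferred from the `req.split("#")[0]` call in the diff.

```python
from typing import Dict, Iterable, List


def canonical_name(ref: str) -> str:
    """Strip an optional '#<version>' suffix from a package reference."""
    return ref.split("#", 1)[0]


def missing_required(required: Iterable[str], dep_refs: Iterable[str]) -> List[str]:
    """Return required packages that have no exactly matching dependency."""
    # Build the index once: O(D). Each lookup below is then O(1).
    dep_by_canonical: Dict[str, str] = {canonical_name(r): r for r in dep_refs}
    return [req for req in required if canonical_name(req) not in dep_by_canonical]


deps = ["org/package-v2#1.0.0", "org/other#2.3.1"]

# Prefix matching wrongly treats org/package-v2 as satisfying org/package.
prefix_hit = any(ref.startswith("org/package") for ref in deps)

# Exact matching on the canonical name reports org/package as missing.
missing = missing_required(["org/package#1.0.0", "org/other"], deps)
```

The dictionary is built once per run, so the cost becomes O(D + R) instead of one linear scan per required package.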


P4 — Pattern matching: fine for Phase 1, worth noting for future

The @lru_cache(maxsize=512) on _compile_pattern is correctly sized for patterns (~40 entries vs 512 capacity). The O(P x D) regex match cost is amortized well. No action needed for Phase 1 — note only.
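For reference, a cached glob compiler with the semantics described in the PR summary (`*` stays within one path segment, `**` spans segments) could look roughly like this. `compile_glob` and `glob_match` are hypothetical names; the actual `_compile_pattern` in policy/matcher.py may translate patterns differently.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=512)
def compile_glob(pattern: str) -> re.Pattern:
    """Translate a glob into a regex: '*' stays within one path segment,
    '**' may span several segments."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # recursive wildcard
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # single-segment wildcard
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def glob_match(value: str, pattern: str) -> bool:
    """True when the whole value matches the glob."""
    return compile_glob(pattern).fullmatch(value) is not None
```

Each distinct pattern is compiled once; with about 40 patterns the cache never evicts, so repeated matches against 200 dependencies only pay for the regex match itself.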


P5 — _check_deployed_files_present does O(F_total) individual stat() calls

for _dep_key, dep in lock.dependencies.items():
    for rel_path in dep.deployed_files:
        abs_path = project_root / rel_path
        if not abs_path.exists():          # stat() syscall per file

At monorepo scale: 1,500 os.stat() calls. ~1-10us on local disk, but 1-10ms on network-mounted CI filesystems.
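One way to cut the per-file syscalls is to list each parent directory once and answer existence checks from memory. The sketch below assumes plain relative paths and uses a hypothetical `find_missing_files` helper; it is not the PR's implementation.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, List, Set


def find_missing_files(project_root: Path, rel_paths: Iterable[str]) -> List[str]:
    """Return the relative paths that do not exist, listing each parent
    directory once instead of calling stat() per file."""
    listing_cache: Dict[Path, Set[str]] = {}
    missing: List[str] = []
    for rel in rel_paths:
        target = project_root / rel.rstrip("/")
        parent = target.parent
        if parent not in listing_cache:
            try:
                with os.scandir(parent) as entries:
                    listing_cache[parent] = {entry.name for entry in entries}
            except (FileNotFoundError, NotADirectoryError):
                listing_cache[parent] = set()  # parent itself is absent
        if target.name not in listing_cache[parent]:
            missing.append(rel)
    return missing
```

One caveat: the set lookup is case-sensitive, whereas `Path.exists()` follows the filesystem's rules, so results can differ on case-insensitive filesystems.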


P6 — All 22 checks run unconditionally even when early failures make the result predetermined

The baseline runner short-circuits on lockfile-exists failure (good), but all 5 remaining baseline checks always run. The policy runner always executes all 16 checks regardless of earlier failures.

At monorepo scale: If ref-consistency fails, the expensive content-integrity check still runs (1,500 file reads) for a result that is already exit code 1.
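A fail-fast runner can be sketched in a few lines. `Outcome`, `run_checks`, and `make_check` are hypothetical names used only for illustration; the check names are taken from the discussion above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Outcome:
    """Hypothetical, simplified result of a single check."""
    name: str
    passed: bool


def run_checks(
    checks: Sequence[Callable[[], Outcome]], fail_fast: bool = True
) -> List[Outcome]:
    """Run checks in order; with fail_fast, stop at the first failure."""
    results: List[Outcome] = []
    for check in checks:
        result = check()
        results.append(result)
        if fail_fast and not result.passed:
            break  # remaining (possibly expensive) checks are skipped
    return results


calls: List[str] = []


def make_check(name: str, passed: bool) -> Callable[[], Outcome]:
    def check() -> Outcome:
        calls.append(name)  # record that the check actually executed
        return Outcome(name, passed)
    return check


checks = [make_check("ref-consistency", False), make_check("content-integrity", True)]
fast = run_checks(checks)
```

Ordering the cheap checks first matters here: the I/O-heavy check only runs when everything before it has passed, or when the caller asks for the full report with `fail_fast=False`.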


P7 — No file-exclusion mechanism for any of the I/O-heavy checks

There is no .apmignore file, no exclude patterns in the policy schema, and no skip-list anywhere in the scanning pipeline.

| Check | What it scans | Exclusion logic |
|---|---|---|
| content-integrity | Every file in lock.deployed_files | None |
| deployed-files-present | Every file in lock.deployed_files | Path traversal safety only |
| unmanaged-files | rglob("*") on 6 governance dirs | None |

Large orgs with vendored configs, generated instruction files, or documentation alongside agent configs have no way to control scanning scope.


P8 — God files and god functions will hinder Phase 2 velocity

ci_checks.py (1,112 lines) — contains 3 distinct layers (result model ~110 lines, 6 baseline checks ~350 lines, 16 policy checks ~650 lines) with different dependencies and change velocities.

audit.py (570 -> ~740 lines) — now handles two completely independent modes (content scan + CI) sharing no logic. The audit() function grows to ~90 lines / 11 parameters. Phase 2 adds --drift as a third mode.

run_policy_checks (~50 statements) — sequential dispatch appending 16 results. Independently testable checks are colocated in a single module.


Complexity Budget Summary

| Check | Current complexity | At scale (200D, 1500F, 40P, 15R, 500G) | Bottleneck type |
|---|---|---|---|
| content-integrity | O(F x C) | 1,500 file reads + char scans | Disk I/O |
| unmanaged-files | O(G) | 500 stat + path ops | Disk I/O |
| deployed-files-present | O(F_total) | 1,500 stat calls | Disk I/O |
| required-packages (3 checks) | O(R x D x L) | ~600K string ops | CPU |
| Pattern matching (allow/deny) | O(P x D) | ~8K regex matches | CPU (cached) |
| All others | O(D) or O(1) | Negligible | |

Three of the top four bottlenecks are disk I/O. The most impactful optimizations avoid unnecessary file I/O via fail-fast, exclusion patterns, incremental scanning, or batched stat operations.


Recommended Implementation Plan

We recommend addressing all items below before merging this PR to establish the right architectural foundation for the policy module. If timeline pressures require a phased approach, each phase maps cleanly to a standalone follow-up issue.

Phase A — Structural refactors (no logic changes, low risk)

These are mechanical moves — relocate code, update imports, verify tests still pass. They unblock everything else and set the pattern for the new policy/ module.

  1. Split ci_checks.py into 3 modules

    • policy/models.py: CheckResult, CIAuditResult with to_json(), to_sarif() (~110 lines)
    • policy/ci_checks.py: 6 baseline checks + run_baseline_checks (~350 lines)
    • policy/policy_checks.py: 16 policy checks + run_policy_checks + _load_raw_apm_yml (~650 lines)
  2. Extract _scan_lockfile_packages from commands/audit.py to security/file_scanner.py

    • Breaks the circular import (policy/ -> commands/)
    • Enables independent testing and benchmarking of the scanner
    • Update imports in both audit.py and ci_checks.py
  3. Extract CI mode from audit() into _run_ci_mode() helper

    • Keeps audit() as a thin dispatcher with clear mode branches
    • Prepares the command for Phase 2's --drift mode
    • Move _render_ci_results to a to_text() method on CIAuditResult (parallel to to_json()/to_sarif())

Phase B — Performance quick wins (small logic changes, high impact)

  1. Add fail-fast mode to CI checks

    • Default --ci stops after the first check failure (skips expensive I/O checks when result is already exit 1)
    • Add --no-fail-fast flag for users who want the full diagnostic report
    • Estimated latency reduction at monorepo scale: ~70% in common failure paths
  2. Pre-index dependencies in required-packages checks

    • Build dep_by_canonical dict and lock_by_key dict once before the check loops
    • Replace O(R x D) linear scans with O(1) dict lookups
    • Fixes the startswith prefix-collision bug (e.g., org/package-v2 matching org/package)
    • Reduces 600K iterations to ~215 dict lookups at scale
  3. Add max_files safety threshold to _check_unmanaged_files

    • Cap rglob("*") traversal at 10,000 files with a warning
    • Prevents runaway scanning in repos with large governance directories
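The max_files threshold from item 3 could be sketched as follows. `find_unmanaged` is a hypothetical helper that mirrors the loop in the diff, and returning a `truncated` flag instead of logging a warning is an assumption made to keep the example self-contained.

```python
from pathlib import Path
from typing import Iterable, List, Set, Tuple


def find_unmanaged(
    project_root: Path,
    dirs: Iterable[str],
    deployed: Set[str],
    max_files: int = 10_000,
) -> Tuple[List[str], bool]:
    """Return (unmanaged files, truncated flag), examining at most
    max_files files across all governance directories."""
    unmanaged: List[str] = []
    seen = 0
    for gov_dir in dirs:
        dir_path = project_root / gov_dir
        if not dir_path.is_dir():
            continue
        for file_path in dir_path.rglob("*"):
            if not file_path.is_file():
                continue
            if seen >= max_files:
                return unmanaged, True  # cap reached: caller should warn
            seen += 1
            rel = file_path.relative_to(project_root).as_posix()
            if rel not in deployed:
                unmanaged.append(rel)
    return unmanaged, False
```

Because `rglob` yields lazily, returning early stops the walk instead of enumerating the rest of the tree.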

Phase C — Exclusion mechanism (medium effort, strategic)

  1. Add exclude patterns to the unmanaged_files policy section

    unmanaged_files:
      action: warn
      directories:
        - .github/agents
      exclude:
        - .github/agents/generated/**
        - .cursor/rules/vendor/**
    • Reuse the existing matches_pattern() glob engine from policy/matcher.py
    • Gives orgs a self-service knob to control scanning cost
  2. Support .apmignore at project level (or audit.exclude in policy)

    • Applies to content-integrity and deployed-files-present checks
    • Analogous to .gitignore — familiar pattern for developers
    • Can share the same glob engine as the policy exclude patterns
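Applying exclude patterns before the expensive checks could look like the sketch below. It uses the standard library's `fnmatch.fnmatchcase` as a stand-in for the glob engine in policy/matcher.py; note that in `fnmatch` a single `*` also crosses `/`, which differs from the single-segment semantics this PR describes. `apply_excludes` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def apply_excludes(paths: Iterable[str], exclude: Iterable[str]) -> List[str]:
    """Drop every path that matches at least one exclude pattern."""
    patterns = list(exclude)
    return [
        path
        for path in paths
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


candidates = [
    ".github/agents/reviewer.md",
    ".github/agents/generated/a/b.md",
    ".cursor/rules/vendor/x.mdc",
]
kept = apply_excludes(
    candidates,
    [".github/agents/generated/**", ".cursor/rules/vendor/**"],
)
```

Filtering on the relative path string keeps the exclusion step free of filesystem calls, so it can run before any file is opened.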

Phase D — Advanced I/O optimizations (medium effort, monorepo-scale payoff)

  1. Incremental content scanning

    • Only scan deployed files whose mtime is newer than lockfile's generated_at timestamp
    • For PR-triggered CI, this typically limits the scan to 2-5 changed files instead of 1,500
    • Estimated latency reduction: ~90% for typical CI runs
  2. Batch stat() calls in _check_deployed_files_present

    • Walk unique parent directories once via os.scandir() to build a set of existing files
    • Replace 1,500 individual stat() calls with ~20 scandir() calls + O(1) set lookups
    • Estimated improvement: ~10x on network-mounted CI filesystems
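The mtime filter from the incremental-scanning item can be sketched as below. `files_changed_since` is a hypothetical helper, and passing `generated_at` as a timezone-aware `datetime` is an assumption about how the lockfile timestamp would be parsed.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable, List


def files_changed_since(
    project_root: Path, rel_paths: Iterable[str], generated_at: datetime
) -> List[str]:
    """Return the paths whose modification time is newer than generated_at."""
    cutoff = generated_at.timestamp()
    changed: List[str] = []
    for rel in rel_paths:
        try:
            mtime = (project_root / rel).stat().st_mtime
        except FileNotFoundError:
            continue  # missing files are reported by a different check
        if mtime > cutoff:
            changed.append(rel)
    return changed
```

One design point to verify before relying on this: git does not preserve modification times, so on a fresh CI checkout every file may appear newer than the lockfile timestamp. A content hash or the PR's changed-file list may be a more reliable signal in that setting.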

Our recommendation: Phases A-B should ship with this PR — they are low-risk structural and algorithmic improvements that establish the right foundation. Phase C is the most strategic for enterprise adoption and should follow immediately. Phase D can be tracked as optimization issues and addressed when real-world monorepo telemetry confirms the bottlenecks.

If any phase needs to be deferred, each one is self-contained and can be extracted as a follow-up issue with the scope and acceptance criteria described above.

@sergio-sisternes-epam
Collaborator

Gap: No policy control over which MCP registries are approved sources

The mcp.allow/mcp.deny lists match against server names only (via check_mcp_allowed() in matcher.py). The MCPDependency.registry field — which can be None (default), False (self-defined), or a custom URL string — is never inspected by any policy check.

This means a developer can point to an arbitrary registry URL in their apm.yml:

mcp:
  - name: my-server
    registry: "https://untrusted-registry.example.com"

...and it will pass all MCP policy checks as long as the server name matches an allowed pattern.

Suggested addition to McpPolicy schema:

mcp:
  registries:
    allow:
      - "https://registry.mcphub.io"
      - "https://internal.corp.net/mcp"
    deny:
      - "https://*.untrusted.example.com"
  allow: ["*"]          # existing server-name allow list
  self_defined: deny    # existing self-defined control

The check would inspect MCPDependency.registry for every MCP dep and validate it against the registry allow/deny list using the same _check_allow_deny() glob engine already in matcher.py.

This is especially relevant for enterprise supply-chain security — orgs need to ensure MCP servers are only resolved from vetted registries, not arbitrary endpoints.

Not necessarily a Phase 1 blocker, but worth tracking as a follow-up since the schema is being established now and adding registries later would be non-breaking.
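A rough sketch of what such a registry check might look like. `registry_allowed` is a hypothetical function, `fnmatch.fnmatchcase` stands in for the glob engine in matcher.py, and two behaviors are assumptions: non-string registry values (None or False) are left to the existing checks, and an empty allow list means "anything not denied".

```python
from fnmatch import fnmatchcase
from typing import Sequence, Union

# None = default registry, False = self-defined, str = custom registry URL
Registry = Union[None, bool, str]


def registry_allowed(
    registry: Registry, allow: Sequence[str], deny: Sequence[str]
) -> bool:
    """Validate a custom registry URL against allow/deny glob lists."""
    if not isinstance(registry, str):
        return True  # assumption: default/self-defined handled elsewhere
    if any(fnmatchcase(registry, pattern) for pattern in deny):
        return False  # deny always wins
    if not allow:
        return True  # assumption: empty allow list imposes no restriction
    return any(fnmatchcase(registry, pattern) for pattern in allow)


allow = ["https://registry.mcphub.io", "https://internal.corp.net/mcp"]
deny = ["https://*.untrusted.example.com"]
```

With these lists, the `https://untrusted-registry.example.com` registry from the example above is rejected because it is not on the allow list, even though it does not match the deny glob.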


3 participants