This page summarizes GitHub Actions for this repository: the main CI entry, supply-chain and workflow-quality gates, reusable language jobs, and specialized pipelines (bench, PF CI, Helm).
- Triggers: `push` and `pull_request` to `main`.
- Protobuf: Buf lint on `api/`.
- Path filter: Pull requests that touch only `docs/**` or `figs/**` skip heavy jobs; pushes to `main` always run the full matrix (see the workflow `paths-ignore` rules).
- Reusable workflows: `.github/workflows/reusable-ci-prepare.yml`, `reusable-ci-lean.yml`, `reusable-ci-rust.yml`, `reusable-ci-go-node.yml`, `reusable-ci-extended.yml`.
- Rust (`reusable-ci-rust`): Workspace `cargo build`, `cargo test`, `cargo clippy`. The `sidecar-watcher` crate uses `autotests = false`; CI runs `cargo test -p sidecar-watcher --lib` and `cargo test -p sidecar-watcher --tests` for explicit `[[test]]` targets only. See `runtime/sidecar-watcher/tests/README.md` for quarantined integration sources.
- Broader schedules: `.github/workflows/ci-weekly-full.yml` (weekly full matrix + Buf), `.github/workflows/ci-nightly-pytest.yml` (nightly Python/integration sweep).
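
The path-filter rule above can be read as a predicate over the list of changed files. The following is a minimal Python sketch of that reading, assuming the light directories are exactly `docs/` and `figs/`. The function name `needs_heavy_jobs` is hypothetical; in the actual workflows GitHub evaluates `paths-ignore` itself, so this only illustrates the intended semantics.

```python
from pathlib import PurePosixPath

# Top-level directories whose changes alone do not warrant heavy jobs
# (taken from the path-filter rule; the constant name is an assumption).
LIGHT_DIRS = ("docs", "figs")


def needs_heavy_jobs(event: str, changed_files: list) -> bool:
    """Return True when the full matrix should run (illustrative only)."""
    if event == "push":
        # Pushes to main always run the full matrix.
        return True
    # A pull request skips heavy jobs only if every changed file
    # sits under one of the light directories.
    for name in changed_files:
        parts = PurePosixPath(name).parts
        if not parts or parts[0] not in LIGHT_DIRS:
            return True
    return False
```

For example, a pull request touching `docs/guide.md` and `runtime/ledger/main.go` still needs the heavy jobs, because one file falls outside the light directories.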
| Workflow | Role |
|---|---|
| `dependency-review.yml` | On PRs: blocks introduction of dependencies with high (or worse) known vulnerabilities; denies selected strong-copyleft licenses. Requires the repository dependency graph where applicable. |
| `cargo-deny.yml` | Runs `cargo deny check` against the workspace using root `deny.toml` (licenses, RustSec advisories, duplicate-crate warnings). |
| `actionlint.yml` | When `.github/workflows/**` changes: lints workflow YAML via Docker image `rhysd/actionlint:1.7.12`. |
| `sbom-diff.yaml` | SBOM generation/diff and Grype-style checks (pinned Syft/Grype installers). |
| `release-sbom.yml` | On `release: published`, attaches CycloneDX JSON to the GitHub Release. |
| `scorecards.yml` | OpenSSF Scorecard on a schedule and on pushes to `main`. |
Contributor-oriented pointers: `AGENTS.md` (local commands, what not to commit) and `.github/WORKFLOWS.md` (workflow inventory).
The sections below document the reusable PF CI workflow that runs TRUST-FIRE GA phases, builds runtime images, and validates results, plus related bench and Helm material.
- Purpose: Lightweight PR gate and smoke for the SWE-bench pipeline. No network, no model keys, no Docker.
- Triggers: Every pull request; push to `main`/`master` when paths under `bench/swebench/`, `bench/fixtures/`, `tests/test_swebench_runner_smoke.py`, or this workflow change; schedule (nightly); `workflow_dispatch`.
- Job `bench-smoke`: Installs pytest and runs `pytest tests/test_swebench_runner_smoke.py -q --tb=short`. Uses mock engine and local fixtures; no HuggingFace or OpenHands. Runs on any OS (Windows/Linux/macOS). Full bench with OpenHands and harness evaluation require Linux or WSL.
- Job `rust-tests`: Optional (`continue-on-error: true`). Runs `cargo test -q --workspace --no-fail-fast`. Uses the shared `.github/actions/cache-cargo` composite for cargo cache. Criterion benchmarks are not run on PRs.
- Job `nightly-with-model`: Optional; runs only when the `BENCH_SWEBENCH_NIGHTLY_TOKEN` secret is set (schedule or `workflow_dispatch`). Runs the same smoke tests; can be extended for full bench with model/dataset.
- Purpose: Regression tests for experiments scripts and bench/swebench components using synthetic fixtures (no Docker, no OpenHands, no HuggingFace network).
- Triggers: Push and pull_request to `main` when paths under `bench/swebench/**`, `experiments/**`, `tests/test_*.py`, or `tests/fixtures/**` change.
- Job `pytest`: Matrix `ubuntu-latest`, `windows-latest`. Installs pytest and runs the module list in `.github/workflows/bench-swebench-unit.yaml` (step "Run bench/swebench and experiments unit tests"), including: `test_experiments_compare_runs`, `test_validate_predictions`, `test_check_no_stub`, `test_validate_pf_run`, `test_loader_from_file`, `test_workspace_plan`, `test_replay_roundtrip` (skipped on Windows), `test_swebench_runner_smoke`, `test_openhands_engine`, `test_policy_loader`, `test_cost_report`, `test_proof_hook`, `test_check_wsl_env`, `test_fill_manifest_from_run`, `test_list_delta_cases`, `test_bucket_pf_failures`, `test_policy_guard_deny_allow`, `test_provider_env`, `test_openhands_provider_env`, `test_run_swebench_eval_cleanup`, `test_run_config` (provider routing, compare strict gates, eval cleanup scoping, `RunConfig` defaults). Additional tests in `tests/test_*.py` (e.g. `test_runner_core`, `test_summarize_stress_run`, modular SWE-bench component tests) may be run locally; see `docs/guides/testing-guide.md` and `docs/internal/swebench-stabilization-regression-matrix.md`.
- Purpose: Criterion performance baseline and regression check (aligned with `bench/README.md`). Smoke job runs on PRs; save/compare run on push or schedule.
- Triggers: Push to `main`/`master` when `bench/`, `runtime/sidecar-watcher/`, or Cargo files change; pull_request when same paths change; schedule (cron `0 2 * * *`); `workflow_dispatch`.
- Job `smoke-bench`: On push or PR, runs `cargo criterion -p provability-fabric-bench -- --sample-size 5 --noplot` to ensure benches compile and run; no regression gate.
- Cargo cache: Jobs use the shared `.github/actions/cache-cargo` composite (with `key-suffix: "-criterion-deps"`). Criterion baseline is cached separately in `target/criterion`.
- Job `save-baseline`: On push to main, runs `cargo bench -p provability-fabric-bench -- --save-baseline main` and caches `target/criterion`.
- Job `compare-baseline`: On schedule, restores cached baseline and runs `cargo bench -p provability-fabric-bench -- --baseline main`; fails if regression exceeds threshold.
Local baseline: Run `make bench-save-baseline` to save the Criterion baseline and write `bench/BASELINE.md` (date, git_sha, machine). See `bench/README.md` for the notice that numeric thresholds are targets until baselines are recorded.

Rust perf policy: Criterion save/compare run in nightly or `workflow_dispatch`; PRs run only the `smoke-bench` job. Baselines are stored as workflow cache (`target/criterion`). For local runs, use `cargo bench -p provability-fabric-bench -- --baseline main` to compare against the saved baseline; run `cargo bench` to regenerate HTML reports under `target/criterion/` (cargo-criterion does not support baseline or HTML options).
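
The "fails if regression exceeds threshold" rule comes down to a relative-change comparison between a baseline timing and a current timing. The sketch below shows that arithmetic in Python. The function name and the 10% default are placeholders chosen for illustration, not values taken from the workflow or from `bench/README.md`.

```python
def exceeds_regression(baseline_ns: float, current_ns: float,
                       threshold: float = 0.10) -> bool:
    """Return True when current is slower than baseline by more than threshold.

    threshold is a fraction (0.10 means 10%); the default is a placeholder.
    """
    if baseline_ns <= 0:
        raise ValueError("baseline must be positive")
    # Positive values mean the benchmark got slower.
    relative_change = (current_ns - baseline_ns) / baseline_ns
    return relative_change > threshold
```

A speed-up produces a negative relative change and therefore never trips the gate.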
- Purpose: Weekly run of `exp-step2-lite-stress-large-repos` (heavy-repo slice). Not gated in CI; produces trend artifacts.
- Triggers: Schedule (cron `0 3 * * 0`, Sunday 03:00); `workflow_dispatch`.
- Job `stress-run`: Runs on `ubuntu-latest`, long timeout. Fills manifest, runs baseline and PF-guarded runs for the stress instance set, runs harness and compare, then `experiments/scripts/summarize_stress_run.py` to write `stress_summary.json` (`schema_version`, `pf_commit`, `agent_commit`, `dataset_id`, `dataset_version`, `harness_id`; `timeout_rate_*`, `wall_clock_s_median`/`p95`, `guard_overhead_s_median`, `empty_patch_reasons_topN`, solve rates). Stress regression alerts step runs `experiments/scripts/check_stress_alerts.py`; thresholds are read from `experiments/config/stress_alerts.yaml` (optional; script uses built-in defaults if missing). Fails the job when parity, timeout delta, empty_patch rate, or guard_overhead exceed thresholds. Uploads `compare.json`, `stress_summary.json`, `compare.csv`; uploads `stress_summary.json` as named artifact `stress-summary`.
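
The alert step described above follows a common pattern: merge optional overrides onto built-in defaults, then flag every metric that exceeds its limit. The Python sketch below illustrates that pattern only. Apart from `guard_overhead_s_median`, the metric names, the numeric limits, and the `find_alerts` function are hypothetical and do not reflect the contents of `check_stress_alerts.py` or `stress_alerts.yaml`.

```python
from typing import Optional

# Placeholder limits keyed by summary metric. Names and values are
# assumptions for illustration, not the script's real defaults.
DEFAULT_THRESHOLDS = {
    "timeout_rate_delta": 0.05,
    "empty_patch_rate": 0.20,
    "guard_overhead_s_median": 30.0,
}


def find_alerts(summary: dict, overrides: Optional[dict] = None) -> list:
    """Return one message per metric that exceeds its threshold."""
    # With no config (overrides is None) the built-in defaults apply.
    thresholds = {**DEFAULT_THRESHOLDS, **(overrides or {})}
    alerts = []
    for metric, limit in thresholds.items():
        value = summary.get(metric)
        # Metrics absent from the summary are skipped rather than flagged.
        if value is not None and value > limit:
            alerts.append(f"{metric}={value} exceeds {limit}")
    return alerts
```

A CI step built on this would exit non-zero whenever the returned list is non-empty, which is what fails the job.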
- Purpose: Sanity check that the publish-bundle verifier and fixture are valid (no network, no Docker). Runs unit tests for `publish_docs` and the verifier against the fixture so changes to `publish_docs` or `publish_bundle` are regression-tested.
- Triggers: Push (`main`/`master`) and pull_request when `experiments/scripts/verify_publish_bundle.py`, `experiments/scripts/publish_docs.py`, `experiments/scripts/publish_bundle.py`, `experiments/scripts/export_publish_artifacts.py`, `experiments/scripts/update_run_ids_if_green.py`, `experiments/fixtures/verify_publish_bundle/**`, `experiments/schemas/compare_report.schema.json`, `tests/test_publish_docs.py`, `tests/test_verify_publish_bundle_fixture.py`, or this workflow change; `workflow_dispatch`.
- Job `verify`: (1) Install pytest; (2) run `pytest tests/test_publish_docs.py tests/test_verify_publish_bundle_fixture.py -v`; (3) run `python experiments/scripts/verify_publish_bundle.py --publish-dir experiments/fixtures/verify_publish_bundle/publish --compare-json experiments/fixtures/verify_publish_bundle/compare.json --skip-run-dir-check`.
- After pushing: Confirm the "Verify publish bundle" workflow is green in the Actions tab when any of the triggered paths change.
- Local reusable workflow: `.github/workflows/pf-ci.yaml`
- Installs toolchains (Python 3.11, Go 1.23, Rust stable)
- Installs Kind, kubectl, and Helm
- Builds and loads images for:
  - `runtime/sidecar-watcher`
  - `runtime/admission-controller`
  - `runtime/ledger`
  - `runtime/attestor`
  - Note: `runtime/wasm-sandbox` and the Rust adapters (`adapters/http-get`, `adapters/file-read`) are not built as Docker images in this workflow; add them to the workflow if needed for deployment.
- Runs TRUST-FIRE GA subset (phases 2, 3, and 6 by default)
- Fails the job unless `trust-fire-report.json` contains `overall_status: PASS`
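
The final gate in the list above can be pictured as a small check on the report file. The following Python sketch assumes that `overall_status` is a top-level key of `trust-fire-report.json` holding the string `PASS`; the actual report layout and the way the workflow performs the check are not shown on this page, and `report_passed` is a hypothetical helper name.

```python
import json
from pathlib import Path


def report_passed(report_path: str) -> bool:
    """Return True only if the report exists and declares overall_status PASS."""
    path = Path(report_path)
    if not path.is_file():
        # A missing report is treated as a failure, not as a pass.
        return False
    try:
        report = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        # An unreadable report is also a failure.
        return False
    # Assumption: overall_status is a top-level string field.
    return isinstance(report, dict) and report.get("overall_status") == "PASS"
```

A wrapper script could call `report_passed("trust-fire-report.json")` and exit with a non-zero status when it returns `False`, which would fail the surrounding CI step.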
Inputs:
- `pr_number` (string, optional): PR number (for callers that need it)
- `run_phases` (string, optional, default `"2,3,6"`): Comma-separated phases to run
- `kind_cluster_name` (string, optional, default `"pf-ci"`): Kind cluster name

Secrets:
- `GITHUB_TOKEN` (optional): Used by some TRUST-FIRE steps
- `CI_PAT` (optional): PAT for cross-repo checkout, if needed
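
Because `run_phases` arrives as a single comma-separated string, a consumer has to split and validate it before use. A minimal Python sketch of one way to do that follows. The function name `parse_run_phases` is hypothetical, and how the TRUST-FIRE scripts actually parse the value is not documented here.

```python
def parse_run_phases(value: str = "2,3,6") -> list:
    """Split a comma-separated phase list into integers, ignoring blanks."""
    phases = []
    for token in value.split(","):
        token = token.strip()
        if not token:
            # Tolerate stray commas such as "2,3,".
            continue
        if not token.isdecimal():
            raise ValueError(f"invalid phase: {token!r}")
        phases.append(int(token))
    return phases
```

Rejecting non-numeric tokens early gives a clear error instead of silently skipping a phase.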
```yaml
name: PF Reusable CI Caller
on:
  pull_request:
  schedule:
    - cron: "0 3 * * *"
jobs:
  call-pf-ci:
    uses: ./.github/workflows/pf-ci.yaml
    with:
      pr_number: ${{ github.event.pull_request.number || '' }}
      run_phases: "2,3,6"
    secrets:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

Publish the reusable workflow to a central repository (e.g., `org/ci-workflows`) under `.github/workflows/pf-ci.yaml` and tag a version (e.g., `v1`). Then call it:
```yaml
name: Cross-Repo PF CI Consumer
on:
  pull_request:
  schedule:
    - cron: "0 3 * * *"
jobs:
  pf-ci:
    uses: org/ci-workflows/.github/workflows/pf-ci.yaml@v1
    with:
      pr_number: ${{ github.event.pull_request.number || '' }}
      run_phases: "2,3,6"
    secrets:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

Path: `charts/pf-enforce`
- MutatingWebhookConfiguration: Injects `sidecar-watcher` into pods labeled `pf/enforce=true`
- ValidatingWebhookConfiguration: Blocks pods lacking a valid PAB signature
- Admission controller Deployment + Service (HTTPS 8443)
- RBAC & ServiceAccount
- Sidecar ConfigMap (default sidecar JSON)
```yaml
image:
  sidecarWatcher:
    repository: provability-fabric/sidecar-watcher
    tag: latest
    pullPolicy: IfNotPresent
  admissionController:
    repository: provability-fabric/admission-controller
    tag: latest
    pullPolicy: IfNotPresent
webhooks:
  namespaceSelector: {}
  objectSelector:
    matchExpressions:
      - key: pf/enforce
        operator: In
        values: ["true"]
  timeouts:
    mutating: 10
    validating: 10
  failurePolicy: Fail
caBundle: ""
admission:
  replicaCount: 1
  service:
    type: ClusterIP
    port: 443
  tls:
    secretName: pf-enforce-certs
```

```bash
helm install pf-enforce charts/pf-enforce \
  --set caBundle=$(kubectl get cm -n kube-system extension-apiserver-authentication -o jsonpath='{.data.client-ca-file}' | base64 -w0)
```

Adjust `caBundle` to match your cluster’s CA.
Targets added:
- `pf-sign`: cosign keyless sign `bundles/<SERVICE_NAME>/bundle.json` and record `sigstore_digest`

```bash
make pf-sign SERVICE_NAME=my-service
```

- `pf-verify`: verify signature and run `pf verify`

```bash
make pf-verify SERVICE_NAME=my-service
```

Path: `templates/pf-bootstrap/pf.yaml`
Example:
```yaml
service: <service-name>
attestor:
  endpoint: http://attestor-service:8080
ledger:
  endpoint: http://ledger-service:4000
redis:
  url: redis://redis-master:6379
tools:
  http_fetch:
    allowlist:
      - https://api.example.com
      - https://*.trusted.com
```

Use as a starting point for new service repos.
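
The `allowlist` entries mix an exact origin with a wildcard subdomain pattern. One plausible reading of those patterns is sketched below in Python using shell-style matching on the scheme and host only. This is an assumption about the semantics, not the matcher the sidecar actually uses, and `is_allowed` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# The two patterns from the example pf.yaml.
ALLOWLIST = ["https://api.example.com", "https://*.trusted.com"]


def is_allowed(url: str, allowlist=ALLOWLIST) -> bool:
    """Match the scheme and host of url against allowlist patterns.

    Assumed semantics: path, query, and port are ignored, and "*" may
    span one or more subdomain labels.
    """
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        return False
    # urlsplit lowercases the hostname, so matching is case-sensitive here.
    origin = f"{parts.scheme}://{parts.hostname}"
    return any(fnmatchcase(origin, pattern) for pattern in allowlist)
```

Matching on the parsed origin rather than the raw string keeps an allowlisted host that appears only in a query string from being accepted.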