A self-contained extension for the Powernode platform that provides node lifecycle management, declarative module composition, an on-node runtime, and a self-improving fleet autonomy layer.

This repository is mounted into the Powernode platform as a submodule at `extensions/system/`. It can be operated independently; the platform consumes it via the standard extension contract.
- Node + instance lifecycle with a polymorphic Task model + AASM state machines covering `pending → provisioning → running → stopped → terminated`
- Declarative templates composed from versioned NodeModules with rsync-glob filter rules (`mask`, `file_spec`, `package_spec`, `dependency_spec`)
- Cloud provider abstractions (AWS, GCP, Azure, OpenStack, local QEMU) with per-region/zone catalogs
- Visual Template Composer in the React UI — drag modules onto a canvas, preview conflicts + footprint live, save with one click
- Fleet Dashboard — live event feed, correlation chains, drift queue, honeypot canary alerts, attribution feedback for failure analysis
- Go agent (`powernode-agent`): a single static binary (~20 MB) that replaces the legacy bash scripts. It handles multi-cloud identity discovery (AWS/GCP/Azure/DigitalOcean/libvirt fw-cfg), mTLS enrollment, OCI module pull, fs-verity verification, composefs+overlayfs union mount, heartbeat, task leases, and cert rotation
- Multi-arch initramfs builder: produces six artifact families per arch (kernel+initramfs bundle, raw disk image, ISO, iPXE chainload, qcow2, bootc-compatible OCI) for both amd64 and arm64 in one CI run
- Two-stage CI pipeline (Containerfile builder + composefs composer) that preserves the legacy rsync-glob composition layer while modernizing multistrap → mmdebstrap and mksquashfs → mkcomposefs
- Cosign-signed OCI artifacts with Sigstore Fulcio (no long-lived signing keys; ephemeral OIDC-bound certs)
- Per-module trust policy (`cosign_identity_regexp` + `cosign_issuer_regexp`) pinning each module to its expected publisher
- 12 fleet sensors detecting silent instances, module drift, cert expiry, promotion readiness, config drift, SLO violations, honeypot canary access, trading pressure (cross-extension stigmergic coordination), and SDWAN health (peer reachability, BGP session, VIP reachability, drift)
- 14 AI Skill executors:
  - 4 read-shape (`capacity_recommend`, `attribute_failure`, `runbook_generate`, `cve_runbook_generate`), bound to System Concierge for chat
  - 8 fleet autonomy (`drift_remediate`, `cve_response`, `sdwan_failover`, `sdwan_peer_remediate`, `sdwan_bgp_session_remediate`, `sdwan_vip_failover`, `module_compose`, `rolling_module_upgrade`)
  - 2 runtime (`docker_provision`, `provision_cluster`), bound to Runtime Manager
- FleetAutonomyService — gates every autonomous action through intervention policy + approval chain (auto_approve / notify_and_proceed / require_approval / blocked); same UI as trading-overseer's approval queue
- Learning loop — every confirmed/rejected operator decision feeds back into compound learnings that boost or downweight similar candidates next time
- Cross-domain stigmergic coordination — bidirectional pressure exchange with trading and other extensions via the platform's signal bus
- Per-module consent budget — operators set a daily ceiling on autonomous decisions per module; exhausted budget forces require_approval regardless of policy
- System::FleetEvent persistent log (90-day routine retention, 365-day critical retention) with per-tick correlation IDs
- SystemFleetChannel ActionCable broadcast for live UI updates
- Compliance snapshot — audit-grade JSON document of every node, instance, module digest, certificate, CVE exposure, drift state, and autonomy decision
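The lifecycle states listed above form a linear happy path plus a terminal state. A minimal plain-Ruby sketch of such a transition table follows; it does not use AASM, and the event names (`provision`, `boot`, `stop`, `start`, `terminate`) and the set of allowed edges are hypothetical, not the extension's actual state machine definition.

```ruby
# Hypothetical lifecycle sketch in plain Ruby (no AASM). Event names and the
# allowed edges are illustrative assumptions, not the extension's definitions.
class LifecycleSketch
  class InvalidTransition < StandardError; end

  # event => states it may fire from, and the state it moves to
  TRANSITIONS = {
    provision: { from: %i[pending], to: :provisioning },
    boot:      { from: %i[provisioning], to: :running },
    stop:      { from: %i[running], to: :stopped },
    start:     { from: %i[stopped], to: :running },
    terminate: { from: %i[pending provisioning running stopped], to: :terminated }
  }.freeze

  attr_reader :state

  def initialize
    @state = :pending
  end

  # Moves to the event's target state, or raises if the current state is not allowed.
  def fire!(event)
    rule = TRANSITIONS.fetch(event) { raise ArgumentError, "unknown event: #{event}" }
    unless rule[:from].include?(@state)
      raise InvalidTransition, "cannot #{event} from #{@state}"
    end

    @state = rule[:to]
  end
end

machine = LifecycleSketch.new
%i[provision boot stop terminate].each { |event| machine.fire!(event) }
puts machine.state # prints "terminated"
```

Keeping the table as data makes illegal jumps (for example `provision` from `running`) fail loudly instead of silently overwriting state.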
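Module file selection relies on rsync-glob filter rules. The sketch below approximates rsync's first-match-wins include/exclude behavior with Ruby's `File.fnmatch`; the `FilterRule` struct and `included?` helper are hypothetical, and it models neither the `mask` / `file_spec` / `package_spec` / `dependency_spec` fields nor rsync's full filter syntax.

```ruby
# Hypothetical approximation of rsync-style filter rules: rules are checked in
# order and the first matching glob decides include vs. exclude (as in rsync).
# Anchoring, directory-only rules and "***" from real rsync are not modeled.
FilterRule = Struct.new(:action, :glob)

def included?(path, rules, default: :include)
  rules.each do |rule|
    # FNM_PATHNAME: "*" never crosses "/", while "**/" spans nested directories.
    next unless File.fnmatch(rule.glob, path, File::FNM_PATHNAME)

    return rule.action == :include
  end
  default == :include
end

rules = [
  FilterRule.new(:include, "usr/share/doc/*/copyright"), # keep license files
  FilterRule.new(:exclude, "usr/share/doc/**/*")         # drop the rest of the docs
]

%w[usr/share/doc/curl/copyright usr/share/doc/curl/README usr/bin/curl].each do |path|
  puts "#{path}: #{included?(path, rules) ? 'keep' : 'drop'}"
end
```

Because the first match wins, narrow includes must precede the broad exclude they carve exceptions out of.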
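The per-module trust policy pins a publisher with two regular expressions. A minimal sketch of the policy check, assuming the signature itself has already been verified (for example by cosign) and only the signing certificate's identity and OIDC issuer remain to be compared; `TrustPolicy` and `trusted_publisher?` are hypothetical names, and the `example-org/modules` workflow identity is a placeholder.

```ruby
# Hypothetical policy check: the signature is assumed to be verified already;
# this only compares the signing certificate's identity and OIDC issuer with
# the module's pinned patterns. Patterns should be anchored (\A ... \z).
TrustPolicy = Struct.new(:cosign_identity_regexp, :cosign_issuer_regexp, keyword_init: true)

def trusted_publisher?(policy, cert_identity:, cert_issuer:)
  Regexp.new(policy.cosign_identity_regexp).match?(cert_identity) &&
    Regexp.new(policy.cosign_issuer_regexp).match?(cert_issuer)
end

policy = TrustPolicy.new(
  # Placeholder publisher: a GitHub Actions workflow in "example-org/modules".
  cosign_identity_regexp: '\Ahttps://github\.com/example-org/modules/\.github/workflows/[^@]+@refs/heads/main\z',
  cosign_issuer_regexp: '\Ahttps://token\.actions\.githubusercontent\.com\z'
)

puts trusted_publisher?(
  policy,
  cert_identity: "https://github.com/example-org/modules/.github/workflows/release.yml@refs/heads/main",
  cert_issuer: "https://token.actions.githubusercontent.com"
) # prints "true"
```

Anchoring both patterns matters: an unanchored identity pattern would also accept a certificate whose identity merely contains the expected publisher as a substring.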
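The gating rule for autonomous actions can be sketched as a pure function: the intervention policy proposes one of the four outcomes, and an exhausted per-module consent budget downgrades anything permissive to `require_approval`. `gate_outcome` and its parameters are hypothetical; the sketch also assumes `blocked` stays `blocked` when the budget is exhausted, since forcing approval there would loosen the policy.

```ruby
# Hypothetical gate: the intervention policy proposes an outcome and the
# per-module daily consent budget can only make it stricter.
OUTCOMES = %i[auto_approve notify_and_proceed require_approval blocked].freeze

def gate_outcome(policy_outcome, autonomous_used_today:, daily_budget:)
  unless OUTCOMES.include?(policy_outcome)
    raise ArgumentError, "unknown outcome: #{policy_outcome}"
  end
  # Assumption: an exhausted budget never loosens a blocked action.
  return :blocked if policy_outcome == :blocked
  # Exhausted budget forces a human decision regardless of the policy outcome.
  return :require_approval if autonomous_used_today >= daily_budget

  policy_outcome
end

puts gate_outcome(:auto_approve, autonomous_used_today: 3, daily_budget: 10)  # prints "auto_approve"
puts gate_outcome(:auto_approve, autonomous_used_today: 10, daily_budget: 10) # prints "require_approval"
```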
To operate this extension you need a running Powernode platform installation. See the parent platform repo for installation instructions.
This extension contributes:
- Rails models, services, and controllers (62 models across `System::*` + `Sdwan::*`, ~136 service classes across 10 subdomains, ~77 controllers across the operator API, on-node API, and worker API)
- React/TypeScript frontend (~175 TS/TSX files, including 9 page components, ~50 reusable components, 4 custom hooks, and 15+ API client services)
- Worker jobs (7): `system_task_reaper`, `system_fleet_reconcile`, `system_cve_feed`, `system_fleet_event_retention`, `system_cloud_sync`, `system_execute_task`, `system_gitops_sync`
- Database migrations (66)
- Sidekiq cron schedule entries
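The `system_fleet_event_retention` job presumably enforces the windows stated for `System::FleetEvent` (90 days for routine events, 365 days for critical ones). A minimal sketch of that expiry test, assuming events carry a severity symbol and a UTC timestamp and that every non-critical severity counts as routine; `expired?` and `prune` are hypothetical helpers, not the job's code.

```ruby
# Hypothetical expiry check for fleet events. The 90/365-day windows come from
# the description above; the event shape and "non-critical = routine" are assumptions.
RETENTION_DAYS = { routine: 90, critical: 365 }.freeze
SECONDS_PER_DAY = 86_400

def expired?(severity, recorded_at, now: Time.now.utc)
  tier = severity == :critical ? :critical : :routine
  now - recorded_at > RETENTION_DAYS.fetch(tier) * SECONDS_PER_DAY
end

# Keep only events still inside their retention window.
def prune(events, now: Time.now.utc)
  events.reject { |event| expired?(event[:severity], event[:recorded_at], now: now) }
end

now = Time.utc(2026, 5, 4)
events = [
  { id: 1, severity: :info,     recorded_at: now - 120 * SECONDS_PER_DAY },
  { id: 2, severity: :critical, recorded_at: now - 120 * SECONDS_PER_DAY }
]
p prune(events, now: now).map { |event| event[:id] } # prints [2]
```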

```
extensions/system/
├── server/               # Rails models, services, controllers, specs
├── frontend/             # React TypeScript surface
├── worker/               # Sidekiq job classes
├── agent/                # Go on-node agent (powernode-agent)
├── initramfs/            # Multi-arch boot artifact builder
├── templates/
│   └── module-repo/      # Canonical module-source layout
└── docs/
    ├── ARCHITECTURE.md   # Subsystem reference
    ├── runbooks/         # Step-by-step operator guides
    ├── examples/         # End-to-end use-case walkthroughs
    └── …                 # Domain-specific reference (see Documentation)
```

| Doc | What it covers |
|---|---|
| `docs/ARCHITECTURE.md` | 8 subsystems, threat model, state machines, API surfaces |
| `docs/USE_CASE_MATRIX.md` | What works / doesn't / what to expect for 10 NodeInstance container scenarios. READ FIRST when designing a deployment |
| `docs/CONTAINER_RUNTIMES.md` | Phase 1 Docker + Phase 2 K3s lifecycle + operator troubleshooting |
| `docs/SKILL_EXECUTORS.md` | All 14 skill executors with descriptors and example I/O |
| `docs/FLEET_SENSORS.md` | All 12 fleet sensors + intervention policy reference table |
| `docs/DISK_IMAGE_CI.md` | Webhook + CI worker + OCI artifact pipeline |
| `docs/MCP_API_REFERENCE.md` | All `system_*` / `system_sdwan_*` / `kubernetes_*` / `docker_*` MCP tool actions |

| Runbook | Goal |
|---|---|
| `docs/runbooks/node-provisioning.md` | Full Node + NodeInstance lifecycle (create → enroll → drain → decommission) with per-AASM-state error recovery |
| `docs/runbooks/sdwan-network-setup.md` | SDWAN end-to-end: networks, peers, VIPs, firewall, route policies, BGP, federation |
| `docs/runbooks/module-authoring.md` | Author + register + sign + publish a new NodeModule |
| `docs/runbooks/cve-response.md` | Full CVE response workflow with SBOM-aware matching |
| `docs/runbooks/instance-pool-tuning.md` | Pool sizing, reaping, draining, troubleshooting |
| `docs/runbooks/multi-cluster-k3s.md` | Multi-cluster K3s with `target_cluster_id` + HA control plane |
| `docs/runbooks/disk-image-ci.md` | Disk image build + signing + publication operator workflow |
| `docs/runbooks/vault-credential-restoration.md` | DR runbook for credential restoration |

`docs/examples/`: 10 end-to-end walkthroughs (single-node QEMU, K3s + SDWAN, multi-tenant container farm, rolling upgrades, CVE response, instance pools, custom module authoring, honeypot canaries, federation, GitOps). Six have companion runnable seeds under `server/db/seeds/example_*.rb`.

- `docs/SMOKE_TEST.md`: integration test checklist (LocalQemuProvider)
- `docs/credential-restoration.md`: Vault transit credential design
- `docs/agent-peering.md`: NodeInstance-as-Agent design (in sweep)
- `docs/gitops.md`: GitOps reconciler design (in sweep)
- `docs/TASKS.md`: milestone tracker
- `initramfs/README.md`: multi-arch boot artifact builder
- `agent/README.md`: Go agent layout + 13 subcommands
- `templates/module-repo/README.md`: canonical module-source layout

```bash
# Inside the Powernode platform working tree, where this extension is mounted
# as a submodule at extensions/system/
cd extensions/system

# Backend specs
cd server && bundle exec rspec

# Frontend type-check
cd ../frontend && npx tsc --noEmit

# Go agent tests
cd ../agent && go test ./...
```

When working on this extension, always commit inside `extensions/system/` first, then update the submodule pointer in the parent platform repo. See `CONTRIBUTING.md` for the full submodule workflow.
MIT — see LICENSE.

Active development. Spec coverage: 1,430 examples / 0 failures (as of 2026-05-04). The Golden Eclipse roadmap (Track A through Track F) is substantially complete on the backend and autonomy axes. The frontend operator surface covers M-FE-1 (Visual Composer) and M-FE-3 (Fleet Dashboard, with the Boot Replay viewer in active sweep).
- Slice 3: first-class `Sdwan::VirtualIp` with cluster `api_endpoint` VIP failover (bootstrap-node loss → automatic VIP failover to the next `k3s-server` holder; `kubectl` and worker `K3S_URL` survive the transition)
- Slice 7: pre-warmed `System::InstancePool` with atomic acquisition + reaper auto-replenishment; cuts ephemeral provisioning latency from 5–10 min cold boot to <30 s claim
- Slice 9 (a–f): static subnet routing, first-class VIPs, iBGP/FRR, comprehensive frontend, observability/autonomy, route policies (JSONB statements compiled to FRR route-map + prefix-list/as-path-list)
- Slice 10: config-variety dockerd `daemon.json` overrides via a dependent module hierarchy (per-node + per-instance customization without rebuilding the base module)
- Phase 2 K3s: full container runtime stack (cluster provisioner, agent reconciler state machine, module catalog seed, multi-cluster `metadata.target_cluster_id` join validation)
- Phase 1 Docker: managed `Devops::DockerHost` with InternalCaService TLS provisioning + cascade-FK decommission
- M0 — Foundation contracts + legacy spec porting (BootstrapToken, NodeCertificate, ModuleArtifact, mTLS, AASM)
- M2 — Go agent v0 (~4,400 LOC across 9 packages including security)
- M3 — Multi-arch image builder (six artifact families × amd64/arm64)
- M4 — QEMU thin slice (LocalQemuProvider with Libvirt/Recorder/Disabled runner triplet, virtio-fw-cfg seed, 15-spec integration coverage)
- M5 — MCP CRUD surface (SystemFleetTool, ~25 actions, per-action permission gates)
- M6 — AI Skills catalog (8 executors)
- M7 — FleetAutonomyService (gate_action!, 8 sensors, DecisionEngine, approval chains)
- M8 — Compound learning extraction (LearningExtractor wired into tick loop, auto-evolve trigger after 3 matching learnings)
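Slice 7's pre-warmed pool hinges on two properties: a claim hands each warm instance to exactly one caller, and a reaper-style step tops the pool back up. A minimal in-memory sketch, assuming a `Mutex` stands in for whatever database-level locking `System::InstancePool` actually uses; `WarmPoolSketch` and its boot block are hypothetical.

```ruby
# Hypothetical in-memory warm pool. A Mutex stands in for the database-level
# locking a persistent pool would need; booting is passed in as a block.
class WarmPoolSketch
  def initialize(target_size, &boot)
    @target_size = target_size
    @boot = boot
    @warm = []
    @lock = Mutex.new
    replenish
  end

  # Atomically hands out one warm instance, or nil if the pool is empty
  # (the caller would then fall back to a cold boot).
  def claim
    @lock.synchronize { @warm.shift }
  end

  # Reaper-style top-up back to the target size; returns how many were booted.
  # Assumes a single reaper thread, so concurrent top-ups cannot overshoot.
  def replenish
    missing = @lock.synchronize { @target_size - @warm.size }
    return 0 unless missing.positive?

    fresh = Array.new(missing) { @boot.call } # slow work happens outside the lock
    @lock.synchronize { @warm.concat(fresh) }
    missing
  end

  def size
    @lock.synchronize { @warm.size }
  end
end

booted = 0
pool = WarmPoolSketch.new(2) { "vm-#{booted += 1}" }
p([pool.claim, pool.claim, pool.claim]) # prints ["vm-1", "vm-2", nil]
p pool.replenish                        # prints 2
```

Booting outside the lock keeps claims from queuing behind a multi-minute cold boot; the cost is the single-reaper assumption noted in the comment.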
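Slice 9 compiles route-policy statements stored as JSONB into FRR route-maps and prefix-lists. The sketch below shows the general shape of such a compiler for IPv4 prefix matches and local-preference only; the statement schema (`seq`, `action`, `match_prefixes`, `local_preference`) and the `<name>-S<seq>` prefix-list naming are hypothetical, while the emitted lines use standard FRR `ip prefix-list` / `route-map` syntax.

```ruby
require "json"

# Hypothetical compiler from JSONB-style statements to FRR config lines.
# Statement schema and prefix-list naming are assumptions; IPv4 prefixes only.
def compile_route_policy(name, statements)
  lines = []
  statements.sort_by { |stmt| stmt.fetch("seq") }.each do |stmt|
    seq = stmt.fetch("seq")
    action = stmt.fetch("action")
    raise ArgumentError, "unsupported action: #{action}" unless %w[permit deny].include?(action)

    entry = ["route-map #{name} #{action} #{seq}"]
    prefixes = stmt.fetch("match_prefixes", [])
    unless prefixes.empty?
      # One prefix-list per statement; the route-map entry matches against it.
      list = "#{name}-S#{seq}"
      prefixes.each_with_index do |prefix, i|
        lines << "ip prefix-list #{list} seq #{(i + 1) * 5} permit #{prefix}"
      end
      entry << " match ip address prefix-list #{list}"
    end
    if (pref = stmt["local_preference"])
      entry << " set local-preference #{Integer(pref)}"
    end
    lines.concat(entry)
  end
  lines.join("\n")
end

statements = JSON.parse(<<~JSON)
  [{"seq": 10, "action": "permit", "match_prefixes": ["10.10.0.0/16"], "local_preference": 200}]
JSON
puts compile_route_policy("RM-SITE-A", statements)
# ip prefix-list RM-SITE-A-S10 seq 5 permit 10.10.0.0/16
# route-map RM-SITE-A permit 10
#  match ip address prefix-list RM-SITE-A-S10
#  set local-preference 200
```

Each prefix-list is emitted ahead of the route-map entry that references it, and statements are ordered by `seq` so the generated route-map evaluates in the same order the operator defined.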
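Slice 10 layers `daemon.json` overrides through dependent modules instead of rebuilding the base module. One plausible merge rule is sketched below: later layers (base, then per-node, then per-instance) win, nested objects merge key by key, and arrays are replaced wholesale. The precedence order, the merge semantics, and the `mirror.example.internal` URL are assumptions; the keys themselves are standard dockerd `daemon.json` options.

```ruby
require "json"

# Hypothetical layering of daemon.json fragments. Later layers win; nested
# objects merge key by key; arrays and scalars are replaced wholesale.
def deep_merge(base, override)
  base.merge(override) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

# Merges layers left to right (e.g. base module, per-node, per-instance).
def render_daemon_json(*layers)
  JSON.pretty_generate(layers.reduce({}) { |acc, layer| deep_merge(acc, layer) })
end

base     = { "log-driver" => "json-file", "log-opts" => { "max-size" => "10m", "max-file" => "3" } }
node     = { "registry-mirrors" => ["https://mirror.example.internal"] }
instance = { "log-opts" => { "max-size" => "50m" } }
puts render_daemon_json(base, node, instance)
```

With these layers the instance raises `max-size` to `50m` while inheriting `max-file` and `log-driver` from the base and the mirror list from the node.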
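The learning loop from the feature list and the M8 auto-evolve trigger can be sketched together: each confirmed or rejected operator decision nudges a per-pattern weight up or down, similar candidates have their scores scaled by that weight, and a pattern that accumulates 3 learnings signals that evolution should run. The step size, the clamp range, and keying learnings by a single pattern symbol are assumptions; `LearningLedgerSketch` is hypothetical.

```ruby
# Hypothetical feedback ledger; constants other than the threshold of 3 are assumptions.
class LearningLedgerSketch
  EVOLVE_THRESHOLD = 3 # M8: auto-evolve after 3 matching learnings
  STEP = 0.1           # assumed boost/downweight per decision
  MIN_WEIGHT = 0.0
  MAX_WEIGHT = 2.0

  def initialize
    @weights = Hash.new(1.0) # neutral weight for patterns with no history
    @counts = Hash.new(0)
  end

  # Records one confirmed/rejected operator decision for a pattern and
  # returns true once that pattern has reached the evolve threshold.
  def record(pattern, confirmed:)
    delta = confirmed ? STEP : -STEP
    @weights[pattern] = (@weights[pattern] + delta).clamp(MIN_WEIGHT, MAX_WEIGHT)
    @counts[pattern] += 1
    @counts[pattern] >= EVOLVE_THRESHOLD
  end

  def weight(pattern)
    @weights[pattern]
  end

  # Boosts or downweights a similar candidate's base score.
  def adjusted_score(pattern, base_score)
    base_score * weight(pattern)
  end
end

ledger = LearningLedgerSketch.new
ledger.record(:drift_remediate, confirmed: true)
ledger.record(:drift_remediate, confirmed: true)
p ledger.record(:drift_remediate, confirmed: false) # prints true
```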

Work in the current sweep is tracked in `docs/TASKS.md`. It adds per-account encryption key restoration via Vault transit, SBOM-aware CVE matching (M-D2-2), GitOps reconciliation (M-D2-3), NodeInstance-as-Agent peer registration (F-3), the Boot Replay viewer (M-FE-3 completion), a Module Marketplace skeleton (M-FE-2), and AI Concierge chat (M-FE-4).
- Powernode platform — the parent platform that mounts this extension
- Cosign — module signing
- composefs — verified-mount lower layer
- oras — OCI artifact tooling