devops-ai-workflows

A growing collection of AI-agent workflows, prompts, and rules for day-to-day DevOps / SRE / platform work.

Note: "workflows" here means AI coding-agent workflows (Windsurf, Cursor, Claude Code, etc.) — not GitHub Actions.

What's inside

Folder	Purpose	Audience
`workflows/`	Workflow definitions, grouped by domain	Everyone
`prompts/`	Reusable system / task prompts (incident triage, code review, post-mortem, etc.)	Any LLM
`rules/`	Editor / agent rule files (`.windsurfrules`, `.cursorrules`, Copilot instructions)	Per-tool
`scripts/`	Standalone shell scripts referenced by workflows	Anyone with a shell

Available workflows

Kubernetes

Workflow	Slash command	Description	Prerequisites
k8s-debug	`/k8s-debug`	General-purpose, read-only cluster diagnostics across nodes, pods, workloads, networking, storage, RBAC, events, and resource pressure.	`kubectl`. Optional: `jq`, metrics-server.
k8s-workload-debug	`/k8s-workload-debug`	Deep-dive on a single Deployment / StatefulSet / DaemonSet / Job / Pod: rollout, spec, probes, resources, logs, networking, storage, config.	`kubectl`. Optional: `jq`, metrics-server.
k8s-rbac-audit	`/k8s-rbac-audit`	RBAC risk audit — wildcards, cluster-admin bindings, risky verb/resource combos, over-privileged ServiceAccounts, anonymous access.	`kubectl`, `jq`. Optional: `kubectl-who-can`.
k8s-cost-hotspots	`/k8s-cost-hotspots`	Find waste: over-provisioned workloads, missing requests/limits, idle workloads, orphan PVCs/PVs, idle LoadBalancers.	`kubectl`, `jq`, metrics-server.
k8s-upgrade-readiness	`/k8s-upgrade-readiness`	Pre-flight before a control-plane / node upgrade: deprecated APIs, version skew, PDB gaps, expiring certs, broken webhooks.	`kubectl`. Optional: `kubent` or `pluto`, `helm`.
helm-release-debug	`/helm-release-debug`	Diagnose a stuck or failed Helm release: history, values diff, hook failures, rendered manifest vs cluster, workload health.	`helm` v3, `kubectl`. Optional: `jq`, `yq`.
helm-chart-review	`/helm-chart-review`	Review a Helm chart for security, reliability, and best practices: resource specs, probes, security context, PDBs, anti-affinity, RBAC.	Helm chart source. Optional: `helm` CLI.

AWS / Cloud

Workflow	Slash command	Description	Prerequisites
aws-account-audit	`/aws-account-audit`	Read-only AWS account security & hygiene audit: IAM, S3, EC2, RDS, CloudTrail, encryption, GuardDuty, SecurityHub.	`aws` CLI. Optional: `jq`.
aws-cost-quickscan	`/aws-cost-quickscan`	Find AWS cost waste: idle EC2/RDS, unattached EBS, old snapshots, expensive log groups, NAT data processing, missing Savings Plans.	`aws` CLI, Cost Explorer enabled. Optional: `jq`.
aws-vpc-debug	`/aws-vpc-debug`	Diagnose VPC connectivity: trace path across SGs, NACLs, route tables, NAT/IGW/TGW, VPC endpoints, DNS, and flow logs.	`aws` CLI. Optional: `jq`, `dig`.
aws-iam-policy-review	`/aws-iam-policy-review`	Explain an IAM policy and flag risks: admin-equivalent access, privilege escalation paths, wildcard actions, missing conditions.	`aws` CLI. Optional: `jq`.

IaC

Workflow	Slash command	Description	Prerequisites
terraform-plan-review	`/terraform-plan-review`	Explain a Terraform plan and flag risky changes: destroys, replacements, security group mutations, IAM changes, blast radius.	`terraform plan` output. Optional: `terraform` CLI, `jq`.

Containers & CI/CD

Workflow	Slash command	Description	Prerequisites
ci-debug	`/ci-debug`	Diagnose a failing CI/CD pipeline: parse build logs from Jenkins, GitHub Actions, GitLab CI, or Bitbucket Pipelines. Root cause analysis and fix suggestions.	Build log output. Optional: repo source, CI config file.
jenkins-pipeline-review	`/jenkins-pipeline-review`	Review Jenkinsfile / shared-library Groovy for security risks, anti-patterns, missing error handling, credential leaks, CPS issues, and build config cross-references.	Jenkinsfile(s) or `vars/*.groovy`. Optional: `repositories_v2.json`.
release-checklist	`/release-checklist`	Pre-release safety gate: scope, deploy order, rollback, tests, monitoring, and communication before production release.	PR/diff summary. Optional: test results, plans, diffs.
dockerfile-review	`/dockerfile-review`	Review Dockerfiles for security, size, caching, and best practices. Flags CVE-prone bases, leaked secrets, missing health checks.	Dockerfile(s). Optional: `docker`, `trivy`.

Security

Workflow	Slash command	Description	Prerequisites
secrets-leak-scan	`/secrets-leak-scan`	Scan git repo history for leaked secrets: API keys, passwords, tokens, private keys. Uses gitleaks, trufflehog, or regex fallback.	Git repo. Optional: `gitleaks`, `trufflehog`.
repo-health	`/repo-health`	Audit repository hygiene: README, license, CI, branch/release hygiene, tracked secrets, ownership, and automation gaps.	Local git repo. Optional: `gh`, `jq`.

Observability & Incident

Workflow	Slash command	Description	Prerequisites
incident-triage	`/incident-triage`	Guided first 15 minutes of a production incident: timeline, blast radius, evidence gathering, mitigation suggestions.	Access to affected environment.

Prompts

Reusable system prompts you can paste into any AI agent for common DevOps tasks:

Prompt	What it does
incident-commander	Puts the AI in incident-commander mode: timeline, blast radius, action tracking, status updates.
postmortem-writer	Generates a blameless post-mortem from incident notes: timeline, root cause, impact, action items.
code-review-devops	Reviews IaC / pipeline / Docker / K8s code with a security-first DevOps lens.
pr-description	Generates a PR description from a diff: what, why, how, testing, risk, rollback plan.
explain-like-a-senior	Explains infrastructure code to junior engineers: what it does, why, gotchas, and how it fits together.
runbook-from-incident	Converts incident notes or post-mortems into reusable runbooks with diagnosis, mitigation, escalation, and follow-up steps.

Rules

Persistent instruction files that shape AI behavior. Copy them into a project's `.windsurf/rules/` directory or use them as `.windsurfrules`:

Rule file	What it does
devops-agent.windsurfrules	Safety guardrails for AI in DevOps repos: never modify prod without confirmation, prefer read-only, never hardcode secrets, always check context, GitOps awareness, multi-repo coordination.
terraform.windsurfrules	Terraform-specific: state safety, ForceNew attribute warnings, provider/module pinning, workspace safety, import workflow, `prevent_destroy` reminders.
kubernetes.windsurfrules	Kubernetes-specific: context verification, dry-run first, Helm safety, ArgoCD/GitOps awareness, secret handling, debugging approach, RBAC best practices.

Scripts

Standalone shell utilities referenced by workflows or useful on their own:

Script	Usage
k8s-snapshot.sh	`./k8s-snapshot.sh [namespace|all] [output-dir]` — dump cluster state (nodes, pods, events, services, top) to a timestamped Markdown file.
aws-whoami.sh	`./aws-whoami.sh [profile]` — quick AWS identity check: caller, region, account alias, org, SSO role.
stale-branches.sh	`./stale-branches.sh [days] [--remote]` — list git branches older than N days with last commit info.
validate-repo.sh	`./scripts/validate-repo.sh` — validate workflow frontmatter, README links, script executability, and optional lint checks.

Using a workflow

In AI agents

Open the matching file in `workflows/` and either:

invoke it as a slash command if your agent supports workflow discovery from this repo,
paste the relevant section into the agent's chat, or
include the file as context and ask the agent to follow it.

As a plain human workflow

Every workflow is just Markdown with shell commands. You can run the steps yourself in a terminal — no AI required.
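
Since a workflow is Markdown with embedded shell commands, the commands can be pulled out and read before anything is run. The sketch below is illustrative only. It assumes commands sit in fenced blocks tagged `bash`, `sh`, or `shell`, which this README does not confirm, and `extract_shell_blocks` is a hypothetical helper that is not part of this repo.

```python
import re

# Assumption: workflow files fence their commands with a bash/sh/shell tag.
# The pattern matches an opening fence line, the body, and the closing fence.
SHELL_FENCE = re.compile(
    r"^```(?:bash|sh|shell)[ \t]*\n(.*?)^```[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_shell_blocks(markdown: str) -> list[str]:
    """Return the body of every shell-fenced code block, in document order."""
    return [match.group(1).rstrip("\n") for match in SHELL_FENCE.finditer(markdown)]
```

Given the text of a workflow file (for example one read from `workflows/<domain>/<name>.md`), the function returns one string per shell block. Blocks in other languages, such as YAML snippets, are skipped, so the result can be reviewed command by command before running the steps in a terminal.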

Repo layout

devops-ai-workflows/
├── workflows/
│   ├── kubernetes/          # Kubernetes workflow definitions
│   ├── aws/                 # AWS / cloud workflow definitions
│   ├── iac/                 # Infrastructure as Code workflows
│   ├── cicd/                # CI/CD pipeline workflows
│   ├── containers/          # Container & image workflows
│   ├── security/            # Security & repo hygiene workflows
│   └── observability/       # Observability & incident workflows
├── prompts/                 # Reusable LLM prompts
├── rules/                   # Editor/agent rule files
├── scripts/                 # Standalone shell helpers
├── CONTRIBUTING.md
├── LICENSE
└── README.md

Roadmap

Ideas I plan to add (PRs welcome):

AWS / cloud

/aws-eks-debug — bridge EKS + Kubernetes: node groups, OIDC, add-ons, IAM roles for service accounts
/aws-rds-health — RDS/Aurora diagnostics: events, metrics, parameter groups, replication lag
/aws-lambda-debug — Lambda diagnostics: errors, throttles, DLQ, VPC/ENI, CloudWatch logs
/aws-ecs-service-debug — ECS/Fargate service rollout failures: task events, target group health, IAM roles

IaC

/terraform-state-debug — diagnose locks, drift, orphans
/iac-secrets-scan — repo-wide hardcoded-secret sweep

Containers & CI/CD

/image-cve-triage — prioritise CVE scanner output by exploitability + fix availability
/github-actions-review — security review of GitHub Actions workflow files

Observability & incident

/prometheus-query-helper — intent → PromQL with rationale
/log-pattern-extract — cluster repeated errors out of a log dump
/postmortem — blameless post-mortem from a transcript
/runbook-from-incident — turn a resolved incident into a reusable runbook

Networking / database

/dns-debug — multi-resolver dig, propagation, DNSSEC
/tls-cert-audit — chain inspection, expiry, weak ciphers across a list of hosts
/postgres-health — bloat, long queries, replication lag, missing indexes
/redis-health — memory pressure, slow log, persistence config, eviction patterns
/db-migration-review — flag risky migration patterns

Security & repo hygiene

/cve-impact-assessment — given a CVE, check whether your stack is affected
/repo-health — README, license, CI, branch protection, stale branches
/dependency-upgrade-plan — group outdated deps by risk and suggest batching

Contributing

See `CONTRIBUTING.md`. The short version:

Add the canonical workflow to `workflows/<domain>/<name>.md`.
Update the Available workflows table in this README.
Keep workflows read-only by default. Anything mutating must be opt-in (e.g. a `DEEP=yes` flag) and clearly flagged.
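
The read-only-by-default rule can be sketched as a small gate. This is one possible reading of the rule, not how the workflows in this repo implement it. The `Step` type, its `mutating` marker, and both function names are hypothetical. The only detail taken from the text above is the `DEEP=yes` opt-in.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Step:
    command: str
    mutating: bool = False  # hypothetical marker: steps are read-only unless flagged


def deep_enabled(env: Mapping[str, str]) -> bool:
    """Mutating steps are allowed only when the caller sets DEEP=yes explicitly."""
    return env.get("DEEP", "").strip().lower() == "yes"


def runnable_steps(steps: list[Step], env: Mapping[str, str]) -> list[Step]:
    """Always keep read-only steps; keep mutating steps only on opt-in."""
    allow_mutating = deep_enabled(env)
    return [step for step in steps if allow_mutating or not step.mutating]


steps = [
    Step("kubectl get pods -A"),
    Step("kubectl rollout restart deployment/web", mutating=True),
]
# With DEEP unset in the environment, only the read-only step is selected.
selected = runnable_steps(steps, os.environ)
```

Any value other than `yes`, including an unset variable, leaves the mutating step out, so the safe behavior is the default and the risky one needs a deliberate action.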

License

MIT — use freely, attribution appreciated but not required.
