23seriy/devops-ai-workflows


devops-ai-workflows

A growing collection of AI-agent workflows, prompts, and rules for day-to-day DevOps / SRE / platform work.

Note: "workflows" here means AI coding-agent workflows (Windsurf, Cursor, Claude Code, etc.) — not GitHub Actions.

What's inside

| Folder | Purpose | Audience |
|---|---|---|
| workflows/ | Workflow definitions, grouped by domain | Everyone |
| prompts/ | Reusable system / task prompts (incident triage, code review, post-mortem, etc.) | Any LLM |
| rules/ | Editor / agent rule files (.windsurfrules, .cursorrules, Copilot instructions) | Per-tool |
| scripts/ | Standalone shell scripts referenced by workflows | Anyone with a shell |

Available workflows

Kubernetes

| Workflow | Slash command | Description | Prerequisites |
|---|---|---|---|
| k8s-debug | /k8s-debug | General-purpose, read-only cluster diagnostics across nodes, pods, workloads, networking, storage, RBAC, events, and resource pressure. | kubectl. Optional: jq, metrics-server. |
| k8s-workload-debug | /k8s-workload-debug | Deep-dive on a single Deployment / StatefulSet / DaemonSet / Job / Pod: rollout, spec, probes, resources, logs, networking, storage, config. | kubectl. Optional: jq, metrics-server. |
| k8s-rbac-audit | /k8s-rbac-audit | RBAC risk audit: wildcards, cluster-admin bindings, risky verb/resource combos, over-privileged ServiceAccounts, anonymous access. | kubectl, jq. Optional: kubectl-who-can. |
| k8s-cost-hotspots | /k8s-cost-hotspots | Find waste: over-provisioned workloads, missing requests/limits, idle workloads, orphan PVCs/PVs, idle LoadBalancers. | kubectl, jq, metrics-server. |
| k8s-upgrade-readiness | /k8s-upgrade-readiness | Pre-flight before a control-plane / node upgrade: deprecated APIs, version skew, PDB gaps, expiring certs, broken webhooks. | kubectl. Optional: kubent or pluto, helm. |
| helm-release-debug | /helm-release-debug | Diagnose a stuck or failed Helm release: history, values diff, hook failures, rendered manifest vs cluster, workload health. | helm v3, kubectl. Optional: jq, yq. |
| helm-chart-review | /helm-chart-review | Review a Helm chart for security, reliability, and best practices: resource specs, probes, security context, PDBs, anti-affinity, RBAC. | Helm chart source. Optional: helm CLI. |
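
The Kubernetes workflows above are read-only by design. If you run their steps by hand, a small shell wrapper can act as a safety net. The sketch below is hypothetical (kube_ro is not part of this repo): it forwards only read-only kubectl verbs, expects the verb as the first argument, and reads the binary name from a KUBECTL variable (an assumption, handy for dry runs with KUBECTL=echo).

```shell
#!/usr/bin/env bash
# kube_ro: hypothetical wrapper that only forwards read-only kubectl verbs.
# The verb must be the first argument. KUBECTL may be overridden (e.g. KUBECTL=echo).
kube_ro() {
  local verb="${1:-}"
  case "$verb" in
    get|describe|logs|top|events|explain|api-resources|api-versions|version|cluster-info)
      "${KUBECTL:-kubectl}" "$@"
      ;;
    auth)
      # Only "kubectl auth can-i" is read-only; "auth reconcile" mutates RBAC.
      if [ "${2:-}" = "can-i" ]; then
        "${KUBECTL:-kubectl}" "$@"
      else
        echo "kube_ro: refusing 'auth ${2:-}'" >&2
        return 1
      fi
      ;;
    *)
      # Allowlist, not denylist: unknown or mutating verbs fail closed.
      echo "kube_ro: refusing non-read-only verb '${verb}'" >&2
      return 1
      ;;
  esac
}
```

For example, kube_ro get pods -A runs normally, while kube_ro delete pod web is refused. An allowlist is used rather than a denylist so that new or unfamiliar verbs are blocked by default.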

AWS / Cloud

| Workflow | Slash command | Description | Prerequisites |
|---|---|---|---|
| aws-account-audit | /aws-account-audit | Read-only AWS account security & hygiene audit: IAM, S3, EC2, RDS, CloudTrail, encryption, GuardDuty, SecurityHub. | aws CLI. Optional: jq. |
| aws-cost-quickscan | /aws-cost-quickscan | Find AWS cost waste: idle EC2/RDS, unattached EBS, old snapshots, expensive log groups, NAT data processing, missing Savings Plans. | aws CLI, Cost Explorer enabled. Optional: jq. |
| aws-vpc-debug | /aws-vpc-debug | Diagnose VPC connectivity: trace path across SGs, NACLs, route tables, NAT/IGW/TGW, VPC endpoints, DNS, and flow logs. | aws CLI. Optional: jq, dig. |
| aws-iam-policy-review | /aws-iam-policy-review | Explain an IAM policy and flag risks: admin-equivalent access, privilege escalation paths, wildcard actions, missing conditions. | aws CLI. Optional: jq. |
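
As an illustration of the kind of read-only query a cost scan like aws-cost-quickscan relies on, here is a minimal sketch that lists unattached EBS volumes (volumes in the available state). The function name and the AWS_CLI override are hypothetical; the describe-volumes status filter and the --query / --output options are standard aws CLI usage. Extra arguments such as --region or --profile are passed through.

```shell
#!/usr/bin/env bash
# list_unattached_ebs: hypothetical helper; read-only.
# Volumes in the "available" state are not attached to any instance but are still billed.
# AWS_CLI may be overridden (e.g. AWS_CLI=echo) to print the command instead of running it.
list_unattached_ebs() {
  "${AWS_CLI:-aws}" ec2 describe-volumes \
    --filters Name=status,Values=available \
    --query 'Volumes[].[VolumeId,Size,CreateTime]' \
    --output text "$@"
}
```

Usage: list_unattached_ebs --region eu-west-1 prints one line per volume with its ID, size in GiB and creation time.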

IaC

| Workflow | Slash command | Description | Prerequisites |
|---|---|---|---|
| terraform-plan-review | /terraform-plan-review | Explain a Terraform plan and flag risky changes: destroys, replacements, security group mutations, IAM changes, blast radius. | terraform plan output. Optional: terraform CLI, jq. |
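
When only the human-readable plan is available, a rough first pass can be made with awk over terraform plan -no-color output, which labels each resource with header lines such as "# aws_instance.web will be destroyed" or "# aws_instance.web must be replaced". This is a hypothetical helper, not how the workflow itself is implemented; for anything beyond a quick look, terraform show -json on a saved plan file is the more reliable input.

```shell
#!/usr/bin/env bash
# summarize_plan: hypothetical helper. Reads `terraform plan -no-color` text on stdin,
# prints every destroy and replacement, then a one-line count of all actions.
summarize_plan() {
  awk '
    /^[ \t]*# .* will be destroyed$/          { print "DESTROY  " $2; d++; next }
    /^[ \t]*# .* must be replaced$/           { print "REPLACE  " $2; r++; next }
    /^[ \t]*# .* will be created$/            { c++; next }
    /^[ \t]*# .* will be updated in-place$/   { u++; next }
    END { printf "summary: %d destroy, %d replace, %d create, %d update\n", d, r, c, u }
  '
}
```

Usage: terraform plan -no-color | summarize_plan. Resource addresses containing spaces (unusual for_each keys) would be truncated, another reason to prefer the JSON plan for real reviews.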

Containers & CI/CD

| Workflow | Slash command | Description | Prerequisites |
|---|---|---|---|
| ci-debug | /ci-debug | Diagnose a failing CI/CD pipeline: parse build logs from Jenkins, GitHub Actions, GitLab CI, or Bitbucket Pipelines. Root cause analysis and fix suggestions. | Build log output. Optional: repo source, CI config file. |
| jenkins-pipeline-review | /jenkins-pipeline-review | Review Jenkinsfile / shared-library Groovy for security risks, anti-patterns, missing error handling, credential leaks, CPS issues, and build config cross-references. | Jenkinsfile(s) or vars/*.groovy. Optional: repositories_v2.json. |
| release-checklist | /release-checklist | Pre-release safety gate: scope, deploy order, rollback, tests, monitoring, and communication before production release. | PR/diff summary. Optional: test results, plans, diffs. |
| dockerfile-review | /dockerfile-review | Review Dockerfiles for security, size, caching, and best practices. Flags CVE-prone bases, leaked secrets, missing health checks. | Dockerfile(s). Optional: docker, trivy. |
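
A few of the dockerfile-review checks can be approximated with plain grep before reaching for heavier tooling. The sketch below is hypothetical and deliberately shallow: it flags :latest base images, a missing USER instruction, a missing HEALTHCHECK, and secret-looking ENV/ARG names, and returns the number of findings as its exit status. It does not replace a real linter such as hadolint or an image scanner such as trivy.

```shell
#!/usr/bin/env bash
# dockerfile_quickcheck: hypothetical grep-based pre-check.
# Prints one WARN line per finding; exit status = number of findings (0 = clean).
dockerfile_quickcheck() {
  local f="${1:-Dockerfile}" issues=0
  if grep -Eiq '^[[:space:]]*FROM[[:space:]].*:latest([[:space:]]|$)' "$f"; then
    echo "WARN: base image pinned to :latest"; issues=$((issues + 1))
  fi
  # Only checks that a USER line exists, not which user it sets.
  if ! grep -Eiq '^[[:space:]]*USER[[:space:]]+' "$f"; then
    echo "WARN: no USER instruction (container runs as root by default)"; issues=$((issues + 1))
  fi
  if ! grep -Eiq '^[[:space:]]*HEALTHCHECK[[:space:]]' "$f"; then
    echo "WARN: no HEALTHCHECK instruction"; issues=$((issues + 1))
  fi
  # Values baked in via ENV/ARG end up in image layers or history.
  if grep -Eiq '^[[:space:]]*(ENV|ARG)[[:space:]]+[A-Z0-9_]*(PASSWORD|SECRET|TOKEN|API_KEY)' "$f"; then
    echo "WARN: secret-looking ENV/ARG name"; issues=$((issues + 1))
  fi
  return "$issues"
}
```

Usage: dockerfile_quickcheck path/to/Dockerfile. A non-zero exit status makes it easy to wire into a pre-commit hook.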

Security

| Workflow | Slash command | Description | Prerequisites |
|---|---|---|---|
| secrets-leak-scan | /secrets-leak-scan | Scan git repo history for leaked secrets: API keys, passwords, tokens, private keys. Uses gitleaks, trufflehog, or regex fallback. | Git repo. Optional: gitleaks, trufflehog. |
| repo-health | /repo-health | Audit repository hygiene: README, license, CI, branch/release hygiene, tracked secrets, ownership, and automation gaps. | Local git repo. Optional: gh, jq. |
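
secrets-leak-scan falls back to regex matching when neither gitleaks nor trufflehog is installed. The exact patterns the workflow uses are not listed here, so the following is a hypothetical minimal fallback built from a few well-known token formats: AWS access key IDs (AKIA prefix), PEM private-key headers, GitHub tokens (ghp_, gho_, ghu_, ghs_, ghr_) and Slack tokens (xox prefix). It reads text on stdin, so it can scan full history when fed git log -p --all.

```shell
#!/usr/bin/env bash
# scan_secrets_fallback: hypothetical regex-only scanner. Reads stdin and prints
# matching lines prefixed with their line number. grep exits 1 when nothing matches.
scan_secrets_fallback() {
  grep -nE \
    -e 'AKIA[0-9A-Z]{16}' \
    -e '-----BEGIN ([A-Z]+ )?PRIVATE KEY-----' \
    -e 'gh[pousr]_[A-Za-z0-9]{36}' \
    -e 'xox[baprs]-[A-Za-z0-9-]{10,}'
}
```

Usage: git log -p --all | scan_secrets_fallback. Feeding it AKIAIOSFODNN7EXAMPLE (the example key ID from the AWS documentation) prints a match. Regex-only scanning misses secrets that lack a recognizable prefix, which is why the dedicated tools are preferred when available.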

Observability & Incident

| Workflow | Slash command | Description | Prerequisites |
|---|---|---|---|
| incident-triage | /incident-triage | Guided first 15 minutes of a production incident: timeline, blast radius, evidence gathering, mitigation suggestions. | Access to affected environment. |
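
An accurate incident timeline is much easier to keep if every observation is timestamped the moment it is made. A minimal sketch, assuming a Markdown timeline file named by a TIMELINE variable (the helper name note and the default file name are hypothetical): each call appends one UTC-timestamped bullet.

```shell
#!/usr/bin/env bash
# note: hypothetical helper that appends "- <UTC ISO-8601 time> <message>" to $TIMELINE.
TIMELINE="${TIMELINE:-incident-$(date -u +%Y%m%d).md}"
note() {
  printf -- '- %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$TIMELINE"
}
```

Usage: note "5xx rate above 2% on checkout" followed later by note "rolled back deploy abc123". Using UTC avoids time-zone confusion when responders are spread across regions, and the file doubles as input for a post-mortem.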

More on the way — see Roadmap.

Prompts

Reusable system prompts you can paste into any AI agent for common DevOps tasks:

| Prompt | What it does |
|---|---|
| incident-commander | Puts the AI in incident-commander mode: timeline, blast radius, action tracking, status updates. |
| postmortem-writer | Generates a blameless post-mortem from incident notes: timeline, root cause, impact, action items. |
| code-review-devops | Reviews IaC / pipeline / Docker / K8s code with a security-first DevOps lens. |
| pr-description | Generates a PR description from a diff: what, why, how, testing, risk, rollback plan. |
| explain-like-a-senior | Explains infrastructure code to junior engineers: what it does, why, gotchas, and how it fits together. |
| runbook-from-incident | Converts incident notes or post-mortems into reusable runbooks with diagnosis, mitigation, escalation, and follow-up steps. |

Rules

Persistent instruction files that shape AI behavior. Copy them into a project's .windsurf/rules/ directory, or use one as the project's .windsurfrules file:

| Rule file | What it does |
|---|---|
| devops-agent.windsurfrules | Safety guardrails for AI in DevOps repos: never modify prod without confirmation, prefer read-only, never hardcode secrets, always check context, GitOps awareness, multi-repo coordination. |
| terraform.windsurfrules | Terraform-specific: state safety, ForceNew attribute warnings, provider/module pinning, workspace safety, import workflow, prevent_destroy reminders. |
| kubernetes.windsurfrules | Kubernetes-specific: context verification, dry-run first, Helm safety, ArgoCD/GitOps awareness, secret handling, debugging approach, RBAC best practices. |
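
The context-verification guardrail in kubernetes.windsurfrules can also be enforced in a plain shell session. This hypothetical require_context function aborts unless kubectl config current-context matches the context you name, and additionally refuses any context whose name contains "prod" unless ALLOW_PROD=yes is set. The substring match is a naming-convention assumption; adjust it to however your contexts are named. KUBECTL can be overridden for testing.

```shell
#!/usr/bin/env bash
# require_context: hypothetical guard to run before any cluster command.
# Fails if the active kubectl context differs from the expected one, or if it
# looks like production (name contains "prod") and ALLOW_PROD is not "yes".
require_context() {
  local expected="$1" current
  current="$("${KUBECTL:-kubectl}" config current-context)" || return 1
  if [ "$current" != "$expected" ]; then
    echo "context is '$current', expected '$expected'" >&2
    return 1
  fi
  case "$current" in
    *prod*)
      if [ "${ALLOW_PROD:-no}" != "yes" ]; then
        echo "refusing prod context '$current' without ALLOW_PROD=yes" >&2
        return 1
      fi
      ;;
  esac
}
```

Usage: require_context staging-eu && kubectl get pods -A. Requiring the expected name explicitly catches the common mistake of running against whichever context was last selected.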

Scripts

Standalone shell utilities referenced by workflows or useful on their own:

| Script | Usage | What it does |
|---|---|---|
| k8s-snapshot.sh | ./k8s-snapshot.sh [namespace\|all] [output-dir] | Dump cluster state (nodes, pods, events, services, top) to a timestamped Markdown file. |
| aws-whoami.sh | ./aws-whoami.sh [profile] | Quick AWS identity check: caller, region, account alias, org, SSO role. |
| stale-branches.sh | ./stale-branches.sh [days] [--remote] | List git branches older than N days with last commit info. |
| validate-repo.sh | ./scripts/validate-repo.sh | Validate workflow frontmatter, README links, script executability, and optional lint checks. |
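
validate-repo.sh checks workflow frontmatter, but its exact rules are not documented here. As a hypothetical illustration, assuming each workflow opens with a YAML frontmatter block delimited by --- lines and containing a description: key, a per-file check could look like this:

```shell
#!/usr/bin/env bash
# check_frontmatter: hypothetical per-file check (not the repo's validate-repo.sh).
# Passes only if line 1 is "---", a closing "---" follows, and a non-empty
# "description:" key appears between them.
check_frontmatter() {
  awk '
    NR == 1 { if ($0 != "---") { print FILENAME ": missing opening ---"; bad = 1; exit }; next }
    $0 == "---" { closed = 1; exit }
    /^description:[ \t]*[^ \t]/ { desc = 1 }
    END {
      if (bad) exit 1
      if (!closed) { print FILENAME ": frontmatter not closed"; exit 1 }
      if (!desc)   { print FILENAME ": no description"; exit 1 }
    }
  ' "$1"
}
```

Usage: for f in workflows/*/*.md; do check_frontmatter "$f" || fail=1; done, then exit non-zero if fail is set, so every broken file is reported rather than only the first.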

Using a workflow

In AI agents

Open the matching file in workflows/ and either:

  • invoke it as a slash command if your agent supports workflow discovery from this repo,
  • paste the relevant section into the agent's chat, or
  • include the file as context and ask the agent to follow it.
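
For agents that discover slash commands from a project directory, one option is to copy the workflow files in, flattening the domain folders so that each file name becomes the command name. The destinations are assumptions to verify against your agent's documentation: Windsurf reads workflows from .windsurf/workflows/ and Claude Code reads custom commands from .claude/commands/. install_workflows is a hypothetical helper, not a script shipped in this repo.

```shell
#!/usr/bin/env bash
# install_workflows: hypothetical helper. Copies <repo>/workflows/<domain>/<name>.md
# to <dest>/<name>.md so the agent can expose it as /<name>.
install_workflows() {
  local repo="$1" dest="$2" f n=0
  mkdir -p "$dest"
  for f in "$repo"/workflows/*/*.md; do
    [ -e "$f" ] || continue   # glob matched nothing
    cp "$f" "$dest/$(basename "$f")"
    n=$((n + 1))
  done
  echo "installed $n workflow(s) into $dest"
}
```

Usage, from your project root: install_workflows ~/src/devops-ai-workflows .windsurf/workflows. Copying (rather than symlinking) keeps the project working if the clone moves, at the cost of re-running the copy after updates.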

As a plain human workflow

Every workflow is just Markdown with shell commands. You can run the steps yourself in a terminal — no AI required.
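
To follow a workflow by hand, it can help to pull out just its shell snippets and read them before running anything. A minimal sketch, assuming the commands sit in fenced blocks tagged bash, sh or shell (extract_shell_steps is hypothetical). The awk string \140 is an octal escape for a backtick, used so the example does not contain a literal fence.

```shell
#!/usr/bin/env bash
# extract_shell_steps: hypothetical helper. Prints the contents of fenced code blocks
# tagged bash/sh/shell from a Markdown file, with a blank line after each block.
extract_shell_steps() {
  awk '
    BEGIN { fence = "\140\140\140" }   # three backticks
    index($0, fence) == 1 {
      if (inblock) { inblock = 0; print ""; next }
      lang = substr($0, 4)
      if (lang == "bash" || lang == "sh" || lang == "shell") inblock = 1
      next
    }
    inblock { print }
  ' "$1"
}
```

Usage: extract_shell_steps workflows/kubernetes/k8s-debug.md | less, then run the commands you have reviewed one at a time.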

Repo layout

devops-ai-workflows/
├── workflows/
│   ├── kubernetes/          # Kubernetes workflow definitions
│   ├── aws/                 # AWS / cloud workflow definitions
│   ├── iac/                 # Infrastructure as Code workflows
│   ├── cicd/                # CI/CD pipeline workflows
│   ├── containers/          # Container & image workflows
│   ├── security/            # Security & repo hygiene workflows
│   └── observability/       # Observability & incident workflows
├── prompts/                 # Reusable LLM prompts
├── rules/                   # Editor/agent rule files
├── scripts/                 # Standalone shell helpers
├── CONTRIBUTING.md
├── LICENSE
└── README.md

Roadmap

Ideas I plan to add (PRs welcome):

AWS / cloud

  • /aws-eks-debug — bridge EKS + Kubernetes: node groups, OIDC, add-ons, IAM roles for service accounts
  • /aws-rds-health — RDS/Aurora diagnostics: events, metrics, parameter groups, replication lag
  • /aws-lambda-debug — Lambda diagnostics: errors, throttles, DLQ, VPC/ENI, CloudWatch logs
  • /aws-ecs-service-debug — ECS/Fargate service rollout failures: task events, target group health, IAM roles

IaC

  • /terraform-state-debug — diagnose locks, drift, orphans
  • /iac-secrets-scan — repo-wide hardcoded-secret sweep

Containers & CI/CD

  • /image-cve-triage — prioritise CVE scanner output by exploitability + fix availability
  • /github-actions-review — security review of GitHub Actions workflow files

Observability & incident

  • /prometheus-query-helper — intent → PromQL with rationale
  • /log-pattern-extract — cluster repeated errors out of a log dump
  • /postmortem — blameless post-mortem from a transcript
  • /runbook-from-incident — turn a resolved incident into a reusable runbook

Networking / database

  • /dns-debug — multi-resolver dig, propagation, DNSSEC
  • /tls-cert-audit — chain inspection, expiry, weak ciphers across a list of hosts
  • /postgres-health — bloat, long queries, replication lag, missing indexes
  • /redis-health — memory pressure, slow log, persistence config, eviction patterns
  • /db-migration-review — flag risky migration patterns

Security & repo hygiene

  • /cve-impact-assessment — given a CVE, check whether your stack is affected
  • /repo-health — README, license, CI, branch protection, stale branches
  • /dependency-upgrade-plan — group outdated deps by risk and suggest batching

Contributing

See CONTRIBUTING.md. The short version:

  1. Add the canonical workflow to workflows/<domain>/<name>.md.
  2. Update the Available workflows table in this README.
  3. Keep workflows read-only by default. Anything mutating must be opt-in (e.g. a DEEP=yes flag) and clearly flagged.
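
A minimal sketch of the opt-in convention from point 3, assuming a workflow step that would restart a Deployment: by default it only prints what it would do, and it mutates only when DEEP=yes is set explicitly. restart_if_deep and the KUBECTL override are hypothetical.

```shell
#!/usr/bin/env bash
# restart_if_deep: hypothetical example of a mutating step gated behind DEEP=yes.
restart_if_deep() {
  local deploy="$1" ns="$2"
  if [ "${DEEP:-no}" != "yes" ]; then
    # Default path: read-only, just describe the action.
    echo "SKIP (read-only): would run: kubectl rollout restart deployment/$deploy -n $ns"
    return 0
  fi
  # Opt-in path: announce loudly on stderr, then mutate.
  echo "MUTATING: restarting deployment/$deploy in $ns" >&2
  "${KUBECTL:-kubectl}" rollout restart "deployment/$deploy" -n "$ns"
}
```

Running restart_if_deep web shop only prints the planned command; DEEP=yes restart_if_deep web shop actually restarts the Deployment.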

License

MIT — use freely, attribution appreciated but not required.

About

Curated collection of AI-agent workflows, prompts & rules for DevOps/SRE — Kubernetes debugging, AWS audits, Terraform plan reviews, CI/CD triage, Dockerfile reviews, secrets scanning & incident response. Works with Windsurf, Cursor, Claude Code or any LLM.
