A Claude Code skill that performs automated EKS operational excellence assessments. It connects to a live EKS cluster, checks 37 items across 10 operational areas, and produces a rated report with prioritized recommendations.
Checks are informed by the EKS Best Practices Guide and EKS User Guide. All operations are read-only — the skill does not modify your cluster.
Disclaimer: This is sample code provided for educational and demonstration purposes only. It is not production-ready and should be reviewed, tested, and validated against your organization's security and operational requirements before use. The IAM permissions, MCP server configuration, and assessment logic should be adapted for your environment.
- Getting Started
- What Gets Assessed
- Output
- MCP Server Setup
- Required Permissions
- Limitations
- Troubleshooting
- Project Structure
- Contributing
- Security
- License
- Claude Code installed
- Python 3.10+ and uv
- AWS credentials configured: aws sts get-caller-identity should succeed
git clone https://github.com/aws-samples/sample-eks-operation-review-skill.git
cd sample-eks-operation-review-skill
claude
On first launch, Claude Code will prompt you to enable two MCP servers from .mcp.json. Enable both; they are required for the skill to work:
- awslabs.eks-mcp-server: connects to your EKS cluster
- awslabs.aws-documentation-mcp-server: looks up AWS documentation during assessment
Then run:
/eks-operation-review
The skill discovers your EKS clusters, asks you to pick one, and walks you through the assessment.
| # | Area | Examples |
|---|---|---|
| 01 | Cluster Lifecycle | Version currency, upgrade readiness, deprecated APIs |
| 02 | Infrastructure as Code | IaC provenance, GitOps tools, drift detection |
| 03 | Access & Identity | IRSA / Pod Identity, RBAC, API server endpoint, Pod Security Admission |
| 04 | Observability | Control plane logging, metrics, log aggregation, alerting |
| 05 | Workload Configuration | Resource requests, health probes, PDBs, image tags |
| 06 | Networking | IP capacity, CoreDNS, network policies |
| 07 | Autoscaling | Karpenter / CA, HPA, topology spread |
| 08 | Deployment Practices | Rollout strategy, CI/CD, graceful shutdown |
| 09 | Operational Processes | Backup / DR, runbooks, on-call |
| 10 | Add-on Management | Managed add-ons, node health monitoring, cluster insights |
~70–75% of items (roughly 26 to 28 of the 37) are fully automatable. Items that require human knowledge (runbooks, on-call processes) are marked UNKNOWN with suggestions for what to investigate.
Reports are generated in the workspace root:
| Format | Filename |
|---|---|
| Markdown | EKS-Operation-Review-<cluster>-<date>.md |
| HTML (optional) | EKS-Operation-Review-<cluster>-<date>.html |
Each report includes an executive summary, maturity score, per-section findings table, prioritized actions (Critical / Important / Quick Wins), and AWS documentation references.
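To locate reports after a run, a small helper can scan the workspace root for the filename prefix shown in the table above. This is a minimal sketch, not part of the skill: `find_reports` is a hypothetical name, and it matches on the `EKS-Operation-Review-` prefix only, without assuming how the `<date>` part is formatted.

```python
from pathlib import Path

REPORT_PREFIX = "EKS-Operation-Review-"
REPORT_SUFFIXES = (".md", ".html")


def find_reports(workspace: Path) -> list[Path]:
    """Return report files in `workspace`, newest first by modification time."""
    reports = [
        path
        for path in workspace.iterdir()
        if path.is_file()
        and path.name.startswith(REPORT_PREFIX)
        and path.suffix in REPORT_SUFFIXES
    ]
    return sorted(reports, key=lambda path: path.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Run from the workspace root to list generated reports.
    for report in find_reports(Path.cwd()):
        print(report.name)
```

Sorting by modification time rather than by name keeps the helper independent of the date format used in the filename.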
This skill uses two MCP servers, both pre-configured in .mcp.json. No setup is needed for the default configuration — just clone and run.
MCP server versions are pinned (awslabs.eks-mcp-server@0.1.28, awslabs.aws-documentation-mcp-server@1.1.21) to keep behaviour reproducible and avoid pulling unreviewed upstream updates. To upgrade, bump the version strings in .mcp.json after reviewing the upstream changelog at awslabs/mcp.
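If you edit .mcp.json by hand, a quick check can confirm that every server is still pinned. The sketch below assumes the file uses a top-level `mcpServers` object and that each `uvx` server lists its package spec as the first argument. `unpinned_servers` is a hypothetical helper, not something shipped with this repository.

```python
import json
import re
from pathlib import Path

# Accepts specs such as "awslabs.eks-mcp-server@0.1.28"; rejects "pkg" and "pkg@latest".
PINNED_SPEC = re.compile(r"^[A-Za-z0-9._-]+@\d+(\.\d+)*$")


def unpinned_servers(config: dict) -> dict[str, str]:
    """Map server name -> package spec for uvx servers without an exact version."""
    unpinned = {}
    for name, server in config.get("mcpServers", {}).items():
        args = server.get("args", [])
        # Assumption: the first uvx argument is the package spec.
        if server.get("command") == "uvx" and args and not PINNED_SPEC.match(args[0]):
            unpinned[name] = args[0]
    return unpinned


if __name__ == "__main__":
    config_path = Path(".mcp.json")
    if config_path.exists():
        config = json.loads(config_path.read_text())
        for name, spec in unpinned_servers(config).items():
            print(f"{name}: {spec} is not pinned to an exact version")
```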
Switching to the AWS-Managed EKS MCP Server
The default uses the open-source EKS MCP server. If your team needs CloudTrail audit logging, automatic updates, or the built-in troubleshooting knowledge base, you can switch to the AWS-managed EKS MCP server instead.
- Attach the AmazonEKSMCPReadOnlyAccess managed policy to your IAM user/role.
- Replace the awslabs.eks-mcp-server block in .mcp.json (replace {region} with your AWS region):
"awslabs.eks-mcp-server": {
"command": "uvx",
"args": [
"mcp-proxy-for-aws@latest",
"https://eks-mcp.{region}.api.aws/mcp",
"--service", "eks-mcp",
"--profile", "default",
"--region", "{region}",
"--read-only"
]
}
Important: The server name ("awslabs.eks-mcp-server") must stay exactly as shown. Claude Code uses this name to route tool calls, so changing it will prevent the skill from working.
See the Getting Started guide for full setup instructions.
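The block replacement can also be scripted. The following sketch builds the same server entry shown above for a given region and leaves every other server untouched. It assumes .mcp.json keeps its servers under a top-level `mcpServers` key. `use_managed_eks_server` is a hypothetical helper name.

```python
import copy

# The server name must stay unchanged so Claude Code can route tool calls.
SERVER_NAME = "awslabs.eks-mcp-server"


def use_managed_eks_server(config: dict, region: str, profile: str = "default") -> dict:
    """Return a copy of `config` whose EKS server entry points at the AWS-managed endpoint."""
    updated = copy.deepcopy(config)
    updated.setdefault("mcpServers", {})[SERVER_NAME] = {
        "command": "uvx",
        "args": [
            "mcp-proxy-for-aws@latest",
            f"https://eks-mcp.{region}.api.aws/mcp",
            "--service", "eks-mcp",
            "--profile", profile,
            "--region", region,
            "--read-only",
        ],
    }
    return updated
```

To apply it, load .mcp.json with `json.loads`, pass the result through the function, and write it back with `json.dumps(updated, indent=2)`. Returning a copy rather than mutating the input makes it easy to diff the two versions before saving.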
Using a specific AWS profile or region
Update the env block for the EKS MCP server in .mcp.json:
"env": {
"AWS_PROFILE": "your-profile",
"AWS_REGION": "your-region",
"FASTMCP_LOG_LEVEL": "ERROR"
}
Already have MCP servers configured globally?
Claude Code merges MCP config from global (~/.claude/settings.json) and project (.mcp.json) levels. If you already have an EKS MCP server configured globally:
- Same server name (awslabs.eks-mcp-server in both): the project config takes precedence. No action needed.
- Different server name (e.g., eks-mcp globally): both servers will run. Disable the duplicate to avoid conflicts.
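The two cases can be pictured as a name-keyed dictionary merge. This is a simplified model of the behaviour described above, not Claude Code's actual implementation. `effective_servers` and the `source` field are illustrative only.

```python
def effective_servers(global_servers: dict, project_servers: dict) -> dict:
    """Merge by server name; project entries replace global ones with the same name."""
    return {**global_servers, **project_servers}


project = {"awslabs.eks-mcp-server": {"source": "project"}}

# Case 1: same name in both configs, so only the project entry survives.
same_name = effective_servers({"awslabs.eks-mcp-server": {"source": "global"}}, project)

# Case 2: different names, so both entries remain and both servers would start.
different_name = effective_servers({"eks-mcp": {"source": "global"}}, project)

print(sorted(same_name))       # ['awslabs.eks-mcp-server']
print(sorted(different_name))  # ['awslabs.eks-mcp-server', 'eks-mcp']
```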
Replace <region> and <account-id> with your values. The second statement uses "*" because actions such as eks:ListClusters and the ec2:Describe* calls do not support resource-level permissions; see the AWS service authorization reference for the resource types each action accepts.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "EKSReadScoped",
"Effect": "Allow",
"Action": [
"eks:DescribeCluster",
"eks:ListNodegroups",
"eks:DescribeNodegroup",
"eks:ListAddons",
"eks:DescribeAddon",
"eks:ListInsights",
"eks:DescribeInsight",
"eks:ListAccessEntries",
"eks:DescribeAccessEntry",
"eks:ListPodIdentityAssociations"
],
"Resource": [
"arn:aws:eks:<region>:<account-id>:cluster/*",
"arn:aws:eks:<region>:<account-id>:nodegroup/*",
"arn:aws:eks:<region>:<account-id>:addon/*",
"arn:aws:eks:<region>:<account-id>:access-entry/*"
]
},
{
"Sid": "AccountLevelReads",
"Effect": "Allow",
"Action": [
"eks:ListClusters",
"eks:DescribeAddonVersions",
"ec2:DescribeSubnets",
"ec2:DescribeVpcs",
"iam:ListAttachedRolePolicies",
"iam:ListRolePolicies",
"iam:GetPolicy",
"iam:GetPolicyVersion",
"logs:DescribeLogGroups",
"cloudwatch:DescribeAlarms"
],
"Resource": "*"
}
]
}
Tip: If using the AWS-managed EKS MCP server, attach the AmazonEKSMCPReadOnlyAccess managed policy instead.
Your IAM identity needs read access to Kubernetes resources (Nodes, Pods, Deployments, Services, etc.) via an EKS access entry or aws-auth ConfigMap.
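Filling the placeholders by hand is error-prone, so a short script can do the substitution and confirm the result is still valid JSON. This is a sketch only: `render_policy` is a hypothetical helper, and `TEMPLATE` is a shortened stand-in for the full policy above.

```python
import json
import re


def render_policy(template: str, region: str, account_id: str) -> dict:
    """Fill the <region> and <account-id> placeholders and parse the result as JSON."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    filled = template.replace("<region>", region).replace("<account-id>", account_id)
    return json.loads(filled)


# Shortened stand-in for the policy above; paste the full document in practice.
TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EKSReadScoped",
      "Effect": "Allow",
      "Action": ["eks:DescribeCluster"],
      "Resource": "arn:aws:eks:<region>:<account-id>:cluster/*"
    }
  ]
}
"""

policy = render_policy(TEMPLATE, "us-west-2", "123456789012")
print(policy["Statement"][0]["Resource"])  # arn:aws:eks:us-west-2:123456789012:cluster/*
```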
- One cluster at a time — run the skill again for additional clusters.
- Process questions are UNKNOWN — items like runbooks, on-call rotation, and post-incident reviews cannot be detected from cluster state. These are marked UNKNOWN with investigation guidance.
- Point-in-time snapshot — reflects cluster state at the time of the run; does not monitor ongoing changes.
- Requires cluster access — your IAM identity must have both AWS API permissions and Kubernetes RBAC access.
MCP server not responding
- Check Python and uv are installed: uv --version
- Check AWS credentials: aws sts get-caller-identity
- Test the MCP server directly: uvx awslabs.eks-mcp-server@0.1.28
- Verify AWS_PROFILE and AWS_REGION in .mcp.json match your environment
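The first two checks can be run together from a script. The sketch below wraps the same commands listed above and reports which ones succeed. `run_checks` and the check labels are hypothetical names. The `uvx` server test is left out because starting the server directly may not exit on its own.

```python
import subprocess

# Commands taken from the checklist above; labels are illustrative.
CHECKS = {
    "uv is installed": ["uv", "--version"],
    "AWS credentials resolve": ["aws", "sts", "get-caller-identity"],
}


def run_checks(checks=CHECKS, run=subprocess.run) -> dict[str, bool]:
    """Run each command and record True when it exits with status 0."""
    results = {}
    for label, command in checks.items():
        try:
            completed = run(command, capture_output=True, text=True, timeout=20)
            results[label] = completed.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # A missing executable or a hung command both count as a failed check.
            results[label] = False
    return results


if __name__ == "__main__":
    for label, ok in run_checks().items():
        print(f"{'OK  ' if ok else 'FAIL'} {label}")
```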
No clusters found
The skill lists clusters in the region configured in your AWS credentials. To target a different region, set AWS_REGION in .mcp.json or your environment.
Permission denied errors
Ensure your IAM identity has the permissions listed in Required Permissions and has a Kubernetes RBAC binding via EKS access entry or aws-auth ConfigMap.
.claude/commands/eks-operation-review.md # Skill entry point
CLAUDE.md # Instructions for Claude Code
.mcp.json # MCP server configuration
steering/ # Per-section check instructions
  cluster-lifecycle.md
  infrastructure-as-code.md
  access-identity.md
  observability.md
  workload-configuration.md
  networking.md
  autoscaling.md
  deployment-practices.md
  operational-processes.md
  addon-management.md
  report-generation.md
tools/report_to_html.py # Markdown → HTML converter
Contributions are welcome. Please open an issue first to discuss what you'd like to change.
This skill is read-only and does not create, modify, or delete any AWS or Kubernetes resources. All operations are describe, list, and get calls.
If you discover a security vulnerability, please see SECURITY.md for responsible disclosure instructions. Do not open a public issue for security vulnerabilities.
This project is licensed under the MIT-0 License.

