19 changes: 18 additions & 1 deletion README.md
@@ -29,7 +29,24 @@ repo to install.
- [**Cloud Run Basics**](./skills/cloud/cloud-run-basics)
- [**Cloud SQL Basics**](./skills/cloud/cloud-sql-basics)
- [**Firebase Basics**](./skills/cloud/firebase-basics)
- **Kubernetes Engine (GKE)**:
  - [Basics](./skills/cloud/gke-basics)
  - [App Onboarding](./skills/cloud/gke-app-onboarding)
  - [Backup & DR](./skills/cloud/gke-backup-dr)
  - [Batch & HPC](./skills/cloud/gke-batch-hpc)
  - [Cluster Creation](./skills/cloud/gke-cluster-creation)
  - [Compute Classes](./skills/cloud/gke-compute-classes)
  - [Cost Optimization](./skills/cloud/gke-cost)
  - [Golden Path](./skills/cloud/gke-golden-path)
  - [AI/ML Inference](./skills/cloud/gke-inference)
  - [Multi-tenancy](./skills/cloud/gke-multitenancy)
  - [Networking](./skills/cloud/gke-networking)
  - [Observability](./skills/cloud/gke-observability)
  - [Reliability](./skills/cloud/gke-reliability)
  - [Scaling](./skills/cloud/gke-scaling)
  - [Security](./skills/cloud/gke-security)
  - [Storage](./skills/cloud/gke-storage)
  - [Upgrades](./skills/cloud/gke-upgrades)
- [**Workload Manager Basics**](./skills/cloud/workload-manager-basics)
- [**Recipe: Onboarding to Google Cloud**](./skills/cloud/google-cloud-recipe-onboarding)
- [**Recipe: Authenticating to Google Cloud**](./skills/cloud/google-cloud-recipe-auth)
@@ -1,21 +1,34 @@
---
name: gke-app-onboarding
description: >-
  Manages GKE application onboarding, covering containerization, deployment
  manifests, and migration. Use when onboarding or deploying an application to
  GKE for the first time, or containerizing an app for GKE. Don't use for
  general GKE cluster administration or upgrades (use gke-basics or
  gke-upgrades instead).
---

# GKE App Onboarding

This reference provides workflows for containerizing and deploying applications
to GKE for the first time.

> **MCP Tools:** `apply_k8s_manifest`, `get_k8s_resource`,
> `get_k8s_rollout_status`, `get_k8s_logs`, `describe_k8s_resource`

## Workflow

### 1. App Assessment

Before containerizing, assess the application:

- **Language & Framework**: Identify the tech stack
- **Dependencies**: List required libraries and external services
- **Configuration**: How is the app configured? (env vars, config files,
  secrets)
- **Statefulness**: Does it need persistent storage? (databases, file storage)
- **Networking**: Port mapping and protocol (HTTP, gRPC, TCP)
- **Health endpoints**: Does the app expose health check endpoints?
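
A quick first pass at the "Language & Framework" question can be scripted by looking for common build files in the source tree. The sketch below is an illustration only: `detect_stack` is a hypothetical helper, and the marker-file-to-language mapping is an assumption that covers a few common stacks, not every project layout.

```bash
# Hypothetical helper: guess an app's language from common build/manifest
# files in a source directory. The mapping is an illustrative assumption,
# not an exhaustive list; the order of the checks decides ties.
detect_stack() {
  dir=${1:-.}
  if [ -f "$dir/go.mod" ]; then
    printf '%s\n' "go"
  elif [ -f "$dir/package.json" ]; then
    printf '%s\n' "nodejs"
  elif [ -f "$dir/requirements.txt" ] || [ -f "$dir/pyproject.toml" ]; then
    printf '%s\n' "python"
  elif [ -f "$dir/pom.xml" ] || [ -f "$dir/build.gradle" ]; then
    printf '%s\n' "java"
  else
    printf '%s\n' "unknown"
  fi
}
```

For example, `detect_stack ./my-app` (a placeholder path) prints `go` when `./my-app/go.mod` exists. The other assessment questions (configuration, state, ports, health endpoints) still need a manual review of the code.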

### 2. Containerization

@@ -38,14 +51,19 @@ ENTRYPOINT ["/server"]
```

**Best practices:**
- Use multi-stage builds to keep production images small
- Use distroless or minimal base images to reduce attack surface
- Run as non-root user
- Log to `stdout` and `stderr` for Cloud Logging collection
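
Two of the practices above, multi-stage builds and a non-root user, can be spot-checked with plain text matching. This is a rough heuristic sketch, and `lint_dockerfile` is a hypothetical helper: it only reads the Dockerfile's own instructions, so it cannot see the default user baked into a base image (for example, a distroless `:nonroot` tag) and may warn even when the image already runs as non-root.

```bash
# Hypothetical sketch: text-based spot check for two Dockerfile practices.
# It does not inspect base images, build args, or which stage a line is in.
lint_dockerfile() {
  f=$1
  status=0
  # A multi-stage build has more than one FROM instruction.
  from_count=$(grep -ci '^[[:space:]]*FROM[[:space:]]' "$f" || true)
  if [ "$from_count" -lt 2 ]; then
    printf '%s\n' "warn: single-stage build"
    status=1
  fi
  # Assume the last USER line in the file belongs to the final stage.
  last_user=$(grep -i '^[[:space:]]*USER[[:space:]]' "$f" | tail -n 1 | awk '{ print $2 }')
  case "$last_user" in
    "" | root | 0 | root:* | 0:*)
      printf '%s\n' "warn: no non-root USER instruction"
      status=1
      ;;
  esac
  return "$status"
}
```

Running `lint_dockerfile Dockerfile` prints nothing and returns 0 when both checks pass; each warning line names the practice that was not detected.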

If you prefer not to write a Dockerfile,
[**Cloud Native Buildpacks**](https://buildpacks.io/) can detect the
application's language and build a container image automatically:

```bash
pack build <image> --builder gcr.io/buildpacks/builder:latest
```

### 3. Image Management

@@ -60,7 +78,8 @@ docker build -t <REGION>-docker.pkg.dev/<PROJECT>/<REPO>/<IMAGE>:<TAG> .
docker push <REGION>-docker.pkg.dev/<PROJECT>/<REPO>/<IMAGE>:<TAG>
```
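
When scripting the build and push steps, it can help to assemble the image reference once so the `docker build` and `docker push` tags cannot drift apart. A minimal sketch, where `ar_image` is a hypothetical helper that only formats the `<REGION>-docker.pkg.dev/<PROJECT>/<REPO>/<IMAGE>:<TAG>` path and does not check that the repository exists:

```bash
# Hypothetical helper: format an Artifact Registry image reference as
# <REGION>-docker.pkg.dev/<PROJECT>/<REPO>/<IMAGE>:<TAG>.
# It only builds the string; it does not contact Artifact Registry.
ar_image() {
  if [ "$#" -ne 5 ]; then
    printf '%s\n' "usage: ar_image REGION PROJECT REPO IMAGE TAG" >&2
    return 2
  fi
  # Reject empty components so a missing variable cannot yield a bad tag.
  for part in "$@"; do
    if [ -z "$part" ]; then
      printf '%s\n' "ar_image: empty component" >&2
      return 2
    fi
  done
  printf '%s-docker.pkg.dev/%s/%s/%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

With placeholder values: `IMAGE=$(ar_image us-central1 my-project my-repo my-app v1)`, then `docker build -t "$IMAGE" .` and `docker push "$IMAGE"`.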

**Vulnerability scanning**: Enable automatic scanning in Artifact Registry to
detect issues in base images and dependencies.

```bash
# Check scan results
@@ -127,10 +146,12 @@ spec:
```

**Checklist for manifests:**

- Resource requests and limits set
- Liveness and readiness probes configured
- At least 2 replicas for production
- Appropriate Service type (ClusterIP for internal traffic; the Gateway API for
  external traffic)
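
Before applying a manifest, most of the checklist above can be approximated with a text search. This sketch treats the manifest as plain text rather than parsing YAML, so comments or multi-document files can fool it; `check_manifest` is a hypothetical helper, and a missing `replicas` field is treated as 1, the Deployment default. If a HorizontalPodAutoscaler manages the replica count, the `replicas` check may not apply.

```bash
# Hypothetical sketch: a plain-text heuristic (not a YAML parser) for the
# manifest checklist. Prints one line per problem; returns 1 if any found.
check_manifest() {
  f=$1
  problems=0
  # Resource requests/limits and both probe types should appear somewhere.
  for key in resources: requests: limits: livenessProbe: readinessProbe:; do
    if ! grep -q "$key" "$f"; then
      printf 'missing: %s\n' "${key%:}"
      problems=1
    fi
  done
  # First "replicas:" value; an absent field means 1 (the Deployment default).
  replicas=$(awk '/^[ \t]*replicas:/ { print $2; exit }' "$f")
  replicas=${replicas:-1}
  if [ "$replicas" -lt 2 ]; then
    printf 'replicas: %s (fewer than 2)\n' "$replicas"
    problems=1
  fi
  return "$problems"
}
```

For example, `check_manifest deployment.yaml` (a placeholder file name) prints nothing for a manifest that covers every item.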

### 5. Deploy

@@ -154,7 +175,8 @@ kubectl get pods -l app=my-app
## Next Steps

Once the application is running on GKE:

- Configure autoscaling — see the `gke-scaling` skill
- Set up observability — see the `gke-observability` skill
- Harden security — see the `gke-security` skill
- Configure reliability (PDBs, topology spread) — see the `gke-reliability` skill
@@ -1,8 +1,19 @@
---
name: gke-backup-dr
description: >-
  Configures Backup for GKE and disaster recovery plans. Use when configuring
  GKE backup policies, setting up disaster recovery, or restoring GKE clusters.
  Don't use for generic database backups or persistent volume configuration
  (use gke-storage instead).
---

# GKE Backup & Disaster Recovery

This reference provides workflows for protecting stateful workloads on GKE using
Backup for GKE.

> **MCP Tools:** `get_cluster`, `update_cluster`.
> **CLI-only:** `gcloud container backup-restore *`

## Workflows

@@ -38,9 +49,11 @@ gcloud container backup-restore backup-plans create <PLAN_NAME> \
```

**Options:**

- `--all-namespaces` — back up everything
- `--included-namespaces=<ns1>,<ns2>` — back up specific namespaces
- `--backup-encryption-key=<KEY>` — encrypt with Customer-Managed Encryption
  Key (CMEK)
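
When the namespace list comes from a script or variable, the scope option can be built programmatically. A small sketch that uses only the two scoping flags listed above; `backup_scope_flag` is a hypothetical helper, and falling back to `--all-namespaces` when no namespace is given is a choice made for this example, not a gcloud default.

```bash
# Hypothetical helper: build the namespace-scope option for a backup plan.
# With no arguments it falls back to --all-namespaces (a choice made for
# this sketch); otherwise it joins the names with commas.
backup_scope_flag() {
  if [ "$#" -eq 0 ]; then
    printf '%s\n' "--all-namespaces"
    return 0
  fi
  list=$1
  shift
  # Append each remaining namespace, separated by a comma.
  for ns in "$@"; do
    list="$list,$ns"
  done
  printf '%s\n' "--included-namespaces=$list"
}
```

For example, `backup_scope_flag payments orders` (placeholder namespaces) prints `--included-namespaces=payments,orders`, which can be passed as a single quoted argument to the `backup-plans create` command.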

### 3. Create a Manual Backup

@@ -79,8 +92,11 @@ gcloud container backup-restore restores create <RESTORE_NAME> \

## Best Practices

1. **Automate backups**: Always use a cron schedule for production workloads
2. **Test restores regularly**: Restore to a separate namespace or cluster to
   verify data integrity
3. **Cross-region DR**: Store backups in a different region or configure
   cross-region restore plans
4. **Encrypt backups**: Use CMEK for compliance and security requirements
5. **Scope backups**: Back up specific namespaces rather than the entire
   cluster when possible to reduce restore complexity
67 changes: 39 additions & 28 deletions skills/cloud/gke-basics/SKILL.md
@@ -1,11 +1,18 @@
---
name: gke-basics
description: >-
  Provides core GKE cluster discovery and lifecycle management. Use when
  discovering available GKE skills or managing core cluster lifecycles. Don't
  use for specialized GKE tasks like networking, security, or scaling (use the
  respective gke-* skills instead).
---

# Google Kubernetes Engine (GKE) Basics

GKE is a managed Kubernetes platform on Google Cloud for deploying, scaling, and
operating containerized applications. This skill defaults to the **golden path
Autopilot configuration** — see the `gke-golden-path` skill
for defaults, rules, and guardrails.

## Quick Start

@@ -19,31 +26,35 @@ kubectl create deployment hello-server \

## Reference Directory

Load the relevant reference based on trigger keywords. Prefer the most specific
match; if ambiguous, ask the user to clarify. If a referenced sibling skill is
not installed or cannot be accessed, tell the user they may need to install
that skill (for example, `gke-networking`) and fall back to your general GKE
knowledge.

Scenario | Trigger Keywords | Reference
---------------------- | ----------------------------------------------------------------------------------------- | ---------
Core Concepts | Autopilot vs Standard, architecture, pricing, what is GKE | [core-concepts.md](./references/core-concepts.md)
Golden Path & Defaults | golden path, Day-0 checklist, production defaults, cluster defaults | `gke-golden-path`
Cluster Creation | create cluster, new cluster, provision GKE | `gke-cluster-creation`
Networking | private cluster, VPC, subnet, Gateway API, DNS, ingress, egress, datapath | `gke-networking`
Security & IAM | Workload Identity, Secret Manager, RBAC, Binary Auth, hardening, audit, gVisor, IAM roles | `gke-security`
Scaling | HPA, VPA, autoscaler, autoscaling, NAP, scale pods, scale nodes | `gke-scaling`
Compute Classes | ComputeClass, machine family, Spot fallback, GPU node pool, node selection | `gke-compute-classes`
Cost | cost, savings, Spot VMs, rightsizing, CUD, optimize spend, budget | `gke-cost`
AI/ML Inference | inference, model serving, LLM, GPU, TPU, GIQ, vLLM | `gke-inference`
Upgrades | upgrade, maintenance window, release channel, patching, version | `gke-upgrades`
Observability | monitoring, logging, Prometheus, Grafana, metrics, alerts, dashboards | `gke-observability`
Multi-tenancy | multi-tenant, namespace isolation, team access, enterprise, RBAC planning | `gke-multitenancy`
Batch & HPC | batch, HPC, job queue, high performance, MPI, parallel | `gke-batch-hpc`
App Onboarding | containerize, deploy app, Dockerfile, onboard, migrate to GKE | `gke-app-onboarding`
Backup & DR | backup, restore, disaster recovery, CMEK | `gke-backup-dr`
Storage | storage, PVC, persistent volume, StorageClass, Filestore, GCS FUSE | `gke-storage`
Reliability | PDB, health probe, liveness, readiness, topology spread, graceful shutdown | `gke-reliability`
Client Libraries | client library, client-go, kubernetes python, kubernetes java, kubernetes SDK | [client-library-usage.md](./references/client-library-usage.md)
Infrastructure as Code | Terraform, IaC, HCL, infrastructure as code | [iac-usage.md](./references/iac-usage.md)
MCP Server | MCP tools, MCP server, MCP setup | [mcp-usage.md](./references/mcp-usage.md)
CLI / Tools | gcloud, kubectl, commands, how to | [cli-reference.md](./references/cli-reference.md)
Production Audit | production readiness, compliance, golden path check | `gke-cluster-creation`

*If you need product information not found in these references, use the Developer Knowledge MCP server `search_documents` tool.*
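
The keyword routing described in the Reference Directory can be sketched as a scoring loop. This is an illustration under stated assumptions: `route_skill` is a hypothetical helper, only four rows of the table are encoded, matching is case-insensitive substring search, and "most specific match" is interpreted as "most keyword hits", with a tie or no hits meaning the user should be asked to clarify.

```bash
# Hypothetical sketch: route a free-text request to a sibling skill using
# trigger keywords. Only four rows of the table are encoded, and "most
# specific" is assumed to mean "most keyword hits".
route_skill() {
  query=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  best="" best_score=0 tie=0
  while IFS='|' read -r skill keywords; do
    score=0
    # Count case-insensitive substring hits for each comma-separated keyword.
    old_ifs=$IFS
    IFS=','
    for kw in $keywords; do
      case "$query" in
        *"$kw"*) score=$((score + 1)) ;;
      esac
    done
    IFS=$old_ifs
    if [ "$score" -gt "$best_score" ]; then
      best=$skill best_score=$score tie=0
    elif [ "$score" -gt 0 ] && [ "$score" -eq "$best_score" ]; then
      tie=1
    fi
  done <<'EOF'
gke-networking|private cluster,vpc,subnet,gateway api,dns,ingress,egress
gke-scaling|hpa,vpa,autoscaler,autoscaling,scale pods,scale nodes
gke-backup-dr|backup,restore,disaster recovery,cmek
gke-cost|cost,savings,spot vms,rightsizing,budget
EOF
  # No hits, or two skills tied for the top score: ask the user instead.
  if [ "$best_score" -eq 0 ] || [ "$tie" -eq 1 ]; then
    printf '%s\n' "ask"
  else
    printf '%s\n' "$best"
  fi
}
```

For example, `route_skill "Set up a backup plan with CMEK"` prints `gke-backup-dr`, while `route_skill "ingress cost"` hits one networking keyword and one cost keyword equally and prints `ask`.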