diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc
index 9c94fe464d..5eb34695dc 100644
--- a/modules/ROOT/nav.adoc
+++ b/modules/ROOT/nav.adoc
@@ -75,6 +75,7 @@
 **** xref:deploy:redpanda/kubernetes/k-requirements.adoc[Requirements and Recommendations]
 **** xref:deploy:redpanda/kubernetes/k-tune-workers.adoc[Tune Worker Nodes]
 **** xref:deploy:redpanda/kubernetes/k-production-deployment.adoc[Deploy Redpanda]
+**** xref:deploy:redpanda/kubernetes/k-production-readiness.adoc[]
 **** xref:deploy:redpanda/kubernetes/k-high-availability.adoc[High Availability]
 *** xref:deploy:redpanda/manual/index.adoc[Linux]
 **** xref:deploy:redpanda/manual/production/requirements.adoc[Hardware and Software Requirements]
diff --git a/modules/deploy/pages/redpanda/kubernetes/k-production-deployment.adoc b/modules/deploy/pages/redpanda/kubernetes/k-production-deployment.adoc
index 393ac88948..27faaef75b 100644
--- a/modules/deploy/pages/redpanda/kubernetes/k-production-deployment.adoc
+++ b/modules/deploy/pages/redpanda/kubernetes/k-production-deployment.adoc
@@ -777,6 +777,10 @@ include::deploy:partial$kubernetes/guides/troubleshoot.adoc[leveloffset=+1]
 
 == Next steps
 
+After deploying Redpanda, validate your production readiness:
+
+- xref:deploy:redpanda/kubernetes/k-production-readiness.adoc[Production readiness checklist] - Comprehensive validation of your deployment against production standards
+
 See the xref:manage:kubernetes/index.adoc[Manage Kubernetes topics] to learn how to customize your deployment to meet your needs.
 
 include::shared:partial$suggested-reading.adoc[]
diff --git a/modules/deploy/pages/redpanda/kubernetes/k-production-readiness.adoc b/modules/deploy/pages/redpanda/kubernetes/k-production-readiness.adoc
new file mode 100644
index 0000000000..01ae3cd02a
--- /dev/null
+++ b/modules/deploy/pages/redpanda/kubernetes/k-production-readiness.adoc
@@ -0,0 +1,1415 @@
+= Production Readiness Checklist
+:description: Comprehensive checklist for validating Redpanda deployments in Kubernetes against production readiness standards.
+:page-context-links: [{"name": "Linux", "to": "deploy:redpanda/linux/index.adoc" },{"name": "Kubernetes", "to": "deploy:redpanda/kubernetes/index.adoc" } ]
+:page-categories: Deployment
+:learning-objective-1: Validate a Kubernetes-deployed Redpanda cluster against production readiness standards
+
+Before running a production workload on Redpanda in Kubernetes, follow this readiness checklist.
+
+By completing this checklist, you will be able to:
+
+* [ ] {learning-objective-1}
+
+NOTE: For Linux deployments, see the xref:deploy:redpanda/manual/production/production-readiness.adoc[Production Readiness Checklist for Linux].
+
+== Critical requirements
+
+The Critical requirements checklist helps ensure that:
+
+- You have specified all required defaults and configuration items.
+- You have the optimal hardware setup.
+- You have enabled security.
+- You are set up to run in production.
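+
+TIP: The commands in this checklist repeat the same `<pod-name>`, `<namespace>`, and SASL credential placeholders. To avoid substituting them in every command, you can export them once per shell session. The following is a convenience sketch; the variable names and values are illustrative assumptions, not required by `rpk`:
+
+.Input
+[,bash]
+----
+# Illustrative values: substitute the namespace, Pod, and credentials for your cluster
+export NAMESPACE=redpanda
+export POD=redpanda-0
+export SASL_FLAGS="-X user=admin -X pass=<password> -X sasl.mechanism=SCRAM-SHA-256"
+
+# Example usage with the exported values
+kubectl exec $POD -n $NAMESPACE -c redpanda -- rpk cluster health $SASL_FLAGS
+----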
+
+=== Redpanda license
+
+If using Enterprise features, validate that you are using a valid Enterprise license:
+
+[.side-by-side]
+--
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk cluster license info -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+LICENSE INFORMATION
+===================
+Organization:  Your Company Name
+Type:          enterprise
+Expires:       Dec 31 2026
+----
+--
+
+Production deployments using Enterprise features (such as Tiered Storage, Schema Registry, or Continuous Data Balancing) must have a valid Enterprise license with a sufficient expiration date.
+
+See also: xref:get-started:licensing/index.adoc[Redpanda Licensing]
+
+[NOTE]
+====
+**SASL authentication flags**
+
+The `rpk` commands throughout this checklist include SASL authentication flags (`-X user`, `-X pass`, `-X sasl.mechanism`). If your cluster does not use SASL authentication, you can omit these flags from all commands. For example:
+
+.Input
+[,bash]
+----
+# With SASL authentication
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk cluster health -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+
+# Without SASL authentication
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk cluster health
+----
+
+Common SASL mechanisms are `SCRAM-SHA-256` or `SCRAM-SHA-512`. Update these values as needed for your deployment.
+====
+
+=== Cluster health
+
+Check that all brokers are connected and running. Run xref:reference:rpk/rpk-cluster/rpk-cluster-health.adoc[`rpk cluster health`] to check the health of the cluster. No nodes should be down, and there should be zero leaderless or under-replicated partitions.
+
+[.side-by-side]
+--
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk cluster health -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+CLUSTER HEALTH OVERVIEW
+=======================
+Healthy:                          true
+Unhealthy reasons:                []
+Controller ID:                    0
+All nodes:                        [0 1 2]
+Nodes down:                       []
+Leaderless partitions (0):        []
+Under-replicated partitions (0):  []
+----
+--
+
+=== Minimum broker count
+
+You must have at least three brokers running to ensure production-level fault tolerance.
+
+Production clusters should have an odd number of brokers (for example, 3, 5, or 7) for optimal consensus behavior.
+
+Verify the running broker count:
+
+.Input
+[,bash]
+----
+kubectl get pods -n <namespace> -l app.kubernetes.io/component=redpanda-statefulset
+----
+
+.Output
+[,bash,role=no-copy]
+----
+NAME        READY  STATUS   RESTARTS  AGE
+redpanda-0  2/2    Running  0         10d
+redpanda-1  2/2    Running  0         10d
+redpanda-2  2/2    Running  0         10d
+----
+
+Verify the configured replica count in your deployment:
+
+[tabs]
+======
+Helm::
++
+--
+.Input
+[,bash]
+----
+helm get values redpanda -n <namespace> | grep -A 1 "statefulset:"
+----
+
+.Output
+[,bash,role=no-copy]
+----
+statefulset:
+  replicas: 3
+----
+--
+
+Operator::
++
+--
+.Input
+[,bash]
+----
+kubectl get redpanda redpanda -n <namespace> -o jsonpath='{.spec.clusterSpec.statefulset.replicas}'
+----
+
+.Output
+[,bash,role=no-copy]
+----
+3
+----
+--
+======
+
+See also: <>
+
+=== Active broker membership
+
+Verify that all brokers are in the active state and not being decommissioned.
+
+Decommissioning is used to permanently remove a broker from the cluster, such as during node pool migrations or cluster downsizing. Brokers in a decommissioned state should not be present in production clusters unless you are actively performing a planned migration.
+
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk redpanda admin brokers list -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+NODE-ID  NUM-CORES  MEMBERSHIP-STATUS  IS-ALIVE  BROKER-VERSION
+0        4          active             true      v24.2.4
+1        4          active             true      v24.2.4
+2        4          active             true      v24.2.4
+----
+
+All brokers must show `active` status. If any broker shows the status `draining` or `decommissioned`, investigate immediately.
+
+See also: xref:manage:kubernetes/k-decommission-brokers.adoc[Decommission Brokers]
+
+=== No brokers in maintenance mode
+
+Check that no brokers are in maintenance mode during normal operations.
+
+Maintenance mode is used when modifying brokers that will remain members of the cluster, such as during rolling upgrades or hardware maintenance. While necessary during planned maintenance windows, brokers should not remain in maintenance mode during normal operations.
+
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk cluster maintenance status -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+NODE-ID  ENABLED  FINISHED  ERRORS  PARTITIONS  ELIGIBLE  TRANSFERRING  FAILED
+0        false    -         -       -           -         -             -
+1        false    -         -       -           -         -             -
+2        false    -         -       -           -         -             -
+----
+
+All brokers should show `ENABLED: false`. If any broker shows `ENABLED: true` outside of a planned maintenance window, investigate immediately.
+
+See also: xref:manage:kubernetes/k-rolling-restart.adoc[Maintenance Mode]
+
+=== Consistent Redpanda version
+
+Check that Redpanda is running the https://github.com/redpanda-data/redpanda/releases[latest point release^] for the major version you're on, and that all brokers run the same version.
+
+**Verify Redpanda broker versions:**
+
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk redpanda admin brokers list -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+NODE-ID  NUM-CORES  MEMBERSHIP-STATUS  IS-ALIVE  BROKER-VERSION
+0        4          active             true      v25.2.4
+1        4          active             true      v25.2.4
+2        4          active             true      v25.2.4
+----
+
+All brokers must show the same `BROKER-VERSION`. Version mismatches between brokers can cause compatibility issues and must be resolved before advancing to production.
+
+**Verify Helm Chart or Operator version compatibility:**
+
+For Kubernetes deployments, you must also verify that your deployment tool (Helm Chart or Operator) version is compatible with your Redpanda version. The Helm Chart or Operator version must be within one minor version of the Redpanda version.
+
+For example, if running Redpanda v25.2.x, the Helm Chart or Operator version must be v25.1.x, v25.2.x, or v25.3.x.
+
+[tabs]
+======
+Helm::
++
+--
+.Input
+[,bash]
+----
+helm list -n <namespace>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+NAME      NAMESPACE  REVISION  UPDATED                               STATUS    CHART           APP VERSION
+redpanda  redpanda   1         2024-01-15 10:30:00.123456 -0800 PST  deployed  redpanda-5.2.4  v25.2.4
+----
+
+The `CHART` column shows the Helm Chart version (for example, `redpanda-5.2.4`), which should be compatible with the `APP VERSION` (Redpanda version).
+--
+
+Operator::
++
+--
+.Input
+[,bash]
+----
+kubectl get deployment redpanda-controller-manager -n <namespace> -o jsonpath='{.spec.template.spec.containers[0].image}'
+----
+
+.Output
+[,bash,role=no-copy]
+----
+docker.redpanda.com/redpandadata/redpanda-operator:v25.2.4
+----
+
+The Operator version is shown in the image tag (for example, `v25.2.4`), which should be compatible with your Redpanda broker version.
+
+You can also check the Operator version using:
+
+.Input
+[,bash]
+----
+kubectl get redpanda redpanda -n <namespace> -o jsonpath='{.metadata.annotations.redpanda\.com/operator-version}'
+----
+--
+======
+
+**Version compatibility requirements:**
+
+* All Redpanda brokers must run the same version.
+* The Helm Chart or Operator version must be within ±1 minor version of the Redpanda version.
+* Example: Redpanda v25.2.x requires Helm/Operator v25.1.x, v25.2.x, or v25.3.x.
+* Running incompatible versions can lead to deployment failures or cluster instability.
+
+=== Version pinning
+
+Verify that versions are explicitly pinned in your deployment configuration:
+
+[tabs]
+======
+Helm::
++
+--
+[,yaml]
+----
+image:
+  tag: v24.2.4 # Pin a specific Redpanda version
+
+console:
+  enabled: true
+  image:
+    tag: v2.4.5 # Pin a specific Console version
+
+connectors:
+  enabled: true
+  image:
+    tag: v1.0.15 # Pin a specific Connectors version
+----
+
+Verify pinned versions:
+
+.Input
+[,bash]
+----
+helm get values redpanda -n <namespace>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+image:
+  tag: v24.2.4
+console:
+  image:
+    tag: v2.4.5
+connectors:
+  image:
+    tag: v1.0.15
+----
+--
+
+Operator::
++
+--
+[,yaml]
+----
+apiVersion: cluster.redpanda.com/v1alpha2
+kind: Redpanda
+metadata:
+  name: redpanda
+spec:
+  clusterSpec:
+    image:
+      tag: v24.2.4 # Pin a specific Redpanda version
+
+    console:
+      enabled: true
+      image:
+        tag: v2.4.5 # Pin a specific Console version
+
+    connectors:
+      enabled: true
+      image:
+        tag: v1.0.15 # Pin a specific Connectors version
+----
+
+Verify pinned versions:
+
+.Input
+[,bash]
+----
+kubectl get redpanda redpanda -n <namespace> -o yaml | grep -A 1 "tag:"
+----
+--
+======
+
+Pin specific versions for Redpanda and all related components (Console, Connectors). This ensures all environments (dev/staging/prod) run the same tested versions, allows controlled upgrade testing before production rollout, and provides rollback capability to known-good versions.
+
+Avoid using the `latest` tag, version ranges (for example, v24.2.x), or unspecified tags, as these can result in unexpected upgrades that introduce breaking changes or cause downtime.
+
+=== Default topic replication factor
+
+Check that the default replication factor (≥3) is set appropriately for production.
+
+[.side-by-side]
+--
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk cluster config get default_topic_replications -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+3
+----
+--
+
+Setting `default_topic_replications` to `3` or greater ensures new topics are created with adequate fault tolerance.
+
+See also: xref:manage:kubernetes/k-manage-topics.adoc#choose-the-replication-factor[Choose the Replication Factor]
+
+=== Existing topics replication factor
+
+Check that all existing topics have adequate replication (default is `3`).
+
+[.side-by-side]
+--
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk topic list -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+NAME         PARTITIONS  REPLICAS
+_schemas     1           3
+orders       12          3
+payments     8           3
+user-events  16          3
+----
+--
+
+All production topics should have `REPLICAS` of three or greater. Topics with a replication factor of `1` are at risk of data loss if a broker fails.
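+
+To quickly surface topics that fall below this threshold, you can filter the `rpk topic list` output. The following is a minimal sketch that assumes the three-column output shown above:
+
+.Input
+[,bash]
+----
+# List topics whose replication factor is below 3 (skips the header row)
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk topic list -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism> | awk 'NR > 1 && $3 < 3 {print $1, "replicas:", $3}'
+----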
+
+See also: xref:manage:cluster-maintenance/topic-property-configuration.adoc#change-topic-replication-factor[Change Topic Replication Factor]
+
+=== Persistent storage configuration
+
+Verify that you have configured persistent storage (not hostPath or emptyDir) for data persistence.
+
+[.side-by-side]
+--
+.Input
+[,bash]
+----
+kubectl get pvc -n <namespace>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+NAME                STATUS  VOLUME                                    CAPACITY  ACCESS MODES  STORAGECLASS  AGE
+datadir-redpanda-0  Bound   pvc-a1b2c3d4-e5f6-7890-abcd-ef1234567890  100Gi     RWO           fast-ssd      10d
+datadir-redpanda-1  Bound   pvc-b2c3d4e5-f6g7-8901-bcde-fg2345678901  100Gi     RWO           fast-ssd      10d
+datadir-redpanda-2  Bound   pvc-c3d4e5f6-g7h8-9012-cdef-gh3456789012  100Gi     RWO           fast-ssd      10d
+----
+--
+
+Verify the StatefulSet uses PersistentVolumeClaims:
+
+.Input
+[,bash]
+----
+kubectl describe statefulset redpanda -n <namespace> | grep -A 5 "Volume Claims"
+----
+
+.Output
+[,bash,role=no-copy]
+----
+Volume Claims:
+  Name:          datadir
+  StorageClass:  fast-ssd
+  Labels:        <none>
+  Annotations:   <none>
+  Capacity:      100Gi
+----
+
+HostPath and emptyDir storage are not suitable for production because they lack durability guarantees.
+
+See also: xref:manage:kubernetes/storage/k-persistent-storage.adoc[Persistent Storage]
+
+=== RAID/LVM stripe configuration (multiple disks only)
+
+If using multiple physical disks, verify they are configured to stripe data across the disks as RAID-0 or an LVM stripe (not linear/concat). Striping distributes data across multiple disks in parallel for improved I/O performance.
+
+.Input
+[,bash]
+----
+# Check the block device configuration on nodes
+kubectl debug node/<node-name> -it -- chroot /host /bin/bash
+lsblk -o NAME,TYPE,SIZE,MOUNTPOINT,FSTYPE
+lvs -o lv_name,stripes,stripe_size
+mdadm --detail /dev/md*  # if using software RAID
+----
+
+.Output
+[,bash,role=no-copy]
+----
+# lsblk output
+NAME      TYPE  SIZE  MOUNTPOINT         FSTYPE
+nvme0n1   disk  1.8T
+nvme1n1   disk  1.8T
+vg0-data  lvm   3.6T  /var/lib/redpanda  xfs
+
+# lvs output - note that stripes > 1 indicates striping
+LV    #Stripes  StripeSize
+data  2         256.00k
+----
+
+.Output
+[,bash,role=no-copy]
+----
+# mdadm output
+/dev/md0:
+        Raid Level : raid0
+        Array Size : 3515625472 (3.27 TiB)
+      Raid Devices : 2
+
+    Number   Major   Minor   RaidDevice State
+       0     259        0        0      active sync   /dev/nvme0n1
+       1     259        1        1      active sync   /dev/nvme1n1
+----
+
+Using LVM linear/concat or JBOD instead of stripe/RAID-0 across multiple disks severely degrades performance because data writes are serialized rather than parallelized. For optimal I/O throughput, configure multiple disks in a striped array that writes data across all disks simultaneously. Single-disk configurations do not require striping.
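+
+If you need to build the striped volume yourself when provisioning worker nodes, the following is a minimal LVM sketch, run from a root shell on the node (for example, through the `kubectl debug` session shown above). The device names, stripe size, and mount point are illustrative assumptions; substitute values that match your hardware:
+
+.Input
+[,bash]
+----
+# Create an LVM volume group from two local NVMe devices
+pvcreate /dev/nvme0n1 /dev/nvme1n1
+vgcreate vg0 /dev/nvme0n1 /dev/nvme1n1
+
+# Create a logical volume striped across both disks (-i 2) with a 256 KiB stripe size
+lvcreate --type striped -i 2 -I 256 -l 100%FREE -n data vg0
+
+# Format with XFS and mount at the Redpanda data directory
+mkfs.xfs /dev/vg0/data
+mount /dev/vg0/data /var/lib/redpanda
+----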
+
+See also: xref:deploy:redpanda/kubernetes/k-production-deployment.adoc#storage[Storage]
+
+=== Storage performance requirements
+
+Ensure storage classes provide adequate IOPS and throughput for your workload by applying the following specifications when selecting a storage class:
+
+**Performance specifications:**
+
+* Use NVMe-based storage classes for production deployments.
+* Specify a minimum of 16,000 IOPS (input/output operations per second).
+* Consider provisioned IOPS where available to meet or exceed the minimum.
+* Enable xref:develop:config-topics.adoc#configure-write-caching[write caching] to help Redpanda perform better in environments with disks that don't meet the recommended IOPS.
+* NFS (Network File System) is not supported.
+* Test storage performance under load.
+
+WARNING: Avoid cloud instance types that use multi-tenant or shared disks, as these can lead to unpredictable performance due to noisy neighbor effects. Examples of instances with shared/multi-tenant storage include AWS is4gen.xlarge and similar instance types across cloud providers. Instead, use instances with dedicated local NVMe storage or provisioned IOPS volumes that guarantee consistent performance.
+
+Multi-tenant disks can experience:
+
+* Unpredictable latency spikes from other tenants' workloads
+* Inconsistent throughput that varies based on neighbor activity
+* IOPS throttling that impacts Redpanda's performance
+* Difficulty troubleshooting performance issues due to external factors
+
+See also:
+
+* xref:deploy:redpanda/kubernetes/k-requirements.adoc#storage[Storage requirements]
+* xref:deploy:redpanda/kubernetes/k-requirements.adoc#cloud-instance-types[Cloud Instance Types]
+
+=== CPU and memory resource limits
+
+Verify Pods have resource requests and limits configured.
+
+[.side-by-side]
+--
+.Input
+[,bash]
+----
+kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[?(@.name=="redpanda")].resources}' | jq
+----
+
+.Output
+[,bash,role=no-copy]
+----
+{
+  "limits": {
+    "cpu": "4",
+    "memory": "8Gi"
+  },
+  "requests": {
+    "cpu": "4",
+    "memory": "8Gi"
+  }
+}
+----
+--
+
+All Redpanda Pods **must have**:
+
+* Identical CPU requests and limits (`requests.cpu == limits.cpu`)
+* Identical memory requests and limits (`requests.memory == limits.memory`)
+
+Setting requests equal to limits ensures the Pod receives the `Guaranteed` QoS class, which prevents CPU throttling and reduces the risk of Pod eviction.
+
+See also: xref:manage:kubernetes/k-manage-resources.adoc[Manage Pod Resources]
+
+=== CPU to memory ratio
+
+Ensure adequate memory allocation relative to CPU for optimal performance.
+
+Production deployments should provision at least 2 GiB of memory per CPU core (a CPU-to-memory ratio of at least 1:2).
+
+Verify the CPU to memory ratio in your configuration:
+
+[tabs]
+======
+Helm::
++
+--
+.Input
+[,bash]
+----
+helm get values redpanda -n <namespace> | grep -A 6 "resources:"
+----
+
+.Output
+[,bash,role=no-copy]
+----
+resources:
+  cpu:
+    cores: 4
+  memory:
+    container:
+      min: 8Gi
+      max: 8Gi
+----
+--
+
+Operator::
++
+--
+.Input
+[,bash]
+----
+kubectl get redpanda redpanda -n <namespace> -o jsonpath='{.spec.clusterSpec.resources}' | jq
+----
+
+.Output
+[,bash,role=no-copy]
+----
+{
+  "cpu": {
+    "cores": 4
+  },
+  "memory": {
+    "container": {
+      "min": "8Gi",
+      "max": "8Gi"
+    }
+  }
+}
+----
+--
+======
+
+In the preceding examples, 4 CPU cores with 8 GiB memory provides a 1:2 ratio (2 GiB per core).
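+
+As a follow-up to both resource checks, you can confirm that the configured requests and limits produced the `Guaranteed` QoS class on every broker Pod. A minimal sketch, assuming the `app.kubernetes.io/name=redpanda` label used elsewhere in this checklist:
+
+.Input
+[,bash]
+----
+# Print each Redpanda Pod with its QoS class; every line should end in "Guaranteed"
+kubectl get pods -n <namespace> -l app.kubernetes.io/name=redpanda -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.qosClass}{"\n"}{end}'
+----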
+
+See also: xref:manage:kubernetes/k-manage-resources.adoc#memory[Memory]
+
+=== No fractional CPU requests
+
+Ensure CPU requests use whole numbers for consistent performance.
+
+Fractional CPUs can lead to performance variability in production. Use whole integer values (`4`, `8`, or `16` are acceptable, while `3.5` or `7.5` are not).
+
+Verify the CPU configuration:
+
+.Input
+[,bash]
+----
+kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[?(@.name=="redpanda")].resources.requests.cpu}'
+----
+
+.Output
+[,bash,role=no-copy]
+----
+4
+----
+
+=== Authorization enabled
+
+Verify Kafka authorization is enabled for access control.
+
+[.side-by-side]
+--
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk cluster config get kafka_enable_authorization -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+true
+----
+--
+
+Without authorization enabled, any client can access Kafka APIs without authentication.
+
+See also: xref:manage:security/authorization/index.adoc[Authorization]
+
+=== Production mode enabled
+
+Verify that developer mode and overprovisioned mode are disabled for production stability.
+
+Check developer mode:
+
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- grep developer_mode /etc/redpanda/redpanda.yaml
+----
+
+.Output
+[,bash,role=no-copy]
+----
+developer_mode: false
+----
+
+Developer mode should never be enabled in production environments. Developer mode disables fsync and bypasses safety checks designed for production workloads.
+
+Check overprovisioned mode:
+
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- grep overprovisioned /etc/redpanda/redpanda.yaml
+----
+
+.Output
+[,bash,role=no-copy]
+----
+overprovisioned: false
+----
+
+Overprovisioned mode bypasses critical resource checks and should never be enabled in production. This mode is intended only for development environments with constrained resources.
+
+Verify in your Helm values that `resources.cpu.overprovisioned` is not explicitly set to `true` (it's automatically calculated based on CPU allocation).
+
+=== TLS enabled
+
+Configure TLS encryption for all client and inter-broker communication. TLS prevents eavesdropping and man-in-the-middle attacks on network traffic.
+
+Verify TLS is enabled on all listeners:
+
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk cluster config export -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism> | grep -A 10 "kafka_api:"
+----
+
+.Output
+[,bash,role=no-copy]
+----
+redpanda:
+  kafka_api:
+    - address: 0.0.0.0
+      port: 9093
+      name: internal
+      authentication_method: sasl
+  kafka_api_tls:
+    - name: internal
+      enabled: true
+      cert_file: /etc/tls/certs/tls.crt
+      key_file: /etc/tls/certs/tls.key
+----
+
+Required TLS listeners include:
+
+* **kafka_api** - Client connections to the Kafka API
+* **admin_api** - Administrative REST API access
+* **rpc_server** - Inter-broker communication
+* **schema_registry** - Schema Registry API (if used)
+
+Verify certificates are properly mounted:
+
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- ls -la /etc/tls/certs/
+----
+
+.Output
+[,bash,role=no-copy]
+----
+total 16
+-rw-r--r-- 1 redpanda redpanda 1234 Dec 15 10:00 ca.crt
+-rw-r--r-- 1 redpanda redpanda 1675 Dec 15 10:00 tls.crt
+-rw------- 1 redpanda redpanda 1704 Dec 15 10:00 tls.key
+----
+
+See also: xref:manage:kubernetes/security/tls/index.adoc[TLS Encryption]
+
+=== Authentication enabled
+
+Configure appropriate authentication mechanisms to control access to Redpanda resources.
+
+Verify SASL users are configured:
+
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk acl user list -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+USERNAME
+admin
+app-producer
+app-consumer
+monitoring
+----
+
+Adhere to the following authentication requirements:
+
+* Set up SASL authentication for client connections.
+* Configure TLS certificates for encryption (see the preceding TLS configuration guidance).
+* Implement proper user management with the principle of least privilege.
+* Configure xref:manage:security/authorization/acl.adoc[ACLs (access control lists)] for resource authorization.
+
+Verify ACLs are configured:
+
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk acl list -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+PRINCIPAL          HOST  RESOURCE-TYPE  RESOURCE-NAME     OPERATION  PERMISSION
+User:app-producer  *     TOPIC          orders.*          WRITE      ALLOW
+User:app-consumer  *     TOPIC          orders.*          READ       ALLOW
+User:app-consumer  *     GROUP          consumer-group-1  READ       ALLOW
+----
+
+See also:
+
+* xref:manage:kubernetes/security/authentication/k-authentication.adoc[Authentication]
+* xref:manage:security/authorization/index.adoc[Authorization]
+
+=== Network security
+
+Secure network access to the cluster using Kubernetes-native controls.
+
+Verify NetworkPolicies are configured:
+
+.Input
+[,bash]
+----
+kubectl get networkpolicy -n <namespace>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+NAME                       POD-SELECTOR                     AGE
+redpanda-allow-internal    app.kubernetes.io/name=redpanda  10d
+redpanda-allow-clients     app.kubernetes.io/name=redpanda  10d
+redpanda-deny-all-ingress  app.kubernetes.io/name=redpanda  10d
+----
+
+Check the NetworkPolicy rules:
+
+.Input
+[,bash]
+----
+kubectl describe networkpolicy <policy-name> -n <namespace>
+----
+
+Satisfy the following network security requirements:
+
+* Configure NetworkPolicies to restrict pod-to-pod communication.
+* Use TLS for all client connections (see the TLS configuration).
+* Secure Admin API endpoints with xref:manage:kubernetes/security/authentication/k-authentication.adoc[authentication] and xref:manage:security/authorization/index.adoc[authorization].
+* Limit ingress traffic to only necessary ports and sources.
+* Use Kubernetes Services to control external access.
+
+Verify Services and exposed ports:
+
+.Input
+[,bash]
+----
+kubectl get svc -n <namespace>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+NAME               TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)
+redpanda           ClusterIP     None           <none>       9093/TCP,9644/TCP,8082/TCP
+redpanda-external  LoadBalancer  10.100.200.50  <pending>    9093:30001/TCP
+----
+
+See also: xref:manage:kubernetes/networking/k-configure-listeners.adoc[Listener Configuration]
+
+=== Pod Disruption Budget
+
+Set up PodDisruptionBudgets (PDBs) to control voluntary disruptions during maintenance.
+
+[.side-by-side]
+--
+.Input
+[,bash]
+----
+kubectl get pdb -n <namespace>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+NAME      MIN AVAILABLE  MAX UNAVAILABLE  ALLOWED DISRUPTIONS  AGE
+redpanda  N/A            1                1                    10d
+----
+--
+
+Production deployments must have a PodDisruptionBudget with `maxUnavailable: 1` to prevent simultaneous broker disruptions during voluntary operations like node drains, upgrades, or autoscaler actions.
+
+See also: https://kubernetes.io/docs/tasks/run-application/configure-pdb/[Kubernetes Pod Disruption Budgets^]
+
+=== Rack awareness and topology spread
+
+Configure topology spread constraints to distribute brokers across availability zones. For configuration instructions, see xref:deploy:redpanda/kubernetes/k-high-availability.adoc#multi-az-deployment[Multi-AZ deployment].
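+
+To see how broker Pods currently map to zones, you can cross-reference Pods with node labels. A minimal sketch, assuming your nodes carry the standard `topology.kubernetes.io/zone` label:
+
+.Input
+[,bash]
+----
+# List worker nodes with their zone labels
+kubectl get nodes -L topology.kubernetes.io/zone
+
+# Show which node each Redpanda Pod is scheduled on
+kubectl get pods -n <namespace> -l app.kubernetes.io/name=redpanda -o wide
+----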
+
+Production deployments require each Redpanda broker to run in a different availability zone to ensure that a single zone failure does not cause loss of quorum. For a three-broker cluster, brokers must be distributed across three separate zones.
+
+To verify zone distribution, check your cluster configuration:
+
+* Verify `topologySpreadConstraints` are configured in your Helm values or Redpanda CR.
+* Confirm nodes have zone labels (typically `topology.kubernetes.io/zone`).
+* Check that brokers are scheduled on nodes in different zones.
+
+See also: xref:manage:kubernetes/k-rack-awareness.adoc[Rack Awareness]
+
+=== Operator CRDs (Operator deployments only)
+
+WARNING: If your deployment uses the Redpanda Operator, all required Custom Resource Definitions (CRDs) must be installed with compatible versions. Without the correct CRDs, the Operator cannot manage the cluster, leading to configuration drift, failed updates, and potential data loss.
+
+The required CRDs are:
+
+* `clusters.cluster.redpanda.com` - Manages Redpanda cluster configuration
+* `topics.cluster.redpanda.com` - Manages topic lifecycle
+* `users.cluster.redpanda.com` - Manages SASL users
+* `schemas.cluster.redpanda.com` - Manages Schema Registry schemas
+
+If any CRDs are missing or incompatible with your Operator version, the Operator will fail to reconcile resources.
+
+Verify all required CRDs are installed:
+
+.Input
+[,bash]
+----
+kubectl get crd | grep redpanda.com
+----
+
+.Output
+[,bash,role=no-copy]
+----
+clusters.cluster.redpanda.com
+topics.cluster.redpanda.com
+users.cluster.redpanda.com
+schemas.cluster.redpanda.com
+----
+
+=== Run Redpanda tuners
+
+Check that you have configured tuners for optimal performance. Tuners can significantly impact latency and throughput. In Kubernetes, tuners are configured through the Helm chart or may need to be run on the worker nodes themselves. For details, see xref:deploy:redpanda/kubernetes/k-tune-workers.adoc[Tune Kubernetes Worker Nodes for Production].
+
+== Recommended requirements
+
+The Recommended requirements checklist ensures that you can monitor and support your environment on a sustained basis. It includes the following checks:
+
+- You have adhered to day-2 operations best practices.
+- You can diagnose and recover from backup issues or failures.
+- You have configured monitoring, backup, and security scanning.
+
+=== Deployment method
+
+Verify that the deployment method (Helm or Operator) is correctly identified for your cluster. Understanding your deployment method is important for troubleshooting, upgrades, and configuration management.
+
+[tabs]
+======
+Helm::
++
+--
+.Input
+[,bash]
+----
+helm list -n <namespace>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+NAME      NAMESPACE  REVISION  UPDATED                               STATUS    CHART           APP VERSION
+redpanda  redpanda   1         2024-01-15 10:30:00.123456 -0800 PST  deployed  redpanda-5.0.0  v24.1.1
+----
+
+The presence of a Helm release (`CHART` displays `redpanda-5.0.0`) indicates a Helm-managed deployment.
+--
+
+Operator::
++
+--
+.Input
+[,bash]
+----
+kubectl get redpanda -n <namespace>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+NAME      READY  STATUS
+redpanda  True   Redpanda reconciliation succeeded
+----
+
+The presence of a Redpanda custom resource indicates an Operator-managed deployment.
+--
+======
+
+Knowing your deployment method helps determine which configuration approach to use (Helm values vs. Redpanda CR), how to perform upgrades and rollbacks, where to find deployment logs and troubleshooting information, and which documentation sections apply to your environment. See xref:deploy:redpanda/kubernetes/k-production-workflow.adoc[Production Deployment Workflow] for the complete deployment process.
+
+=== XFS filesystem
+
+Verify that data directories use the XFS filesystem for optimal performance.
+
+[.side-by-side]
+--
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- df -hT /var/lib/redpanda/data
+----
+
+.Output
+[,bash,role=no-copy]
+----
+Filesystem    Type  Size  Used  Avail  Use%  Mounted on
+/dev/nvme0n1  xfs   1.8T  14G   1.8T   1%    /var/lib/redpanda/data
+----
+--
+
+XFS provides better performance characteristics for Redpanda workloads compared to ext4. While ext4 is supported, XFS is strongly recommended for production deployments.
+
+See also: xref:deploy:redpanda/manual/production/requirements.adoc#storage[Storage Requirements]
+
+=== Pod anti-affinity
+
+Configure Pod anti-affinity to spread brokers across nodes.
+
+[.side-by-side]
+--
+.Input
+[,bash]
+----
+kubectl get statefulset redpanda -n <namespace> -o jsonpath='{.spec.template.spec.affinity}' | jq
+----
+
+.Output
+[,bash,role=no-copy]
+----
+{
+  "podAntiAffinity": {
+    "requiredDuringSchedulingIgnoredDuringExecution": [
+      {
+        "labelSelector": {
+          "matchLabels": {
+            "app.kubernetes.io/name": "redpanda"
+          }
+        },
+        "topologyKey": "kubernetes.io/hostname"
+      }
+    ]
+  }
+}
+----
+--
+
+This prevents single-node failures from affecting multiple brokers by ensuring each Redpanda Pod runs on a different node.
+
+See also: xref:deploy:redpanda/kubernetes/k-production-deployment.adoc#affinity-rules[Pod Anti-Affinity]
+
+=== Node isolation
+
+Configure taints and tolerations or a nodeSelector for workload isolation.
+
+.Input
+[,bash]
+----
+kubectl get statefulset redpanda -n <namespace> -o jsonpath='{.spec.template.spec.nodeSelector}' | jq
+----
+
+.Output
+[,bash,role=no-copy]
+----
+{
+  "workload-type": "redpanda"
+}
+----
+
+Isolating Redpanda workloads on dedicated nodes improves performance predictability by preventing resource contention with other applications.
+
+=== Partition balancing
+
+Configure automatic partition balancing across brokers and CPU cores.
+
+==== Continuous Data Balancing
+
+xref:manage:cluster-maintenance/continuous-data-balancing.adoc[Continuous Data Balancing] can help you manage production deployments by automatically rebalancing partition replicas across brokers based on disk usage and node changes. It also eliminates manual intervention and prevents performance degradation.
+
+IMPORTANT: You should enable Continuous Data Balancing for all licensed production clusters.
+
+Verify that Continuous Data Balancing is configured:
+
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk cluster config get partition_autobalancing_mode -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+continuous
+----
+
+The `continuous` setting enables automatic partition rebalancing based on:
+
+* Node additions or removals
+* High disk usage conditions
+* Broker availability changes
+
+Without Continuous Data Balancing, partition distribution becomes skewed over time, leading to hotspots and manual rebalancing operations.
+
+==== Core Balancing
+
+xref:manage:cluster-maintenance/cluster-balancing.adoc#intra-broker-partition-balancing[Intra-broker partition balancing] distributes partition replicas across CPU cores within individual brokers.
+
+Check core balancing for CPU core partition distribution:
+
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk cluster config get core_balancing_on_core_count_change -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+true
+----
+
+When enabled, Redpanda continuously rebalances partitions between CPU cores on a broker for optimal resource utilization, which is especially beneficial after broker restarts or configuration changes.
+
+=== System requirements
+
+Run system checks to get more details about your system configuration.
+
+[.side-by-side]
+--
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk redpanda check -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+CONDITION                   REQUIRED  CURRENT  SEVERITY  PASSED
+Data directory is writable  true      true     Fatal     true
+Free memory per CPU [MB]    >= 2048   8192     Warning   true
+NTP Synced                  true      true     Warning   true
+Swappiness                  1         1        Warning   true
+----
+--
+
+Review any failed checks and remediate them before proceeding to production. See xref:reference:rpk/rpk-redpanda/rpk-redpanda-check.adoc[rpk redpanda check] for details on each validation.
+
+=== Debug bundle
+
+Verify that you can successfully generate and collect a debug bundle from your cluster. This proactive check ensures that if an issue occurs and you need to contact Redpanda support, you won't face permission issues or silent collection failures that could delay troubleshooting.
+
+Generate a debug bundle:
+
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk debug bundle -o /tmp/bundle.zip
+----
+
+For additional options and arguments, see xref:reference:rpk/rpk-debug/rpk-debug-bundle.adoc[rpk debug bundle].
+
+.Output
+[,bash,role=no-copy]
+----
+Creating bundle file...
+Collecting cluster info...
+Collecting logs...
+Collecting configuration...
+Debug bundle saved to '/tmp/bundle.zip'
+----
+
+Debug bundles collect critical diagnostic information, including cluster configuration and metadata, Redpanda logs from all brokers, system resource usage and performance metrics, and Kubernetes resource definitions.
+
+When testing bundle generation, watch for permission errors preventing log collection, insufficient disk space for bundle creation, network policies blocking bundle transfer, or RBAC restrictions on accessing Pod logs or exec. Testing bundle generation early ensures this critical troubleshooting tool works when you need it most. Debug bundles are often required by Redpanda support to diagnose production issues efficiently.
+
+See also: xref:manage:kubernetes/troubleshooting/k-diagnostics-bundle.adoc[Diagnostics Bundles in Kubernetes]
+
+=== Tiered Storage
+
+Configure xref:manage:kubernetes/tiered-storage/k-tiered-storage.adoc[Tiered Storage] for extended data retention using object storage. Tiered Storage automatically offloads older data to cloud storage (S3, GCS, Azure Blob), enabling extended retention without expanding local disk capacity.
+
+Verify that Tiered Storage is enabled:
+
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk cluster config get cloud_storage_enabled -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+true
+----
+
+==== Benefits of Tiered Storage
+
+* Reduced local storage costs from offloading cold data to cheaper object storage
+* Longer data retention periods without provisioning additional disk
+* Required for advanced features like Remote Read Replicas and Iceberg integration
+* Disaster recovery capabilities through cloud-backed data
+
+To verify the bucket and region configuration:
+
+.Input
+[,bash]
+----
+# Check the bucket configuration
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk cluster config get cloud_storage_bucket -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+
+# Check the region/endpoint
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk cluster config get cloud_storage_region -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+See also: xref:manage:kubernetes/tiered-storage/k-tiered-storage.adoc[Tiered Storage]
+
+=== Security scanning
+
+Regularly scan container images and configurations for vulnerabilities to maintain security.
+
+==== Container image scanning
+
+Verify that container images are scanned before deployment:
+
+.Input
+[,bash]
+----
+# Check the current image in use
+kubectl get statefulset redpanda -n <namespace> -o jsonpath='{.spec.template.spec.containers[?(@.name=="redpanda")].image}'
+----
+
+.Output
+[,bash,role=no-copy]
+----
+docker.redpanda.com/redpandadata/redpanda:v24.2.4
+----
+
+==== Security scanning best practices
+
+Security scanning best practices include:
+
+* Scan images using tools like Trivy, Snyk, or cloud-native scanners before deployment.
+* Set up automated scanning in CI/CD pipelines.
+* Monitor for CVE announcements and security advisories.
+* Keep Redpanda and related components up to date with security patches (see xref:upgrade:k-rolling-upgrade.adoc[Rolling Upgrades]).
+* Review Kubernetes RBAC policies and ServiceAccount permissions (see xref:manage:kubernetes/security/authorization/k-role-controller.adoc[Role Controller]).
+
+==== Configuration scanning
+
+.Input
+[,bash]
+----
+# Export Kubernetes manifests for scanning
+kubectl get redpanda,statefulset,deployment -n <namespace> -o yaml > cluster-config.yaml
+# Use kubesec, kube-bench, or similar tools to scan cluster-config.yaml
+----
+
+Establish a regular cadence for security scanning (for example, weekly or with each deployment).
+
+=== Backup and recovery
+
+Implement and test backup and recovery processes to ensure business continuity.
+
+==== Backup strategy with Tiered Storage
+
+Tiered Storage provides built-in backup capabilities by storing data in object storage. Verify Tiered Storage is configured:
+
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk cluster config get cloud_storage_enabled -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+==== Recovery testing
+
+Regularly test recovery procedures to validate RTO/RPO targets:
+
+.Input
+[,bash]
+----
+# Test topic restoration from Tiered Storage
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk topic describe <topic-name> -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+For mission-critical workloads requiring active disaster recovery, consider implementing xref:manage:kubernetes/shadowing/k-shadow-linking.adoc[Shadowing] to asynchronously replicate data to a standby cluster. Shadowing provides offset-preserving replication that maintains consumer positions, enabling faster recovery with lower RTO compared to restoration from backups.
+This Enterprise feature (available in Redpanda v25.3 or later) supports cross-region or cross-cloud disaster recovery with automatic failover capabilities.
+
+Configure and validate Tiered Storage for automatic data backup to object storage. Document and regularly test recovery procedures for different failure scenarios in non-production environments. Establish clear Recovery Time Objective (RTO) and Recovery Point Objective (RPO) targets, and maintain runbooks for disaster recovery scenarios. For Shadowing deployments, use the xref:manage:kubernetes/shadowing/k-failover-runbook.adoc[Shadowing Failover Runbook] as a starting point. Verify that IAM roles and permissions for object storage access are correctly configured and tested.
+
+See also:
+
+* xref:manage:kubernetes/tiered-storage/k-whole-cluster-restore.adoc[Whole Cluster Restore]
+* xref:manage:kubernetes/shadowing/k-shadow-linking.adoc[Configure Shadowing]
+* xref:manage:kubernetes/shadowing/k-failover-runbook.adoc[Shadowing Failover Runbook]
+
+=== Audit logging
+
+Enable and configure audit logging for compliance and security monitoring requirements.
+
+Verify your audit log configuration:
+
+.Input
+[,bash]
+----
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk cluster config get audit_enabled -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism>
+----
+
+.Output
+[,bash,role=no-copy]
+----
+true
+----
+
+Confirm where audit logs are written:
+
+.Input
+[,bash]
+----
+# Check the audit log topic
+kubectl exec <pod-name> -n <namespace> -c redpanda -- rpk topic list -X user=<username> -X pass=<password> -X sasl.mechanism=<mechanism> | grep audit
+----
+
+.Output
+[,bash,role=no-copy]
+----
+_redpanda.audit_log  1  3
+----
+
+The output values of `1` and `3` indicate the number of partitions and replicas, respectively, for the audit log topic.
+
+For production environments with compliance requirements (SOC 2, HIPAA, PCI DSS, GDPR), forward audit logs to your SIEM system and configure retention policies according to your regulatory obligations. Ensure the audit log topic has adequate replication and retention settings.
+
+See also: xref:manage:kubernetes/security/k-audit-logging.adoc[Audit Logging]
+
+=== Monitoring
+
+Check that xref:manage:kubernetes/monitoring/k-monitor-redpanda.adoc[monitoring] is configured with xref:manage:kubernetes/monitoring/k-monitor-redpanda.adoc#configure-prometheus[Prometheus] and xref:manage:kubernetes/monitoring/k-monitor-redpanda.adoc#generate-grafana-dashboard[Grafana] to scrape metrics from all Redpanda brokers.
+
+Verify the ServiceMonitor is configured:
+
+.Input
+[,bash]
+----
+kubectl get servicemonitor -n <namespace>
+----
+
+=== System log retention
+
+Check that Redpanda logs are being captured and stored for an appropriate period of time (minimally, seven days). Configure log forwarding using tools like Fluentd or your cloud provider's logging solution to send logs to a central location for troubleshooting and compliance purposes.
+
+See also: xref:manage:kubernetes/troubleshooting/k-diagnostics-bundle.adoc[Diagnostics Bundles in Kubernetes]
+
+=== Environment configuration
+
+Check that you have a development or test environment configured to evaluate upgrades and configuration changes before applying them to production.
+
+=== Upgrade policy
+
+Check that you have an upgrade policy defined and implemented. Redpanda supports xref:upgrade:k-rolling-upgrade.adoc[rolling upgrades], so upgrades do not require downtime.
+However, make sure that upgrades are scheduled on a regular basis, ideally using automation with xref:manage:kubernetes/k-configure-helm-chart.adoc[Helm] or GitOps workflows.
+
+== Advanced requirements
+
+The Advanced requirements checklist ensures full enterprise readiness: your system operates at the highest level of availability and can prevent or recover from the most serious incidents. It confirms the following:
+
+- You are proactively monitoring mission-critical workloads.
+- You have business continuity solutions in place.
+- You have integrated into enterprise security and operational systems.
+- Your enterprise is ready to run mission-critical workloads.
+
+=== Configure alerts
+
+A standard set of alerts for xref:manage:kubernetes/monitoring/k-monitor-redpanda.adoc#generate-grafana-dashboard[Grafana] or xref:manage:kubernetes/monitoring/k-monitor-redpanda.adoc#configure-prometheus[Prometheus] is provided in the https://github.com/redpanda-data/observability[Redpanda observability repo on GitHub^]. Customize these alerts for your specific needs.
+
+See also: xref:reference:monitor-metrics.adoc[Monitoring Metrics]
+
+=== Deployment automation
+
+Review your deployment automation. Ensure that cluster configuration is managed using xref:manage:kubernetes/k-configure-helm-chart.adoc[Helm] or GitOps workflows, and that all configuration is saved in source control.
+
+=== Monitor security settings
+
+Regularly review your cluster's security settings using the `/v1/security/report` link:/api/doc/admin/[Admin API] endpoint. Investigate and address any issues identified in the alerts section of the report.
+
+include::manage:partial$security-report.adoc[]
+
+== Suggested reading
+
+- xref:deploy:redpanda/kubernetes/k-production-deployment.adoc[Deploy for Production]
+- xref:manage:kubernetes/k-configure-helm-chart.adoc[Customize the Helm Chart]
diff --git a/modules/deploy/pages/redpanda/kubernetes/k-production-workflow.adoc b/modules/deploy/pages/redpanda/kubernetes/k-production-workflow.adoc
index 814f4eaacd..e7258c1a08 100644
--- a/modules/deploy/pages/redpanda/kubernetes/k-production-workflow.adoc
+++ b/modules/deploy/pages/redpanda/kubernetes/k-production-workflow.adoc
@@ -10,3 +10,4 @@ The production deployment tasks involve Kubernetes administrators (admins) as well as Kubernetes users.
 . All: xref:deploy:redpanda/kubernetes/k-requirements.adoc[Review the requirements and recommendations] to align on prerequisites.
 . Admin: xref:deploy:redpanda/kubernetes/k-tune-workers.adoc[Tune the worker nodes] for best performance.
 . User: xref:deploy:redpanda/kubernetes/k-production-deployment.adoc[Deploy Redpanda] using either the Redpanda Operator or the Redpanda Helm chart.
+. All: xref:deploy:redpanda/kubernetes/k-production-readiness.adoc[Validate production readiness] using the comprehensive checklist to ensure your deployment meets production standards.
diff --git a/modules/deploy/pages/redpanda/kubernetes/k-requirements.adoc b/modules/deploy/pages/redpanda/kubernetes/k-requirements.adoc
index e287ed8d02..0f273af4d6 100644
--- a/modules/deploy/pages/redpanda/kubernetes/k-requirements.adoc
+++ b/modules/deploy/pages/redpanda/kubernetes/k-requirements.adoc
@@ -11,7 +11,10 @@ include::deploy:partial$requirements.adoc[]
 
 == Next steps
 
-xref:deploy:redpanda/kubernetes/k-production-deployment.adoc[].
+After meeting these requirements, proceed to:
+
+- xref:deploy:redpanda/kubernetes/k-production-deployment.adoc[Deploy Redpanda for production]
+- xref:deploy:redpanda/kubernetes/k-production-readiness.adoc[Validate production readiness] with the comprehensive checklist
 
 include::shared:partial$suggested-reading.adoc[]
diff --git a/modules/deploy/pages/redpanda/manual/production/production-readiness.adoc b/modules/deploy/pages/redpanda/manual/production/production-readiness.adoc
index 52da709883..552f47182f 100644
--- a/modules/deploy/pages/redpanda/manual/production/production-readiness.adoc
+++ b/modules/deploy/pages/redpanda/manual/production/production-readiness.adoc
@@ -1,11 +1,13 @@
 = Production Readiness Checklist
 :page-aliases: deploy:deployment-option/self-hosted/manual/production/production-readiness.adoc
 
-Before running a production workload on Redpanda, follow this readiness checklist to ensure that you're set up for success. Redpanda Data recommends using the xref:deploy:redpanda/manual/production/production-deployment-automation.adoc[automated deployment instructions] with Ansible. If you cannot deploy with Ansible, use the xref:deploy:redpanda/manual/production/production-deployment.adoc[manual deployment instructions].
+Before running a production workload on Redpanda, follow this readiness checklist. Redpanda Data recommends using the xref:deploy:redpanda/manual/production/production-deployment-automation.adoc[automated deployment instructions] with Ansible. If you cannot deploy with Ansible, use the xref:deploy:redpanda/manual/production/production-deployment.adoc[manual deployment instructions].
 
-== Level 1 production readiness
+NOTE: For Kubernetes deployments, see the xref:deploy:redpanda/kubernetes/k-production-readiness.adoc[Production Readiness Checklist for Kubernetes].
 
-The Level 1 readiness checklist helps you to confirm that:
+== Critical requirements
+
+The Critical requirements checklist helps you to confirm that:
 
 - All required defaults and configuration items are specified.
 - You have the optimal hardware setup.
@@ -550,9 +552,9 @@ NODE-ID  NUM-CORES  MEMBERSHIP-STATUS  IS-ALIVE  BROKER-VERSION
 
 See also: xref:manage:cluster-maintenance/decommission-brokers.adoc[Decommission Brokers]
 
-== Level 2 production readiness
+== Recommended requirements
 
-The Level 2 readiness checklist confirms that you can monitor and support your environment on a sustained basis. It includes the following checks:
+The Recommended requirements checklist confirms that you can monitor and support your environment on a sustained basis. It includes the following checks:
 
 - You have adhered to day-2 operations best practices.
 - You can diagnose and recover from issues or failures.
@@ -624,9 +626,9 @@ See also:
 
 * xref:deploy:redpanda/manual/high-availability.adoc#multi-az-deployments[Multi-AZ deployments]
 * xref:manage:kubernetes/k-rack-awareness.adoc#configure-rack-awareness[Configure rack awareness in Kubernetes]
 
-== Level 3 production readiness
+== Advanced requirements
 
-The Level 3 readiness checklist ensures full enterprise readiness. This indicates that your system is operating at the highest level of availability and can prevent or recover from the most serious incidents. The Level 3 readiness confirms the following:
+The Advanced requirements checklist ensures full enterprise readiness: your system operates at the highest level of availability and can prevent or recover from the most serious incidents. It confirms the following:
 
 - You are proactively monitoring mission-critical workloads, business continuity solutions, and integration into enterprise security systems.
 - Your enterprise is ready to run mission-critical workloads.
diff --git a/modules/deploy/partials/high-availability.adoc b/modules/deploy/partials/high-availability.adoc
index 920a3200ef..5801bf4f76 100644
--- a/modules/deploy/partials/high-availability.adoc
+++ b/modules/deploy/partials/high-availability.adoc
@@ -531,6 +531,10 @@ cat debug.log | grep -v ApiVersions | egrep 'opening|read'
 
 include::shared:partial$suggested-reading.adoc[]
 
+ifdef::env-kubernetes[]
+* xref:deploy:redpanda/kubernetes/k-production-readiness.adoc[Production readiness checklist] - Validate your Kubernetes deployment against production standards
+endif::[]
+
 * https://redpanda.com/blog/redpanda-official-jepsen-report-and-analysis?utm_assettype=report&utm_assetname=roi_report&utm_source=gated_content&utm_medium=content&utm_campaign=jepsen_blog[Redpanda's official Jepsen report^]
 * https://redpanda.com/blog/simplifying-raft-replication-in-redpanda[Simplifying Redpanda Raft implementation^]
 * https://redpanda.com/blog/kafka-redpanda-availability[An availability footprint of the Redpanda and Apache Kafka replication protocols^]