ABN-382/feat: Redis sidecar for cache, page_cache, and session by dgjlindsay · Pull Request #11 · brtkwr/magento-helm

dgjlindsay · 2026-05-04T12:49:24Z

Why

Today on magento.staging.achterafbetalen.co/admin we hit:

Fatal error: Uncaught Exception: Warning: SessionHandler::write(): Write failed: No space left on device (28)

/var/www/html/var PVC (9.8G) was 100% full. Of which var/page_cache/ alone was 9.0G of file-backed Magento full-page-cache entries.

Root cause: this chart never sets cache.frontend.default, cache.frontend.page_cache, or session backends in app/etc/env.php. Magento defaults all three to filesystem, and the file backend has no eviction policy — entries accumulate until the PVC fills, sessions start failing, admin returns 500s. Affects every release of the chart (both magento and magento-abn deployments hit this).

What

Redis as a native sidecar in the magento pod — matches the existing DB / OpenSearch sidecar pattern (restartPolicy: Always init container, K8s 1.28+). Localhost-only, no Service, no persistence. Sized small: 256MB maxmemory with allkeys-lru eviction.
init-setup.sh wiring via bin/magento setup:config:set. Idempotent, runs on both upgrade and fresh-install paths, before the setup:upgrade that bootstraps the cache layer.
Three Redis DBs split between cache.default / cache.page_cache / session — flushing one doesn't blow the others away.
Bash /dev/tcp readiness probe rather than redis-cli (not in the magento image).
Gated by redis.enabled (default true). When false, chart renders the prior filesystem behaviour — escape hatch for anyone deploying outside K8s 1.28+ or wanting the old shape.

Verification

helm lint charts/magento — clean
helm template ... --set redis.enabled=true — sidecar + setup:config:set lines present
helm template ... --set redis.enabled=false — no Redis container, no configure_redis_backends calls

Once merged, first pod restart re-initialises env.php with Redis backends; var/page_cache stays empty and the PVC stops filling.

Context

Linear: ABN-382
Workaround applied today: find /var/www/html/var/page_cache -mindepth 1 -delete on the live pod (PVC back to 8% used, admin reachable). This recurs without the permanent fix in this PR.

Out of scope

Production Redis topology (managed Memorystore, sentinel, persistence)
Backfilling existing PVCs

🤖 Generated with Claude Code

Without this the chart leaves env.php cache backends unset, so Magento defaults to filesystem with no eviction policy. var/page_cache then fills the PVC over time and SessionHandler::write() starts failing — admin 500s, recently observed on staging. Wires Redis as a native sidecar (matching the DB / OpenSearch pattern, K8s 1.28+ restartPolicy: Always init container) and configures env.php in init-setup.sh via setup:config:set, on both the upgrade and the fresh-install paths. Localhost-only, no Service, ephemeral. Cache hot-set capped at 256MB with allkeys-lru eviction. Three Redis DBs split between cache.default, cache.page_cache, and session so flushing one doesn't blow the others. Gated by redis.enabled (default true) — chart still renders the prior filesystem-backed behaviour when disabled. ABN-382 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

brtkwr

review notes — not blockers, mostly polish + one operational gotcha worth knowing about

brtkwr · 2026-05-07T11:06:58Z

+      # Wire Redis backends into the freshly-written env.php before any
+      # subsequent mage command (setup:upgrade, di:compile, config:set)
+      # bootstraps the cache layer.
+      configure_redis_backends


inconsistent error handling between paths — the upgrade branch above (line 169) wraps this in || { restore env.php.bak; exit 1 }, but the fresh-install path here is a bare call. if setup:config:set ever errors mid-install (malformed values, redis genuinely unreachable past the wait loop), you get a half-configured env.php with no fallback. either both should fail-fast cleanly or both should restore

brtkwr · 2026-05-07T11:06:58Z

+      # Wire Redis backends into the freshly-written env.php before any
+      # subsequent mage command (setup:upgrade, di:compile, config:set)
+      # bootstraps the cache layer.
+      configure_redis_backends


minor: on the fresh-install path setup:install runs before this with filesystem-backed cache (since env.php was just written without redis backends), then we reconfigure to redis. anything setup:install populates in cache is stranded on the unused filesystem path. probably zero impact in practice but a cache:flush after configure_redis_backends would be cleaner. better still: write env.php with redis backends from the start so install itself uses redis

brtkwr · 2026-05-07T11:06:58Z

+        --session-save-redis-host={{ .Values.redis.host }} \
+        --session-save-redis-port={{ .Values.redis.port }} \
+        --session-save-redis-db={{ .Values.redis.databases.session }} \
+        --session-save-redis-log-level=3


hardcoded — worth lifting to redis.session.logLevel in values for consistency with the rest of the redis config block

brtkwr · 2026-05-07T11:06:58Z

+        # Backs cache.default, cache.page_cache, and session — wired in
+        # init-setup.sh via setup:config:set. No persistence, no Service:
+        # cache is ephemeral and only reachable from this pod's localhost.
+        - name: redis


sidecar ordering — native sidecars start in declaration order, and each subsequent regular initContainer waits for prior sidecars to be ready. redis is declared after opensearch (which is also a sidecar), so redis startup is gated behind opensearch becoming ready. they're independent — moving redis above opensearch (or at least not after it) avoids redis being delayed when opensearch is slow

brtkwr · 2026-05-07T11:06:58Z

+  enabled: true
+  image: redis
+  tag: "7.4-alpine"
+  host: 127.0.0.1


having host as a value implies external redis is supported, but there's no password or tls knob — so anyone pointing this at managed redis (memorystore, elasticache) has no way to authenticate. either lock the chart to localhost-only (drop host, hardcode 127.0.0.1 in the template) or add redis.password / redis.tls.enabled for honesty. happy with localhost-only as the design — just make the constraint explicit

brtkwr · 2026-05-07T11:06:58Z

+  port: 6379
+  # Eviction. allkeys-lru drops the least-recently-used keys when memory is
+  # exhausted. Avoids a wedged pod if the cache hot set ever exceeds maxmemory.
+  maxmemory: "256mb"


256mb is fine for dev/staging but will churn through LRU evictions on a real storefront — FPC entries pile up fast (one per url × customer-group × currency × store-view). worth either bumping the default or noting in the comment that prod workloads should size up. small deployments can always override down

brtkwr · 2026-05-07T11:06:58Z


+# Redis sidecar — backs cache.default, cache.page_cache, and session.
+# Without this, Magento defaults to filesystem backends with no eviction
+# policy, and var/page_cache fills the PVC over time → SessionHandler


operational gotcha worth flagging in the PR description (or as a one-time init step): existing deployments switching redis.enabled from false → true keep all the old var/page_cache/, var/cache/, var/session/ files lying around forever — magento just stops writing there. the PVC pollution that motivated this PR persists until someone manually cleans up. could be a one-shot rm -rf var/page_cache/* var/cache/* var/session/* in init-setup gated on first redis-enabled boot, or just a documented runbook step

brtkwr · 2026-05-07T11:07:27Z

+            - {{ .Values.redis.maxmemoryPolicy | quote }}
+            - --save
+            - ""
+            - --appendonly


worth making persistence configurable? agreed that ephemeral is the right default — adding a PVC by default would just reintroduce the unbounded-growth failure mode on a different volume, and cache.default / cache.page_cache are regenerable so losing them on pod restart is a non-event.

the one case that hurts is session — every pod restart logs everyone out, including admins mid-task. not a regression vs the pre-PR filesystem behaviour (sessions were already on the same PVC that died), but if you ever wanted session continuity across restarts, a redis.persistence.enabled knob (defaulting false) would let users opt in with a small PVC sized just for AOF. they'd still own the sizing risk, but at least the option exists without forking the chart.

fine to defer to a follow-up PR — just flagging that the current shape leaves no escape hatch short of pointing redis.host at external redis (which itself isn't fully wired, see the password/tls comment above)

brtkwr reviewed May 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ABN-382/feat: Redis sidecar for cache, page_cache, and session#11

ABN-382/feat: Redis sidecar for cache, page_cache, and session#11
dgjlindsay wants to merge 1 commit into
brtkwr:mainfrom
dgjlindsay:doug/ABN-382-redis-cache-backend

dgjlindsay commented May 4, 2026

Uh oh!

brtkwr left a comment •

edited

Loading

Uh oh!

brtkwr May 7, 2026

Uh oh!

brtkwr May 7, 2026

Uh oh!

brtkwr May 7, 2026

Uh oh!

brtkwr May 7, 2026

Uh oh!

brtkwr May 7, 2026

Uh oh!

brtkwr May 7, 2026

Uh oh!

brtkwr May 7, 2026

Uh oh!

brtkwr May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dgjlindsay commented May 4, 2026

Why

What

Verification

Context

Out of scope

Uh oh!

brtkwr left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

brtkwr left a comment •

edited

Loading