ABN-382/feat: Redis sidecar for cache, page_cache, and session#11
ABN-382/feat: Redis sidecar for cache, page_cache, and session#11dgjlindsay wants to merge 1 commit into
Conversation
Without this the chart leaves env.php cache backends unset, so Magento defaults to filesystem with no eviction policy. var/page_cache then fills the PVC over time and SessionHandler::write() starts failing — admin 500s, recently observed on staging. Wires Redis as a native sidecar (matching the DB / OpenSearch pattern, K8s 1.28+ restartPolicy: Always init container) and configures env.php in init-setup.sh via setup:config:set, on both the upgrade and the fresh-install paths. Localhost-only, no Service, ephemeral. Cache hot-set capped at 256MB with allkeys-lru eviction. Three Redis DBs split between cache.default, cache.page_cache, and session so flushing one doesn't blow the others. Gated by redis.enabled (default true) — chart still renders the prior filesystem-backed behaviour when disabled. ABN-382 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| # Wire Redis backends into the freshly-written env.php before any | ||
| # subsequent mage command (setup:upgrade, di:compile, config:set) | ||
| # bootstraps the cache layer. | ||
| configure_redis_backends |
There was a problem hiding this comment.
inconsistent error handling between paths — the upgrade branch above (line 169) wraps this in || { restore env.php.bak; exit 1 }, but the fresh-install path here is a bare call. if setup:config:set ever errors mid-install (malformed values, redis genuinely unreachable past the wait loop), you get a half-configured env.php with no fallback. either both should fail-fast cleanly or both should restore
| # Wire Redis backends into the freshly-written env.php before any | ||
| # subsequent mage command (setup:upgrade, di:compile, config:set) | ||
| # bootstraps the cache layer. | ||
| configure_redis_backends |
There was a problem hiding this comment.
minor: on the fresh-install path setup:install runs before this with filesystem-backed cache (since env.php was just written without redis backends), then we reconfigure to redis. anything setup:install populates in cache is stranded on the unused filesystem path. probably zero impact in practice but a cache:flush after configure_redis_backends would be cleaner. better still: write env.php with redis backends from the start so install itself uses redis
| --session-save-redis-host={{ .Values.redis.host }} \ | ||
| --session-save-redis-port={{ .Values.redis.port }} \ | ||
| --session-save-redis-db={{ .Values.redis.databases.session }} \ | ||
| --session-save-redis-log-level=3 |
There was a problem hiding this comment.
hardcoded — worth lifting to redis.session.logLevel in values for consistency with the rest of the redis config block
| # Backs cache.default, cache.page_cache, and session — wired in | ||
| # init-setup.sh via setup:config:set. No persistence, no Service: | ||
| # cache is ephemeral and only reachable from this pod's localhost. | ||
| - name: redis |
There was a problem hiding this comment.
sidecar ordering — native sidecars start in declaration order, and each subsequent regular initContainer waits for prior sidecars to be ready. redis is declared after opensearch (which is also a sidecar), so redis startup is gated behind opensearch becoming ready. they're independent — moving redis above opensearch (or at least not after it) avoids redis being delayed when opensearch is slow
| enabled: true | ||
| image: redis | ||
| tag: "7.4-alpine" | ||
| host: 127.0.0.1 |
There was a problem hiding this comment.
having host as a value implies external redis is supported, but there's no password or tls knob — so anyone pointing this at managed redis (memorystore, elasticache) has no way to authenticate. either lock the chart to localhost-only (drop host, hardcode 127.0.0.1 in the template) or add redis.password / redis.tls.enabled for honesty. happy with localhost-only as the design — just make the constraint explicit
| port: 6379 | ||
| # Eviction. allkeys-lru drops the least-recently-used keys when memory is | ||
| # exhausted. Avoids a wedged pod if the cache hot set ever exceeds maxmemory. | ||
| maxmemory: "256mb" |
There was a problem hiding this comment.
256mb is fine for dev/staging but will churn through LRU evictions on a real storefront — FPC entries pile up fast (one per url × customer-group × currency × store-view). worth either bumping the default or noting in the comment that prod workloads should size up. small deployments can always override down
|
|
||
| # Redis sidecar — backs cache.default, cache.page_cache, and session. | ||
| # Without this, Magento defaults to filesystem backends with no eviction | ||
| # policy, and var/page_cache fills the PVC over time → SessionHandler |
There was a problem hiding this comment.
operational gotcha worth flagging in the PR description (or as a one-time init step): existing deployments switching redis.enabled from false → true keep all the old var/page_cache/, var/cache/, var/session/ files lying around forever — magento just stops writing there. the PVC pollution that motivated this PR persists until someone manually cleans up. could be a one-shot rm -rf var/page_cache/* var/cache/* var/session/* in init-setup gated on first redis-enabled boot, or just a documented runbook step
| - {{ .Values.redis.maxmemoryPolicy | quote }} | ||
| - --save | ||
| - "" | ||
| - --appendonly |
There was a problem hiding this comment.
worth making persistence configurable? agreed that ephemeral is the right default — adding a PVC by default would just reintroduce the unbounded-growth failure mode on a different volume, and cache.default / cache.page_cache are regenerable so losing them on pod restart is a non-event.
the one case that hurts is session — every pod restart logs everyone out, including admins mid-task. not a regression vs the pre-PR filesystem behaviour (sessions were already on the same PVC that died), but if you ever wanted session continuity across restarts, a redis.persistence.enabled knob (defaulting false) would let users opt in with a small PVC sized just for AOF. they'd still own the sizing risk, but at least the option exists without forking the chart.
fine to defer to a follow-up PR — just flagging that the current shape leaves no escape hatch short of pointing redis.host at external redis (which itself isn't fully wired, see the password/tls comment above)
Why
Today on
magento.staging.achterafbetalen.co/adminwe hit:/var/www/html/varPVC (9.8G) was 100% full. Of whichvar/page_cache/alone was 9.0G of file-backed Magento full-page-cache entries.Root cause: this chart never sets
cache.frontend.default,cache.frontend.page_cache, orsessionbackends inapp/etc/env.php. Magento defaults all three to filesystem, and the file backend has no eviction policy — entries accumulate until the PVC fills, sessions start failing, admin returns 500s. Affects every release of the chart (bothmagentoandmagento-abndeployments hit this).What
restartPolicy: Alwaysinit container, K8s 1.28+). Localhost-only, no Service, no persistence. Sized small: 256MB maxmemory withallkeys-lrueviction.init-setup.shwiring viabin/magento setup:config:set. Idempotent, runs on both upgrade and fresh-install paths, before thesetup:upgradethat bootstraps the cache layer.cache.default/cache.page_cache/session— flushing one doesn't blow the others away./dev/tcpreadiness probe rather thanredis-cli(not in the magento image).redis.enabled(defaulttrue). Whenfalse, chart renders the prior filesystem behaviour — escape hatch for anyone deploying outside K8s 1.28+ or wanting the old shape.Verification
helm lint charts/magento— cleanhelm template ... --set redis.enabled=true— sidecar +setup:config:setlines presenthelm template ... --set redis.enabled=false— no Redis container, noconfigure_redis_backendscallsOnce merged, first pod restart re-initialises
env.phpwith Redis backends;var/page_cachestays empty and the PVC stops filling.Context
find /var/www/html/var/page_cache -mindepth 1 -deleteon the live pod (PVC back to 8% used, admin reachable). This recurs without the permanent fix in this PR.Out of scope
🤖 Generated with Claude Code