Problem
AdGuard Home on genmachine runs 3 replicas sharing a single NFS RWX PVC (pvc-adguard-data, nfs-csi-retain). Two of the three pods are permanently in CrashLoopBackOff:
[fatal] initializing auth module: creating session storage: timeout
session_storage: opening db filename=/opt/adguardhome/work/data/sessions.db err=timeout
Root cause
All 3 pods mount the same PVC at identical subPaths:
| Mount | SubPath | Purpose |
|---|---|---|
| /opt/adguardhome/work | work | Runtime state: sessions.db, query log, stats |
| /opt/adguardhome/conf | conf | Config file (AdGuardHome.yaml) |
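sessions.db is a bbolt (BoltDB) database. When bbolt opens it, it takes an exclusive flock() on the file and gives up with a "timeout" error if the lock cannot be acquired within its configured wait. On Linux NFS clients, flock() is emulated with NFS byte-range locks, so the lock is visible across nodes: the first pod to open sessions.db holds it for as long as it runs, and the other two time out on every restart.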
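The contention can be reproduced on a single machine. The Python sketch below is a simulation, not bbolt's code: open_locked is a hypothetical helper that mimics a bbolt-style open (non-blocking exclusive flock() retried until a deadline), and each entry passed to simulate stands in for one pod starting up. It needs a Unix system (fcntl). One shared file reproduces the Running/timeout/timeout pattern; one file per pod, as with a per-pod emptyDir, removes the contention.

```python
import errno
import fcntl
import os
import tempfile
import time


class LockTimeout(Exception):
    """Raised when the exclusive lock is not acquired before the deadline."""


def open_locked(path: str, timeout: float, poll: float = 0.02) -> int:
    """Open `path` and take an exclusive flock(), retrying until `timeout` expires.

    Hypothetical helper modelled on a bbolt-style open: non-blocking LOCK_EX
    attempts in a loop, giving up with "timeout" once the deadline passes.
    The lock is held until the returned descriptor is closed.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    deadline = time.monotonic() + timeout
    while True:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except OSError as exc:
            if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                os.close(fd)
                raise
            if time.monotonic() >= deadline:
                os.close(fd)
                raise LockTimeout("timeout") from None
            time.sleep(poll)


def simulate(db_paths: list[str], timeout: float = 0.2) -> list[str]:
    """Start one 'pod' per path, in order; running pods keep holding their lock."""
    held, statuses = [], []
    for path in db_paths:
        try:
            held.append(open_locked(path, timeout))
            statuses.append("Running")
        except LockTimeout:
            statuses.append("timeout")
    for fd in held:
        os.close(fd)
    return statuses


with tempfile.TemporaryDirectory() as d:
    shared = os.path.join(d, "sessions.db")
    print(simulate([shared] * 3))  # ['Running', 'timeout', 'timeout']
    per_pod = [os.path.join(d, f"sessions-{i}.db") for i in range(3)]
    print(simulate(per_pod))  # ['Running', 'Running', 'Running']
```

Separate open() calls get independent flock() locks even inside one process, which is why a single script is enough to show the conflict. On the cluster the lock crosses nodes through NFS locking instead of a local kernel, but the outcome is the same.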
Consequence for HA: externalTrafficPolicy: Local is correctly set (source IP preservation), but with that policy MetalLB only announces the DNS VIP from a node that runs a ready AdGuard pod. With two pods crashing, talos-3 is the only eligible node, so if talos-3 fails the VIP has nowhere to migrate. The cluster effectively has zero DNS failover despite 3 pods being declared.
kubectl get pods -n adguard -o wide
adguard-adguard-home-xxx 0/1 CrashLoopBackOff 4475 talos-1
adguard-adguard-home-xxx 0/1 Error 4480 talos-2
adguard-adguard-home-xxx 1/1 Running 0 talos-3 ← only survivor
Options Considered
Option A — Stateless replicas (emptyDir) ✅ Recommended
Set persistence.enabled: false in the genmachine values. The rm3l chart then backs the pod's volume with a per-pod emptyDir instead of the PVC. The bootstrapEnabled: true mechanism already writes the full config from the Helm values Secret into each pod at init time, so the config is fully GitOps-driven.
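- Each pod owns its own isolated emptyDir → no locking
- 3 replicas all healthy simultaneously
- externalTrafficPolicy: Local works as designed: one pod per node (provided the replicas stay spread across talos-1/2/3 as they are today), MetalLB VIP failover works
- Trade-off: query log history and statistics are lost whenever a pod is recreated (an emptyDir survives container restarts but not pod deletion or rescheduling). Acceptable for homelab DNS: the important state is the config, which is already in Git.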
- No PVC at all → fully stateless, no storage dependency
- volsync backup becomes obsolete
Option B — StatefulSet with per-pod PVC + adguardhome-sync
Each pod gets its own PVC (volumeClaimTemplates). bakito/adguardhome-sync syncs config from pod-0 (primary) to replicas via the AdGuard Home admin API.
- Preserves per-pod query log history
- adguardhome-sync runs as a sidecar or separate Deployment
- Complexity: need to manage primary/replica concept, initial setup wizard on each replica, sync scheduling
- adguardhome-sync has no Helm chart and no Kubernetes-native support
- The rm3l chart has a statefulset.yaml template (deploymentType: StatefulSet), but it is not paired with adguardhome-sync out of the box
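Part of that complexity is that something has to reconcile every replica with the primary on a schedule. The Python sketch below is hypothetical and deliberately tiny: AdGuardState, plan_sync, the two fields and the example values are illustrative names, not adguardhome-sync's data model or the AdGuard Home API. It only shows the kind of diff a sync pass has to compute before pushing changes to a replica.

```python
from dataclasses import dataclass, field


@dataclass
class AdGuardState:
    """Hypothetical subset of one instance's settings that a sync tool would mirror."""
    upstreams: list[str] = field(default_factory=list)
    rewrites: set[tuple[str, str]] = field(default_factory=set)  # (domain, answer)


def plan_sync(primary: AdGuardState, replica: AdGuardState) -> dict:
    """Return the changes that would make `replica` match `primary`."""
    return {
        # Upstreams are replaced as a whole list when they differ.
        "set_upstreams": primary.upstreams if primary.upstreams != replica.upstreams else None,
        # Rewrites are reconciled entry by entry.
        "add_rewrites": sorted(primary.rewrites - replica.rewrites),
        "delete_rewrites": sorted(replica.rewrites - primary.rewrites),
    }


# pod-0 (primary) vs a replica that missed one change; values are made up
primary = AdGuardState(
    upstreams=["https://dns.quad9.net/dns-query"],
    rewrites={("nas.lan", "192.168.1.10"), ("grafana.lan", "192.168.1.20")},
)
replica = AdGuardState(
    upstreams=["https://dns.quad9.net/dns-query"],
    rewrites={("nas.lan", "192.168.1.10"), ("old.lan", "192.168.1.99")},
)
print(plan_sync(primary, replica))
```

Every replica needs its own pass like this, plus handling for a primary that is down, which is the bookkeeping Option A avoids by rendering the same config into every pod.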
Option C — DaemonSet with hostNetwork
One pod per node using hostNetwork: true, listening on the node's physical IP. Eliminates all shared storage entirely.
- Source IP preservation is native (no DNAT)
- No LoadBalancer service needed for DNS — clients point to node IPs directly
- Problem: ties AdGuard to specific node IPs; incompatible with the current MetalLB VIP (192.168.1.200) used by both clusters
Option D — Single replica (give up on HA)
replicaCount: 1 with a PodDisruptionBudget. MetalLB VIP failover still works (the pod moves to a different node on failure).
- Simple but no simultaneous redundancy
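- DNS outage while the pod is rescheduled: a few seconds for a planned move, but after an unplanned node failure Kubernetes by default waits out the pod's 300 s not-ready/unreachable toleration before evicting it, so the outage can last several minutes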
Proposed Fix (Option A)
In gitops/manifests/adguard/genmachine/genmachine-values.yaml:
adguard-home:
  replicaCount: 3
  # Disable PVC: each pod gets its own emptyDir for work + conf
  persistence:
    enabled: false
  # bootstrapEnabled copies AdGuardHome.yaml from the Secret at init time
  # if no config file exists yet; safe for emptyDir, which starts empty
  # whenever the pod is created
  bootstrapEnabled: true
  # externalTrafficPolicy: Local preserved for source IP
  services:
    dns:
      externalTrafficPolicy: Local
  ...
Remove / clean up:
- gitops/manifests/adguard/genmachine/templates/pvc.yaml
- gitops/manifests/adguard/genmachine/templates/volsync-backup.yaml (no PVC to back up)
The bootstrapConfig block already contains the complete configuration (DNS upstreams, rewrites, filters, users, etc.) — this becomes the single source of truth in Git.
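The init-time behaviour described in the values comments amounts to copy-if-missing. The Python sketch below assumes exactly that semantics; bootstrap_config and the directory names are hypothetical, not the rm3l chart's actual init script. It shows what the rule means on an emptyDir: every newly created pod starts from the Git version, while a container restart inside the same pod keeps whatever is already on disk, including settings changed through the web UI.

```python
import os
import shutil
import tempfile


def bootstrap_config(secret_file: str, conf_dir: str) -> bool:
    """Hypothetical init step: copy the Git-managed config only if none exists.

    Returns True when the file was copied, False when an existing config was kept.
    """
    target = os.path.join(conf_dir, "AdGuardHome.yaml")
    if os.path.exists(target):
        return False  # keep whatever is already on this pod's volume
    os.makedirs(conf_dir, exist_ok=True)
    shutil.copyfile(secret_file, target)
    return True


with tempfile.TemporaryDirectory() as root:
    secret = os.path.join(root, "secret-AdGuardHome.yaml")
    with open(secret, "w") as f:
        f.write("# rendered from bootstrapConfig in Git\n")

    pod_a = os.path.join(root, "emptydir-pod-a", "conf")
    print(bootstrap_config(secret, pod_a))  # True: new pod, empty volume
    print(bootstrap_config(secret, pod_a))  # False: container restart, same emptyDir
    pod_b = os.path.join(root, "emptydir-pod-b", "conf")
    print(bootstrap_config(secret, pod_b))  # True: recreated pod, fresh emptyDir
```

In other words, Git wins every time a pod is recreated, which is what makes the bootstrapConfig block the single source of truth.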
Networking Context
LAN client (DNS query)
│
▼ UDP/TCP 53
MetalLB L2 VIP 192.168.1.200
│ externalTrafficPolicy: Local
▼
Node running pod (talos-1 / talos-2 / talos-3)
│ source IP preserved
▼
AdGuard Home pod (emptyDir, independent)
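With 3 healthy pods (one per node) and externalTrafficPolicy: Local, every node is eligible to announce the VIP. In L2 mode MetalLB has a single node answer ARP for 192.168.1.200 at a time, so queries go to that node's pod while the other two stand by. If the announcing node fails, MetalLB moves the announcement to another eligible node within seconds. Source IP is preserved for query logging.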
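The eligibility rule that decides where the VIP can go can be written down in a few lines. The Python sketch below is a simplified model, not MetalLB code: eligible_announcers is a hypothetical function, and it ignores leader election, failure-detection timing and pod rescheduling. It replays the current state from the kubectl output against the Option A state when talos-3 goes down.

```python
from collections.abc import Collection, Iterable


def eligible_announcers(
    pods: Iterable[tuple[str, bool]], failed_nodes: Collection[str] = ()
) -> list[str]:
    """Simplified L2 + externalTrafficPolicy: Local rule.

    A node may announce the VIP only if it is up and hosts at least one ready pod.
    MetalLB then elects one of these nodes; that election is not modelled here.
    """
    return sorted({node for node, ready in pods if ready and node not in failed_nodes})


# (node, ready) for each AdGuard pod
current = [("talos-1", False), ("talos-2", False), ("talos-3", True)]
option_a = [("talos-1", True), ("talos-2", True), ("talos-3", True)]

print(eligible_announcers(current))                # ['talos-3']
print(eligible_announcers(current, {"talos-3"}))   # []: the VIP has nowhere to go
print(eligible_announcers(option_a, {"talos-3"}))  # ['talos-1', 'talos-2']
```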
References
- rm3l chart, deploymentType and persistence: https://helm-charts.rm3l.org
- MetalLB, externalTrafficPolicy: Local: https://metallb.io/usage/#layer-2