Introduce standard storage alternative #895

Merged: matheuscscp merged 1 commit into main from flux-storage on Jun 15, 2026
matheuscscp (Member) commented Jun 14, 2026

Closes: #333

Introduce feature gate FluxStorage. When enabled, the controller uses fluxcd/pkg/artifact/storage for storing image tags instead of BadgerDB. The gate is disabled by default.

⚠️ Whenever enabling or disabling this feature gate, the storage is completely wiped.

I wrote a prompt for Opus 4.8 xhigh to write a detailed plan, and handed the plan to GPT 5.5 xhigh for implementation.

Prompt:

lets add an alternative storage implementation using fluxcd/pkg/artifact/storage like
source-controller and source-watcher. for opting into this feature users must set the
new feature gate `FluxStorage` to `true`. the default is `false`, indicating that
badgerDB should be the storage implementation.

we need to wipe the storage if the feature gate state does not match the current state
of the disk. to define this relationship, lets define a file `.storage-version` in the
root of the storage directory. the absence of this file indicates that the storage is
in the badgerDB format. the existence of this file with content `2` indicates that the
storage is in the new format. so the feature gate disabled matches the absence of the
file, and the feature gate enabled matches the existence of the file with content `2`.

whenever the feature gate and disk state do not match, we need to wipe the disk.
also, if the controller process is terminated while wiping the disk, the next time
the controller starts it should finish wiping the disk i.e. it cannot start with a
partially wiped disk as it could be corrupted. so before wiping the disk, we need
to create a file `.storage-wipe-in-progress` (no content) in the root of the storage
directory. if this file exists on startup, we need to wipe the disk and remove the
file before proceeding. the `.storage-wipe-in-progress` file should be created before
the "wipe-in-progress" check whenever the feature gate and disk state do not match.
so the startup should go like this (before the storage implementation is initialized):

1. determine if the feature gate matches the disk state.
2. if they do not match, ensure the file `.storage-wipe-in-progress` exists and:
    * create `.storage-version` file with content `2` if the feature gate is enabled, or
    * remove the `.storage-version` file if the feature gate is disabled.
3. if the `.storage-wipe-in-progress` file exists (regardless if it was created now or
   before), wipe the disk and remove the file.
4. proceed with the rest of the startup.

this new implementation brings a new, simpler storage model where tags are stored
by ImageRepository {namespace}/{name} and not by the canonical image reference.
this is more respectful towards tenant isolation in Kubernetes. ImageRepository
holds security credentials to discover tags, and sharing this content between
tenants is an optimization that bypasses multi-tenancy security boundaries.

we do not need URLs for the files in the storage implementation, so we don't need
the HTTP server provided by fluxcd/pkg/artifact. we need to use the storage package
in a minimal way for the needs of this controller. if we identify that the fluxcd/pkg
libraries need improvements to support this use case, we will contribute back to the
fluxcd/pkg repository.

the serialization format for the tags should be an efficient format optimized for the
content. we know we are only going to store a Go `[]string` slice where the strings are
simple OCI tags. we can simply emit one tag per line, which is better than JSON and allows
for streaming, i.e. at the end of every tag we emit a `\n` newline character.

we need a flag e.g. `--storage-compression-threshold` in KiB to control the threshold for
compressing the files (with .tar.gz, which I think we have some support for in fluxcd/pkg).
the implementation should detect if the file is compressed or not and decompress it on read.
this can be done by appending the `.tar.gz` extension to the file name when the file is
created with compression (i.e. it crossed the configured threshold at the time of creation).
if on creation the file does not cross the threshold, the extension should be just `.txt`. if the
file crosses the threshold, the extension should be `.txt.tar.gz`. lets default the flag to
1 KiB. crash the controller if the flag is set to 0 or a negative value, or with the feature
gate disabled.

for garbage collection when the implementation is active, we need a goroutine whose interval
is defined by the existing `--gc-interval` flag. this goroutine must list all the ImageRepository
objects on disk, and all the ImageRepository objects that this instance of the controller should
be responsible for. this means respecting any options defining a subset of namespaces and/or
watch labels for sharding or anything like that. the objects that are not in the list of
ImageRepository objects that this controller should be responsible for should be deleted from
disk. as I understand it, the GC logic provided by fluxcd/pkg/artifact is not suitable for this
use case. we can just not opt-in for it and write our own GC logic as described here.

let's create a plan for this task and write it to plan.md.

matheuscscp (Member, Author) commented Jun 14, 2026

I used Opus 4.8 to perform the following manual tests in a kind cluster:

| Test | Result |
| --- | --- |
| Badger baseline (`FluxStorage=false`) | ✅ 91 tags scanned, policy → 6.13.0; `/data` has Badger files, no `.storage-version` |
| Filesystem (`FluxStorage=true`) | ✅ tags stored at `imagerepository/<ns>/<name>/tags.txt[.gz]`, `.storage-version=2` |
| Migrate Badger→FS→Badger→FS | ✅ each switch wipes the volume (sentinel + old files gone), rewrites version marker, rescans clean |
| `--storage-compression-threshold=1` (1 KiB) | ✅ gzip (`tags.txt.gz`, 909 B); stale `tags.txt` removed |
| `--storage-compression-threshold=2` (2 KiB) | ✅ plain (`tags.txt`, 1875 B); stale `.gz` removed |
| `--storage-compression-threshold=0` + FS | ✅ rejected → CrashLoopBackOff, value must be greater than zero |
| `--gc-interval=0` | ✅ logs "garbage collector is disabled"; orphan survives |
| `--gc-interval=1` | ✅ orphan repo deleted, live repos kept |
| GC prunes empty namespace dirs (fix) | ✅ empty `<ns>/` dirs removed after last repo pruned; namespaces with live repos kept |
| Object deletion (`reconcileDelete`) | ✅ tag dir removed immediately, not via periodic GC |
| Multiple repos, no sharding | ✅ 4 repos (91/121/180/245 tags) all resolve, each isolated on disk |
| Sharding (2 controllers, separate PVCs) | ✅ each controller stores only its shard; all policies resolve; default GC deleted exactly the 2 reassigned repos (`deleted_entries:2`), shard1 GC deleted 0 |

Comment thread docs/spec/v1/imagerepositories.md Outdated
`--storage-path` (`/data` by default). BadgerDB is the default backend, and
`--storage-value-log-file-size` applies only to that backend.

When the `FluxStorage` feature gate is enabled, the controller uses a filesystem

Member

This looks so nice! We need a benchmark to see how it does with 1K images and 10K tags vs BadgerDB in terms of CPU/IO/MEM (with a default of 64 KiB instead of 1 KiB to disable gzip). I would really like to turn this feature gate on by default in Flux 2.10 if the benchmark is favorable.

Member

A more relevant benchmark for large setups would be 5K images with 2M tags in total, 60-80MiB (uncompressed on disk).

Member Author

BadgerDB vs Flux storage — benchmark

Single-node kind, one controller (--concurrent=10, threshold 64 KiB). Unique tags per repo.
Each ImageRepository has 1 ImagePolicy, so write path (scan→SetTags) and read path (Tags) both run.
Metrics from the controller process: CPU process_cpu_seconds_total, RSS process_resident_memory_bytes,
disk du /data, IO /proc/1/io.

Throughput (one full scan pass)

| tags (repos×tags) | backend | CPU s | peak RSS | disk | disk write | disk read | syscall write | syscall read | wall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10K (1K×10) | badger | 23 | 199 MiB | <1 MiB | 0 MiB | 0 | 17 MiB | 48 MiB | 9s |
| 10K (1K×10) | flux | 29 | 111 MiB | 7 MiB | 3 MiB | 0 | 18 MiB | 50 MiB | 8s |
| 2M (5K×400) | badger | 155 | 366 MiB | 37 MiB | 101 MiB | 0 | 89 MiB | 316 MiB | 19s |
| 2M (5K×400) | flux | 172 | 210 MiB | 97 MiB | 78 MiB | 0 | 159 MiB | 412 MiB | 18s |
| 25M (5K×5K) | badger | 241 | 1441 MiB | 418 MiB | 1705 MiB | 0 | 103 MiB | 1134 MiB | 32s |
| 25M (5K×5K) | flux | 477 | 228 MiB | 234 MiB | 214 MiB | 1 | 319 MiB | 1425 MiB | 43s |

2M: 14 KiB files, no gzip. 25M: 170 KiB files, gzip kicks in (>64 KiB).

Idle (zero objects, 60s window)

| state | backend | idle CPU (cores) | GC/60s | RSS | heap in-use | next-GC |
| --- | --- | --- | --- | --- | --- | --- |
| empty DB | badger | 0.026 | 0 | 52 MiB | 97 MiB | 180 MiB |
| empty DB | flux | 0.002 | 0 | 49 MiB | 9 MiB | 11 MiB |

Memory sweep — loaded+orphaned badger (2M tags, objects deleted, default GOGC, no GOMEMLIMIT)

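| mem limit (=request) | idle CPU (cores) | GC/60s | RSS | heap in-use | restarts |
| --- | --- | --- | --- | --- | --- |
| 8Gi | 0.019 | 0 | 52 MiB | 97 MiB | 0 |
| 2Gi | 0.017 | 0 | 53 MiB | 97 MiB | 0 |
| 1Gi | 0.018 | 0 | 53 MiB | 97 MiB | 0 |
| 512Mi | 0.018 | 0 | 53 MiB | 97 MiB | 0 |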

Conclusions

  • Memory: flux wins big. About 40% less peak RSS at 2M, 6× less at 25M. Even empty, badger holds 10× the heap (arena + cache at open); that is the overhead reported in #333 (Higher CPU usage without load).
  • Disk: badger smaller under 64 KiB, flux smaller over it (gzip). Small either way (at 25M: flux 234 MiB, badger 418 MiB).
  • CPU: about equal, except flux doubles when gzip turns on (huge tag sets only). 64 KiB default keeps gzip off for normal repos, which is the right call.
  • Disk writes: badger writes far more (LSM compaction), 8× at 25M.
  • "Bump memory fixes CPU" (#333, Higher CPU usage without load): not at idle. Idle badger does 0 GC, so memory changes nothing (IRC sets no GOMEMLIMIT). Core-pinning needs active churn, not idle.
  • Ship FluxStorage on by default (2.10). Less memory, less disk-write, equal CPU normally, and it frees disk on delete (badger never does: Delete is a no-op).

Signed-off-by: Matheus Pimenta <matheuscscp@gmail.com>
stefanprodan added the enhancement label on Jun 15, 2026

stefanprodan left a comment

Member

LGTM

Thanks @matheuscscp 🏅

matheuscscp merged commit 088d3ba into main on Jun 15, 2026
7 checks passed
matheuscscp deleted the flux-storage branch on June 15, 2026 07:59