BLA-4251 classify backend uploads with create-if-absent semantics#7
Merged
shreyas-blacksmith merged 4 commits intoJun 29, 2026
Merged
Conversation
Backend AC/CAS uploads now use a conditional PUT (If-None-Match: *) so only net-new objects count toward the storage footprint. Each upload emits exactly one terminal outcome: created (no error), already_exists (412, or 304 from older MinIO), or failed. Bytes are reported as SizeOnDisk (stored/compressed) everywhere, including the queue-full dropped path, so accounting shares units with the MinIO drift scan. Bumps minio-go to v7.0.72; v7.0.69 quoted the "*" wildcard and silently broke create-if-absent. Core.PutObject is single-shot, so every upload (regardless of size) carries the header - no multipart bypass.
go get updated go.mod/go.sum but not the gazelle-managed Bazel deps, so the Bazel build still resolved the broken minio-go v7.0.69. Ran //:gazelle-update-repos (gazelle update-repos -from_file=go.mod -to_macro=deps.bzl%go_dependencies -prune): bumps minio-go to v7.0.72 and prunes sha256-simd/gofuzz, which v7.0.72 no longer pulls in. Keeps the go and bazel builds on the same client. Stays on v7.0.72 (go 1.21) rather than latest v7.0.x: v7.0.9x requires go >= 1.23, which would force WORKSPACE go_register_toolchains (currently 1.22.0) and likely rules_go/gazelle bumps. Co-authored-by: Cursor <cursoragent@cursor.com>
classifyUploadOutcome returned "failed", but FA emits "error"
(buildCacheStatusError) and the web reader counts status IN
('error','rejected','dropped'). The divergent "failed" was silently
ignored by both consumers, so genuine upload failures never showed up in
failedOperationsCount. Emit "error" to match the shared taxonomy.
Co-authored-by: Cursor <cursoragent@cursor.com>
ajwerner
approved these changes
Jun 29, 2026
Comment on lines
+229
to
+233
| // Create-if-absent: a backend upload only counts toward the storage footprint | ||
| // when it stores a net-new object. If the object already exists, MinIO rejects | ||
| // the conditional PUT with a precondition failure and we classify it as | ||
| // already_exists instead of created. SetMatchETagExcept("*") emits an unquoted | ||
| // "If-None-Match: *" (the quoting bug is fixed in minio-go >= v7.0.72). |
Contributor
There was a problem hiding this comment.
nit: I'm not sure we need to check in a comment about the need to update minio-go here
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Part of BLA-4251 (Slice 2, bazel-remote half). Producer changes that let the storage-footprint accumulator count only net-new objects.
Flows (end-to-end)
High-level view of the storage-footprint pipeline across all three repos (repo that owns each flow in brackets). The same overview is on the bazel-remote, fa, and web PRs.
flowchart LR br["bazel-remote: conditional PUT (If-None-Match:*), classify + SizeOnDisk"] --> fa["FA: per-minute flush"] fa --> ch[("ClickHouse + ingested_at")] ch -->|"read (last_ingested_at, now-5m]"| acc["Laravel accumulator (every 15m)"] acc -->|"+= delta, advance cursor"| pg[("Postgres: storage_footprint_bytes (customer footprint)")] pg -->|"read reported footprint (no write)"| scan["MinIO scan (drift only)"] scan -->|"drift metrics + >10% alert"| prom["Grafana"] fa -->|"accounting counters"| promFlow 1 — Job accounting [bazel-remote + fa]
No change to the adoption endpoint. On job start, bazel-remote uploads AC/CAS entries to MinIO with
If-None-Match: *(create-if-absent) and waits for the result, classifying each upload ascreated(net-new),already_exists(412/304),error, ordropped(queue full). Net-new bytes are tallied asitem.SizeOnDisk. The host accumulates net-new bytes every minute and pushes them to ClickHouse. The upload queue is host-global and outlives any VM, so the periodic flush ships late drains after teardown on its own — no teardown-specific drain orqueuedmetric; any genuine loss (process death before drain, where the bytes also never reached MinIO) is caught by Flow 4 drift.Metrics:
bazelre_cache_upload_bytes_created,bazelre_cache_accounting_reports_completed,bazelre_cache_accounting_reports_failed_to_send.Flow 2 — ClickHouse [web]
remote_build_cache_operationsgains aningested_atcolumn, server-assigned (DEFAULT now()) on insert from the host — the watermark the accumulator advances over (created_atis host-assigned and minute-truncated, so unsuitable).Flow 3 — Laravel accounting [web]
bazelre_cache_entriesgainslast_ingested_at: the last ClickHouseingested_atconsumed intostorage_footprint_bytes. Every 15 minutes a sharded job runs over each active row, sums net-newcreatedbytes for the namespace fromlast_ingested_attonow() - 5 minutes, adds the delta tostorage_footprint_bytes, and advances the cursor. This accumulator is the customer-facing footprint shown in the dashboard.Flow 4 — Storage footprint & drift [web]
The existing MinIO scan-and-sum becomes a metrics-only drift calculator: it no longer writes
storage_footprint_bytes, and instead compares scanned (actual) vs accumulator (reported) and emits Grafana metrics only. v0 alerts on drift > 10%.Metrics:
bazelre_cache_actual_storage_bytes,bazelre_cache_reported_storage_bytes,bazelre_cache_storage_drift_bytes,bazelre_cache_storage_drift_percent.Summary
If-None-Match: *) viaPutObjectOptions.SetMatchETagExcept("*").created(no error),already_exists(412, or 304 from older MinIO), orerror(PUT failed).SizeOnDisk(stored/compressed) everywhere, including the queue-fulldroppedpath, so accounting shares units with the MinIO drift scan.minio-gov7.0.69 -> v7.0.72: v7.0.69 quoted the*wildcard and silently broke create-if-absent.Core.PutObjectis single-shot (no multipart), so every upload regardless of size carries the header.Verification gate (before un-drafting)
If-None-Match: *on an existing object; pin a known-good RELEASE in the ansible playbooks.Test plan
go test ./cache/s3proxy/-classifyUploadOutcome(created/already_exists 412+304/failed) andSizeOnDiskreporting on the dropped + observe paths.already_exists, footprint unchanged.Review / validation notes
error, and web countserror,rejected, anddroppedas failure statuses.storage_footprint_bytes, so overlapping runs should not double count the same ClickHouse window.storage_footprint_bytes; it emits histogram-based drift signals instead of region-only gauges to avoid last-writer-wins metrics.go test ./cache/s3proxy ./configgo test ./cache/bazelre ./util/chbatch ./util/metricvendor/bin/phpunit; CI should coverBuildCacheFootprintAccumulatorTestandBuildCacheStatusContractTest.Made with Cursor
Need help on this PR? Tag
/codesmithwith what you need. Autofix is disabled.Need help on this PR? Tag
/codesmithwith what you need. Autofix is enabled. (Staging)