
perf(storage): rewrite multipart upload — zero-copy, HTTP/2, no MD5#2381

Closed
ValentaTomas wants to merge 1 commit into main from
perf/multipart-upload-rewrite

Conversation

@ValentaTomas ValentaTomas (Member) commented Apr 13, 2026

Summary

  • Rewrite GCS multipart upload as zero-copy slice-based package (gcpmultipart)
  • Remove MD5 per-part, remove buffer allocations — Upload(data []byte) slices directly into caller's mmap'd data
  • HTTP/2 transport with connection reuse (single TLS handshake for all parts)
  • Add StoreData(ctx, []byte) to skip os.ReadFile for mmap'd cache files
  • Memfile/rootfs uploads now go straight from block.Cache mmap → HTTP body

Supersedes #2378

cursor bot commented Apr 13, 2026

PR Summary

Medium Risk
Changes the template diff upload path and GCS large-object upload implementation, which could affect correctness/performance of large memfile/rootfs uploads and retry behavior. Risk is mitigated by keeping one-shot uploads for small objects and adding updated multipart unit tests, but this still touches critical storage IO paths.

Overview
This PR switches memfile/rootfs template uploads from file-path based writes to directly uploading in-memory/mmap-backed diff data, by adding Data() to diff/chunker/cache abstractions and introducing StoreData(ctx, []byte) on storage.Seekable implementations (AWS, filesystem, cached seekable, peer fallback, and GCS). It also replaces the previous GCS multipart uploader with a new gcpmultipart uploader that slices the provided byte buffer into parts and uploads them concurrently over a shared retryable HTTP/2 client, removing the old file-based/MD5-heavy implementation and updating tests accordingly.

Reviewed by Cursor Bugbot for commit b3e8ad5. Bugbot is set up for automated code reviews on this repo.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 16ead0cde5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +177 to 179
data := rootfsDiff.Data()
if data == nil {
return nil

P1 Badge Avoid uploading directly from evictable mmap slices

TemplateBuild.Upload now pulls diff bytes via Diff.Data() and passes that slice to StoreData, but for local/storage diffs Data() returns the backing mmap (block.Cache.Data). Those mmaps can be closed by DiffStore eviction (scheduleDelete -> cache.Delete -> Diff.Close) while the async upload is still running, and unlike the previous StoreFile(path) flow there is no open file descriptor to keep the data valid after deletion. In low-disk-space scenarios with long uploads (especially ones that outlast the 60 s eviction delay), this can turn into invalid memory access or corrupted/failed uploads.


Comment on lines +395 to 399
data, err := os.ReadFile(path)
if err != nil {
return fmt.Errorf("failed to get file size: %w", err)
return fmt.Errorf("failed to read file: %w", err)
}


P2 Badge Keep StoreFile streaming for large objects

StoreFile now unconditionally calls os.ReadFile before deciding upload strategy, which regresses memory behavior for large files: the whole object is loaded into RAM at once instead of being read chunk-by-chunk as before. Any remaining StoreFile callers (e.g. peer fallback path) can now hit large transient allocations or OOM under big snapshots, even though the previous implementation bounded memory by chunk/concurrency.


@ValentaTomas ValentaTomas marked this pull request as draft April 13, 2026 23:10

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Cache.Data() returns mmap reference without holding lock
    • Updated Data() to return a release function that holds the RLock until the caller is done with the mmap data, preventing SIGSEGV from concurrent Close() calls during multipart uploads.
  • ✅ Fixed: StorageDiff.Data() silently swallows chunker errors
    • Changed Data() to return error as third return value, ensuring chunker failures are properly propagated instead of silently skipping uploads.

Create PR

Or push these changes by commenting:

@cursor push 450d6b9081
Preview (450d6b9081)
diff --git a/packages/orchestrator/pkg/sandbox/block/cache.go b/packages/orchestrator/pkg/sandbox/block/cache.go
--- a/packages/orchestrator/pkg/sandbox/block/cache.go
+++ b/packages/orchestrator/pkg/sandbox/block/cache.go
@@ -410,15 +410,20 @@
 	return c.filePath
 }
 
-func (c *Cache) Data() []byte {
+func (c *Cache) Data() ([]byte, func()) {
 	c.mu.RLock()
-	defer c.mu.RUnlock()
 
 	if c.mmap == nil {
-		return nil
+		c.mu.RUnlock()
+
+		return nil, func() {}
 	}
 
-	return []byte(*c.mmap)
+	releaseCacheCloseLock := func() {
+		c.mu.RUnlock()
+	}
+
+	return []byte(*c.mmap), releaseCacheCloseLock
 }
 
 func NewCacheFromProcessMemory(

diff --git a/packages/orchestrator/pkg/sandbox/block/chunk.go b/packages/orchestrator/pkg/sandbox/block/chunk.go
--- a/packages/orchestrator/pkg/sandbox/block/chunk.go
+++ b/packages/orchestrator/pkg/sandbox/block/chunk.go
@@ -85,7 +85,7 @@
 	ReadAt(ctx context.Context, b []byte, off int64) (int, error)
 	WriteTo(ctx context.Context, w io.Writer) (int64, error)
 	Close() error
-	Data() []byte
+	Data() ([]byte, func())
 	FileSize() (int64, error)
 }
 
@@ -297,7 +297,7 @@
 	return c.cache.Close()
 }
 
-func (c *FullFetchChunker) Data() []byte {
+func (c *FullFetchChunker) Data() ([]byte, func()) {
 	return c.cache.Data()
 }
 

diff --git a/packages/orchestrator/pkg/sandbox/block/streaming_chunk.go b/packages/orchestrator/pkg/sandbox/block/streaming_chunk.go
--- a/packages/orchestrator/pkg/sandbox/block/streaming_chunk.go
+++ b/packages/orchestrator/pkg/sandbox/block/streaming_chunk.go
@@ -460,7 +460,7 @@
 	return c.cache.Close()
 }
 
-func (c *StreamingChunker) Data() []byte {
+func (c *StreamingChunker) Data() ([]byte, func()) {
 	return c.cache.Data()
 }
 

diff --git a/packages/orchestrator/pkg/sandbox/build/diff.go b/packages/orchestrator/pkg/sandbox/build/diff.go
--- a/packages/orchestrator/pkg/sandbox/build/diff.go
+++ b/packages/orchestrator/pkg/sandbox/build/diff.go
@@ -30,7 +30,7 @@
 	block.Slicer
 	CacheKey() DiffStoreKey
 	CachePath() (string, error)
-	Data() []byte
+	Data() ([]byte, func(), error)
 	FileSize() (int64, error)
 	Init(ctx context.Context) error
 }
@@ -43,8 +43,8 @@
 	return "", NoDiffError{}
 }
 
-func (n *NoDiff) Data() []byte {
-	return nil
+func (n *NoDiff) Data() ([]byte, func(), error) {
+	return nil, func() {}, nil
 }
 
 func (n *NoDiff) Slice(_ context.Context, _, _ int64) ([]byte, error) {

diff --git a/packages/orchestrator/pkg/sandbox/build/local_diff.go b/packages/orchestrator/pkg/sandbox/build/local_diff.go
--- a/packages/orchestrator/pkg/sandbox/build/local_diff.go
+++ b/packages/orchestrator/pkg/sandbox/build/local_diff.go
@@ -110,8 +110,10 @@
 	return b.cache.Path(), nil
 }
 
-func (b *localDiff) Data() []byte {
-	return b.cache.Data()
+func (b *localDiff) Data() ([]byte, func(), error) {
+	data, release := b.cache.Data()
+
+	return data, release, nil
 }
 
 func (b *localDiff) Close() error {

diff --git a/packages/orchestrator/pkg/sandbox/build/storage_diff.go b/packages/orchestrator/pkg/sandbox/build/storage_diff.go
--- a/packages/orchestrator/pkg/sandbox/build/storage_diff.go
+++ b/packages/orchestrator/pkg/sandbox/build/storage_diff.go
@@ -150,13 +150,15 @@
 	return b.cachePath, nil
 }
 
-func (b *StorageDiff) Data() []byte {
+func (b *StorageDiff) Data() ([]byte, func(), error) {
 	c, err := b.chunker.Wait()
 	if err != nil {
-		return nil
+		return nil, func() {}, err
 	}
 
-	return c.Data()
+	data, release := c.Data()
+
+	return data, release, nil
 }
 
 func (b *StorageDiff) FileSize() (int64, error) {

diff --git a/packages/orchestrator/pkg/sandbox/template_build.go b/packages/orchestrator/pkg/sandbox/template_build.go
--- a/packages/orchestrator/pkg/sandbox/template_build.go
+++ b/packages/orchestrator/pkg/sandbox/template_build.go
@@ -174,7 +174,12 @@
 	})
 
 	eg.Go(func() error {
-		data := rootfsDiff.Data()
+		data, release, err := rootfsDiff.Data()
+		if err != nil {
+			return fmt.Errorf("failed to get rootfs data: %w", err)
+		}
+		defer release()
+
 		if data == nil {
 			return nil
 		}
@@ -183,7 +188,12 @@
 	})
 
 	eg.Go(func() error {
-		data := memfileDiff.Data()
+		data, release, err := memfileDiff.Data()
+		if err != nil {
+			return fmt.Errorf("failed to get memfile data: %w", err)
+		}
+		defer release()
+
 		if data == nil {
 			return nil
 		}


Comment thread packages/orchestrator/pkg/sandbox/block/cache.go
}

return c.Data()
}

StorageDiff.Data() silently swallows chunker errors

Medium Severity

StorageDiff.Data() returns nil when b.chunker.Wait() fails, which causes the caller in template_build.go to silently skip the upload (if data == nil { return nil }). Every other method on StorageDiff (e.g., Close, ReadAt, Slice, FileSize) properly propagates the error from chunker.Wait(). Silently skipping an upload on error could lead to an incomplete template build stored without any error being reported.


Reviewed by Cursor Bugbot for commit 33cbbd4.



return nil
}

return []byte(*c.mmap)

Use-after-free risk: Data() acquires a read lock, checks c.mmap != nil, then releases the lock before returning the slice. The returned []byte is backed by mmap memory whose lifetime is controlled by Cache.Close() -> munmap. Any goroutine holding this slice (e.g. an upload goroutine calling storeData for a large memfile) can be left with a dangling pointer if Cache.Close() is called concurrently. The lock only protects the nil check, not the validity of the backing memory during use. In template_build.go this slice is passed into uploader.Upload which fans it out to concurrent HTTP goroutines with retries -- if the cache is closed while a retry is in flight the result is a SIGBUS or silent stale data.

objectName := o.path

fileInfo, err := os.Stat(path)
data, err := os.ReadFile(path)

StoreFile now unconditionally calls os.ReadFile(path) for all file sizes before passing to storeData. Previously, files larger than 50 MB were streamed from disk in parallel without a full in-memory copy. Any caller that still reaches StoreFile with a large file will now allocate the entire file on the heap instead of streaming. The large-file streaming path has been silently dropped from this method.

return nil
}

return c.Data()

StorageDiff.Data() can silently return incomplete data. When backed by a StreamingChunker, the mmap cache only contains bytes for ranges fetched via ReadAt; all other regions are zero-filled. The return value is a non-nil full-length slice, but callers in template_build.go only check if data == nil to skip the upload. If a StorageDiff-backed diff ever appears in the upload path (e.g. a snapshot whose base was only partially streamed), the upload proceeds with zero-filled holes and silently corrupts the stored object. Consider returning nil from Data() when completeness cannot be guaranteed.

Comment on lines 305 to +314
}

func (c *cachedSeekable) StoreData(ctx context.Context, data []byte) (e error) {
ctx, span := c.tracer.Start(ctx, "write object from data")
defer func() {
recordError(span, e)
span.End()
}()

return c.inner.StoreData(ctx, data)

🔴 cachedSeekable.StoreData() omits the NFS write-through cache population that StoreFile() performs when EnableWriteThroughCacheFlag is set, causing the first post-eviction reads for newly-uploaded templates to fall back to GCS instead of being served from the local NFS cache. The fix is to add a createCacheBlocksFromData helper (mirroring createCacheBlocksFromFile) and call it—along with writeLocalSize—inside StoreData() when the flag is enabled.

Extended reasoning...

The asymmetry

StoreFile() in cachedSeekable (storage_cache_seekable.go:281) conditionally calls createCacheBlocksFromFile() and writeLocalSize() when featureflags.EnableWriteThroughCacheFlag is set, eagerly populating the NFS block cache so subsequent reads are served locally. The new StoreData() method added in this PR (lines 305–314) only delegates to c.inner.StoreData() and skips both steps entirely.

Reachability

cachedSeekable is returned by cache.OpenSeekable() (storage_cache.go:108–126), which wraps the inner provider's Seekable in a cachedSeekable. When persistence is wrapped with WrapInNFSCache—as it is in the template-build path—template_build.uploadMemfile/uploadRootfs calls OpenSeekable and then StoreData on the resulting cachedSeekable. peerSeekable.StoreData() also delegates directly to the fallback (raw GCS), so the cachedSeekable layer is the only place the write-through cache would be populated.

Impact

For deployments with EnableWriteThroughCacheFlag=true, every template uploaded via StoreData() will have its NFS cache unpopulated. The first ReadAt() or OpenRangeReader() call for each 4 MB chunk after the in-memory template cache entry is evicted will miss NFS and fall back to a GCS round-trip, defeating the write-through optimisation for the new zero-copy path.

Addressing the refutations

One refuter argues this is self-healing: cachedSeekable.ReadAt() writes back to NFS on GCS fallback, so the miss is only one round-trip per chunk on first access after eviction. This is technically correct—the cache does heal lazily—but the write-through flag exists precisely to avoid that initial latency spike for freshly-uploaded templates. Replacing eager population with lazy healing is a behavioural regression for users who enabled the flag to guarantee warm-cache reads.

A second refuter argues cachedSeekable.StoreData() is dead code because the upload path sets t.persistence to the raw gcpStorage, not the NFS-wrapped provider. This is plausible if WrapInNFSCache is only applied on the read path. If that is true the bug cannot manifest today, but: (a) the public interface says cachedSeekable satisfies SeekableWriter including StoreData, (b) the omission creates a latent correctness trap for any future caller that does use the NFS-wrapped provider for writes, and (c) three of the four confirmatory verifiers independently concluded the method is reachable in the build path.

Step-by-step proof (flag-enabled scenario)

  1. persistence = WrapInNFSCache(gcpStorage) → OpenSeekable() returns a cachedSeekable.
  2. templateBuild.uploadMemfile() calls object.StoreData(ctx, data) on that cachedSeekable.
  3. cachedSeekable.StoreData() calls c.inner.StoreData() (GCS write) and returns—no NFS blocks written, no size file written.
  4. Later, the in-memory template entry is evicted.
  5. A sandbox reads memfile chunk at offset 0: cachedSeekable.ReadAt() opens the NFS chunk file → ENOENT (cache miss) → falls back to GCS ReadAt → writes chunk to NFS asynchronously.
  6. If StoreData had called createCacheBlocksFromData (equivalent of createCacheBlocksFromFile operating on []byte), step 5 would have been an NFS hit with no GCS latency.

Fix

Add createCacheBlocksFromData(ctx, data []byte) that iterates over the in-memory slice in chunkSize windows and writes each chunk to NFS (the data is already in memory, so this is cheap). Then in StoreData(), mirror the StoreFile() conditional:

if c.flags.BoolFlag(ctx, featureflags.EnableWriteThroughCacheFlag) {
    c.goCtx(ctx, func(ctx context.Context) {
        size, err := c.createCacheBlocksFromData(ctx, data)
        // ... handle err, call writeLocalSize
    })
}
return c.inner.StoreData(ctx, data)

Comment on lines 411 to +421
}

func (c *Cache) Data() []byte {
c.mu.RLock()
defer c.mu.RUnlock()

if c.mmap == nil {
return nil
}

return []byte(*c.mmap)

🔴 Cache.Data() returns a mmap-backed []byte after releasing its RLock, so Cache.Close() can call mmap.Unmap() while upload goroutines are still reading the slice, causing a SIGSEGV. Fix by following the existing addressBytes() pattern: return both the slice and a releaseCacheCloseLock closure that keeps the RLock held until the upload is complete.

Extended reasoning...

The bug in detail

Cache.Data() (cache.go:411-421) acquires c.mu.RLock(), constructs []byte(*c.mmap) — a slice pointing directly into the memory-mapped region — and then immediately releases the lock via defer. Once Data() returns, the caller holds a raw pointer into mmap memory with no reference-counting, no lock, and no lifetime guarantee. Concurrently, Cache.Close() acquires c.mu.Lock() (write lock) and calls c.mmap.Unmap(), which instructs the kernel to tear down the mapping. Any subsequent read of the returned slice accesses unmapped memory and produces a SIGSEGV.

The concrete crash path

In template_build.go the Upload() function spawns two errgroup goroutines: one calls memfileDiff.Data() to get the mmap slice, then passes it to uploadMemfile → object.StoreData → gcpmultipart.Upload, where it is sliced as data[start:end] per part and written to HTTP PUT requests. For GB-scale files this upload can take minutes. The RLock is released the moment Data() returns — before uploadMemfile is even called. If any concurrent code path calls diff.Close() during this window, the mmap is unmapped while the upload is still streaming parts.

Why Close() can be called concurrently

Multiple triggerable paths exist: (1) TTL-based eviction in templateCache — the OnEviction callback calls template.Close() → diff.Close() → cache.Close(); (2) disk-space-based eviction in DiffStore, which has a scheduleDelete path with a 60-second delay added explicitly to prevent race conditions with exposed slices, but 60 seconds is insufficient for multi-GB uploads; (3) context cancellation during sandbox cleanup or build failure triggering deferred Close() calls. Any of these racing with the upload goroutine produces the use-after-free.

The smoking gun: addressBytes() already solves this

The same file (cache.go) contains addressBytes(), which explicitly keeps the RLock held and returns a releaseCacheCloseLock func() to the caller (used correctly in FullFetchChunker.fetchToCache and StreamingChunker.runFetch). Data() was introduced as part of this PR without following that pattern, breaking the safety invariant the rest of the codebase upholds.

Step-by-step proof

  1. The errgroup in template_build.go calls memfileDiff.Data() — RLock acquired, []byte(*c.mmap) returned, RLock released (lock count = 0).
  2. Upload goroutine enters gcpmultipart.Upload and issues, say, 20 concurrent part PUTs, each reading data[i*ChunkSize:(i+1)*ChunkSize].
  3. After 60 seconds, disk-space eviction fires: DiffStore.deleteOldestFromCache → scheduleDelete timer fires → cache.Delete → OnEviction → diff.Close() → cache.Close().
  4. cache.Close() calls c.mu.Lock() — succeeds immediately because lock count is 0 — then c.mmap.Unmap().
  5. Part PUT goroutines are still executing bytes.NewReader(data[start:end]) and writing to the HTTP body; the underlying memory is now unmapped → SIGSEGV.

How to fix

Data() should follow the addressBytes() pattern: acquire RLock, return both the slice and a release closure, and require the caller to hold that closure for the lifetime of the upload. The template_build.go goroutines would call release() in a defer after uploadMemfile/uploadRootfs returns. Alternatively, Data() could copy the mmap into a heap buffer (safe but defeats the zero-copy goal of this PR).
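The slice-plus-release pattern the review recommends can be shown with a cut-down stand-in for `block.Cache`. A plain slice models the mmap here; the real `addressBytes()` implementation is not reproduced, so treat this as a sketch of the locking shape, not the actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// cache stands in for block.Cache: real code hands out mmap-backed bytes,
// and Close unmaps them. Here data == nil models the unmapped state.
type cache struct {
	mu   sync.RWMutex
	data []byte
}

// Data returns the bytes plus a release func. The read lock stays held until
// release is called, so a concurrent Close (write lock) cannot tear down the
// backing memory while an upload goroutine is still reading the slice.
func (c *cache) Data() ([]byte, func()) {
	c.mu.RLock()
	if c.data == nil {
		c.mu.RUnlock()
		return nil, func() {}
	}
	return c.data, c.mu.RUnlock
}

// Close blocks until every outstanding release() has run, then drops the data.
func (c *cache) Close() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data = nil
}

func main() {
	c := &cache{data: []byte("mmap bytes")}

	data, release := c.Data()
	fmt.Println(string(data)) // safe: Close would block until release() runs
	release()

	c.Close()
	data, release = c.Data()
	defer release()
	fmt.Println(data == nil) // true after Close
}
```

The caller-side contract matches the template_build.go fix shown earlier: `defer release()` after obtaining the slice, so the lock covers the entire upload.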

@ValentaTomas ValentaTomas force-pushed the perf/multipart-upload-rewrite branch 2 times, most recently from d787a42 to cbf0853 on April 13, 2026 23:34
Comment thread packages/shared/pkg/storage/gcpmultipart/uploader_test.go Outdated
Comment thread packages/shared/pkg/storage/storage_google.go
}

return []byte(*c.mmap)
}

Cache.Data() missing isClosed check before returning mmap

Low Severity

Cache.Data() does not check c.isClosed() before returning the mmap contents. After Close() calls c.mmap.Unmap(), the mmap pointer is not set to nil, so Data() passes the nil check and returns a byte slice referencing unmapped memory. Every other method on Cache (Slice, addressBytes, Size) checks isClosed() as a safety guard. If Data() is ever called after Close(), it would return a dangling reference.


Reviewed by Cursor Bugbot for commit 62a5e03.

@ValentaTomas ValentaTomas force-pushed the perf/multipart-upload-rewrite branch 4 times, most recently from 466867d to 04ea467 on April 14, 2026 00:04
Extract multipart upload into packages/shared/pkg/storage/multipart/.
The new implementation takes a []byte directly (typically mmap'd) and
slices into it per part — zero allocation, zero copy in the upload loop.

Key changes:
- Remove MD5 per-part computation (was 20% CPU in prod profile)
- Remove mutex (parts array uses distinct indices per goroutine)
- Remove sync.Pool and buffer copies
- Use mmap for large files in StoreFile instead of os.ReadFile
- Simplify XML types to private, minimal structs
- Deduplicate HTTP boilerplate via doRequest helper
- Trim Uploader struct to 3 fields (token, client, baseURL)
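The "slices into it per part" scheme from the commit message can be illustrated with a small sketch. `partBounds` and the 4-byte part size are illustrative names and values; the real uploader uses its own chunk-size constant.

```go
package main

import "fmt"

// partBounds computes the [start, end) byte range of part i of a buffer of
// the given total size, the way the uploader carves the caller's slice.
func partBounds(total, partSize int64, i int) (start, end int64) {
	start = int64(i) * partSize
	end = start + partSize
	if end > total {
		end = total
	}
	return start, end
}

func main() {
	data := make([]byte, 10) // stands in for the caller's mmap'd buffer
	start, end := partBounds(int64(len(data)), 4, 2)

	part := data[start:end] // re-slice only: same backing array, no copy
	part[0] = 0xFF          // visible through data, demonstrating the aliasing

	fmt.Println(start, end, data[8]) // 8 10 255
}
```

The aliasing is exactly why no per-goroutine buffer or copy is needed — and also why the mmap-lifetime concerns raised in the reviews above apply to every in-flight part.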
@ValentaTomas ValentaTomas force-pushed the perf/multipart-upload-rewrite branch from 04ea467 to b3e8ad5 on April 14, 2026 00:13

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 4 total unresolved issues (including 2 from previous reviews).



Reviewed by Cursor Bugbot for commit b3e8ad5.

}

return t.uploadMemfile(ctx, *memfilePath)
return t.uploadMemfile(ctx, data)

Errors from Data() silently skip uploads causing data loss

High Severity

StorageDiff.Data() swallows the error from chunker.Wait() and returns nil, which Upload() treats identically to NoDiff (no data to upload). If the chunker fails for any reason, the memfile or rootfs upload is silently skipped and the overall Upload returns success. The old code propagated errors — CachePath() errors caused the snapshot upload to fail, and even without that, StoreFile would fail explicitly on a missing/incomplete file. The new path hides failures, risking silent data loss during template builds.

Additional Locations (1)

Reviewed by Cursor Bugbot for commit b3e8ad5.

}()

return c.inner.StoreData(ctx, data)
}

StoreData skips write-through cache unlike StoreFile

Medium Severity

cachedSeekable.StoreData directly delegates to c.inner.StoreData without populating the local write-through cache, unlike StoreFile which conditionally creates cache blocks and writes local size when EnableWriteThroughCacheFlag is enabled. When this flag is on, uploads via StoreData won't populate the local disk cache, causing subsequent reads of the same object to miss the local cache and fall through to remote storage.


Reviewed by Cursor Bugbot for commit b3e8ad5.

@ValentaTomas ValentaTomas marked this pull request as ready for review April 14, 2026 06:10

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b3e8ad523e


Comment on lines +195 to +197
func (o *awsObject) StoreData(ctx context.Context, data []byte) error {
return o.Put(ctx, data)
}

P1 Badge Use multipart upload in AWS StoreData

TemplateBuild.Upload now sends memfile/rootfs through StoreData, so this new method is on the hot path for snapshot uploads. In awsObject, StoreData calls Put (single PutObject) instead of the multipart uploader used by StoreFile; on AWS this regresses large-object handling and causes uploads to fail for sufficiently large snapshots (notably >5 GiB) that previously worked through multipart upload.


Comment on lines 386 to +400
}

func (o *gcpObject) StoreFile(ctx context.Context, path string) (e error) {
ctx, span := tracer.Start(ctx, "write to gcp from file system")
ctx, span := tracer.Start(ctx, "store file to gcp")
defer func() {
recordError(span, e)
span.End()
}()

bucketName := o.storage.bucket.BucketName()
objectName := o.path

fileInfo, err := os.Stat(path)
data, err := os.ReadFile(path)
if err != nil {
return fmt.Errorf("failed to get file size: %w", err)
return fmt.Errorf("failed to read file: %w", err)
}

// If the file is too small, the overhead of writing in parallel isn't worth the effort.
// Write it in one shot instead.
if fileInfo.Size() < gcpMultipartUploadChunkSize {
return o.storeData(ctx, data)

🔴 gcpObject.StoreFile now calls os.ReadFile(path) unconditionally, loading the entire file into heap memory before deciding whether to use multipart upload; previously, files ≥50 MB were streamed chunk-by-chunk via UploadFileInParallel using ReadAt, bounding peak memory to ~50 MB. While the primary upload path (uploadMemfile/uploadRootfs) was migrated to StoreData() in this PR, the StoreFile method remains on the SeekableWriter interface and is still delegated through cachedSeekable.StoreFile → c.inner.StoreFile and peerSeekable.StoreFile → fallback.StoreFile; any caller that reaches those paths with a large file (e.g. a 500 MB rootfs) will now allocate the full file on the heap instead of streaming, risking OOM under memory pressure.

Extended reasoning...

What the bug is and how it manifests

The new gcpObject.storeData() helper is called from both StoreData() (the new path) and StoreFile() (the old path). StoreFile() now reads the entire file with os.ReadFile(path) before delegating to storeData(), whereas the old implementation only did a full in-memory read for files smaller than 50 MB (gcpMultipartUploadChunkSize). For files ≥ 50 MB the old code called UploadFileInParallel, which opened the file, read one 50 MB chunk at a time with file.ReadAt, and released each buffer after the part was uploaded — peak heap usage was bounded to approximately one chunk (~50 MB).

The specific code path

packages/shared/pkg/storage/storage_google.go:386-400 now reads:
data, err := os.ReadFile(path) // entire file into heap
return o.storeData(ctx, data)
The entire file is resident in memory from before the first HTTP PUT until the upload completes. For a 500 MB memfile or rootfs this is a 500 MB heap allocation per concurrent build.

Why existing code doesn't prevent it

The PR correctly migrated the hot path: template_build.go uploadMemfile/uploadRootfs now calls object.StoreData(ctx, data) with an already-mmap'd slice, bypassing StoreFile entirely. However, StoreFile is a required method on the SeekableWriter interface (storage.go:102-104), and two delegating implementations still forward to it:
• cachedSeekable.StoreFile (storage_cache_seekable.go:281) → c.inner.StoreFile(ctx, path)
• peerSeekable.StoreFile (peerclient/seekable.go:130) → fallback.StoreFile(ctx, path)
Any call to StoreFile on either wrapper (e.g. via NFS-cache write-through on a template stored by file path, or a peer fallback write) will trigger the full allocation.

Addressing the refutation

One verifier argued that a codebase-wide grep shows zero business-logic callers of .StoreFile() after this PR, so the scenario is unreachable today. That observation is accurate for the current snapshot of business-logic code. However: (a) StoreFile is a required interface method — future callers or integrations may rely on it; (b) the peerSeekable and cachedSeekable delegating wrappers preserve the reachability of the regression should any caller chain through them; (c) the regression silently removes a safety property that was explicitly designed into the old implementation (streaming large files to avoid OOM). Leaving the method in a broken state for large files is a footgun.

Impact

On nodes handling large sandbox images through the peerSeekable or cachedSeekable code paths, or if StoreFile is called by any downstream code, a 500 MB rootfs upload would allocate ~500 MB on the heap instead of ~50 MB. Under concurrent builds this can cause OOM kills of the orchestrator process.

Step-by-step proof

  1. A caller obtains a cachedSeekable object (e.g. storage_cache.go OpenSeekable wrapping gcpStorage).
  2. The caller invokes object.StoreFile(ctx, '/path/to/500mb-rootfs').
  3. cachedSeekable.StoreFile calls c.inner.StoreFile(ctx, path) (storage_cache_seekable.go:304).
  4. c.inner is a gcpObject; its StoreFile calls data, _ = os.ReadFile('/path/to/500mb-rootfs') — 500 MB heap allocation.
  5. storeData() is called with the full slice; multipart upload proceeds with 500 MB locked in heap until all parts complete.
  6. Compare: old code at this step opened the file, called UploadFileInParallel which iterated chunk-by-chunk with ReadAt — peak heap was one 50 MB chunk.

How to fix

Restore the streaming behaviour inside StoreFile: open the file, stat its size, and if size >= gcpmultipart.ChunkSize call the uploader with a file-based reader (or mmap the file and pass the slice, then munmap after upload). Alternatively, update storeData to accept an io.ReaderAt + size so that both StoreFile and StoreData can use the same parallel-upload code without requiring the full slice in RAM for file-backed uploads.

🔬 also observed by chatgpt-codex-connector

Comment on lines 150 to +160
return b.cachePath, nil
}

func (b *StorageDiff) Data() []byte {
	c, err := b.chunker.Wait()
	if err != nil {
		return nil
	}

	return c.Data()
}
🔴 StorageDiff.Data() returns nil when b.chunker.Wait() fails instead of propagating the error, causing the caller in template_build.go to silently skip the upload (via 'if data == nil { return nil }') as though there were no diff. Every other method on StorageDiff (Close, ReadAt, Slice, WriteTo, FileSize) properly returns the error; Data() is the only method that swallows it. Fix by changing the Diff.Data() interface to return ([]byte, error) so errors can propagate.

Extended reasoning...

What the bug is

StorageDiff.Data() at storage_diff.go:152-160 calls b.chunker.Wait() and, if it returns an error (e.g. network failure during chunker initialization, context cancellation, or any error set via SetError), the method returns nil with no error signal:

func (b *StorageDiff) Data() []byte {
	c, err := b.chunker.Wait()
	if err != nil {
		return nil // error silently dropped
	}
	return c.Data()
}

The specific code path that triggers it

In template_build.go (Upload function), both the memfile and rootfs upload goroutines call Diff.Data() and then check the result:

data := memfileDiff.Data()
if data == nil {
    return nil  // treated as NoDiff, upload silently skipped
}
return t.uploadMemfile(ctx, data)

Since NoDiff.Data() also returns nil (by design), the caller cannot distinguish between no diff to upload and a chunker failure. Both result in silently skipping the upload with nil returned from the goroutine.

Why existing code does not prevent it

Every other method on StorageDiff propagates errors from chunker.Wait(): Close, ReadAt, Slice, WriteTo, and FileSize all return the error to callers. The Data() method introduced by this PR is the sole exception. The root cause is structural: the Diff.Data() interface is declared as returning only []byte with no error return, making proper error propagation impossible without a signature change.

What the impact would be

If b.chunker.Wait() returns an error (e.g., upstream storage cannot be fetched during Init, or context is cancelled between Init and Upload), the memfile or rootfs upload is silently skipped. TemplateBuild.Upload() returns nil (success), Snapshot.Upload() returns nil, and the caller has no indication that the template build is incomplete. This creates a stored template entry missing its memfile or rootfs data: silent data loss during template builds.

How to fix it

Change the Diff interface Data() signature to return ([]byte, error). StorageDiff.Data() can then return (nil, err) on chunker failure and callers in template_build.go can properly check and propagate the error. NoDiff.Data() would return (nil, nil). Alternatively, store the initialization error in StorageDiff and expose it through a separate method.

Step-by-step proof

  1. StorageDiff is initialized with chunker = utils.NewSetOnce[block.Chunker]. During Init(), if OpenSeekable or Size fails, chunker.SetError(errMsg) is called.
  2. Later, TemplateBuild.Upload() runs. The memfile goroutine calls memfileDiff.Data().
  3. Inside Data(), b.chunker.Wait() returns the stored error immediately.
  4. Data() returns nil to the caller with no error.
  5. The caller checks: if data == nil { return nil }. Condition is true, returns nil just like NoDiff.
  6. The errgroup collects no errors. Upload() returns nil to the caller.
  7. The template is recorded as successfully built but the memfile was never written to GCS.

Comment on lines +68 to +90
	client *retryablehttp.Client
}

func NewUploader(ctx context.Context, bucketName, objectName string) (*Uploader, error) {
	creds, err := google.FindDefaultCredentials(ctx, "https://www.googleapis.com/auth/cloud-platform")
	if err != nil {
		return nil, fmt.Errorf("get credentials: %w", err)
	}

	token, err := creds.TokenSource.Token()
	if err != nil {
		return nil, fmt.Errorf("get token: %w", err)
	}

	return &Uploader{
		token:   token.AccessToken,
		baseURL: "https://" + bucketName + ".storage.googleapis.com/" + objectName,
		client:  httpClient,
	}, nil
}

func (u *Uploader) Upload(ctx context.Context, data []byte, maxConcurrency int) (int64, error) {
	uploadID, err := u.initiate(ctx)
🟣 The new gcpmultipart.NewUploader fetches an OAuth2 access token once via creds.TokenSource.Token() and stores it as a plain string, which will expire after ~1 hour; if a multi-part upload runs longer than that, all subsequent PUT requests will receive HTTP 401 with no retry path since DefaultRetryPolicy does not retry 401. This is a pre-existing issue: the deleted gcp_multipart.go used the exact same pattern (token.AccessToken stored as a string), and this PR carries it forward unchanged. The fix is to store creds.TokenSource and call .Token() per request so the client library handles auto-refresh.

Extended reasoning...

What the bug is

NewUploader (uploader.go:77-80) calls creds.TokenSource.Token() once and stores the raw access token string in Uploader.token. Every subsequent doRequest() call sets "Authorization: Bearer "+u.token using this static string. Google OAuth2 access tokens have a fixed ~1-hour lifetime. Once the token expires, all part PUTs return HTTP 401 Unauthorized.

The specific code path

The retryablehttp client uses retryablehttp.DefaultRetryPolicy as its CheckRetry function. DefaultRetryPolicy only retries on connection errors, 429 (Too Many Requests), and 500-range server errors. HTTP 401 is not in that list. So once the token expires mid-upload, every remaining part PUT — including retries — will receive 401 and fail permanently. The upload returns an error with no recovery.

Why existing code does not prevent it

The token is captured as a plain string at construction time. There is no per-request token refresh. doRequest() has no mechanism to detect a 401 caused by token expiry versus a permissions error, and even if it did, DefaultRetryPolicy prevents the retry from happening. The OAuth2 client library TokenSource would handle caching and auto-refresh automatically if called per request, but only Token() is called once at construction.

Pre-existing nature

The deleted gcp_multipart.go had the identical pattern: NewMultipartUploaderWithRetryConfig called creds.TokenSource.Token() at construction and stored token.AccessToken as a plain string in MultipartUploader.token, used identically in all requests. The new gcpmultipart package is a rewrite but replicates the same token-lifetime semantics. Critically, NewUploader is called fresh per upload from storeData() (storage_google.go), so each upload begins with a fresh token. The window for expiry is only during a single upload lifetime.

Practical impact

At typical GCP network speeds (>100 MB/s) with 16 concurrent 50 MB parts, even a 10 GB file completes in minutes, well within the 1-hour window. The risk materializes only for extremely large files (hundreds of GB) or severely degraded network/retry conditions where total elapsed time exceeds ~1 hour.

Step-by-step proof

  1. NewUploader is called; creds.TokenSource.Token() returns a token with Expiry = now + 59 minutes.
  2. token.AccessToken is stored as u.token; the TokenSource is discarded.
  3. Upload begins; 16 goroutines issue concurrent PUT requests.
  4. Due to large file size and intermittent 500 errors requiring retries, elapsed time exceeds 60 minutes.
  5. All subsequent PUT requests include "Authorization: Bearer <expired token>" (the stale string captured at construction).
  6. GCS returns HTTP 401.
  7. DefaultRetryPolicy sees 401, returns (false, nil) — do not retry.
  8. Each part goroutine returns an error; errgroup cancels remaining parts.
  9. Upload fails permanently with no token-refresh recovery.

Fix

Store creds.TokenSource (not token.AccessToken) in the Uploader struct. In doRequest(), call tokenSource.Token() to get a fresh (or cached, if still valid) token before setting the Authorization header. The oauth2 package TokenSource implementations cache tokens and only fetch a new one when the current one is within the expiry window.

@jakubno jakubno marked this pull request as draft April 14, 2026 12:30
Member

@jakubno jakubno left a comment

Can we separate this into multiple PRs? 🙏🏻

@ValentaTomas
Member Author

Superseded by smaller split PRs: removing MD5/mutex first, then zero-copy upload path as follow-up.
