perf(assets-sync): batch chunks to collapse per-file create_chunks calls #65
Merged
raymondk approved these changes on Jun 1, 2026
50b2714 to 2aee1e4
In-process `cargo test`-driven bench that measures the canister-call pattern `sync()` emits against synthetic fixtures (many_tiny, many_small, few_medium, one_huge). Run with `--release --ignored --nocapture --test-threads=1` to print per-method call counts and arg bytes; this establishes a baseline for chunk-batching and last_chunk-inlining work without needing a real replica.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
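The recording approach described in this commit can be sketched as follows. This is a minimal illustration, not the PR's `BenchMock`: the type name `RecordingMock` and its methods are hypothetical, and the only idea taken from the PR is that each call is logged as a `(method, arg_bytes)` pair and then summarized per method.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a call-recording mock: it logs every call
/// instead of talking to a canister.
#[derive(Default)]
struct RecordingMock {
    /// (method, arg_bytes) in call order.
    calls: Vec<(String, usize)>,
}

impl RecordingMock {
    /// Record one call; only the method name and argument size are kept.
    fn call(&mut self, method: &str, arg: &[u8]) {
        self.calls.push((method.to_string(), arg.len()));
    }

    /// Per-method (call count, total arg bytes), sorted by method name.
    fn summary(&self) -> BTreeMap<String, (usize, usize)> {
        let mut out: BTreeMap<String, (usize, usize)> = BTreeMap::new();
        for (method, bytes) in &self.calls {
            let entry = out.entry(method.clone()).or_insert((0, 0));
            entry.0 += 1;
            entry.1 += *bytes;
        }
        out
    }
}

fn main() {
    let mut mock = RecordingMock::default();
    // Simulate a one-call-per-file upload pattern with three 1 KB files.
    for _ in 0..3 {
        mock.call("create_chunks", &[0u8; 1024]);
    }
    mock.call("commit_batch", &[0u8; 64]);

    for (method, (count, bytes)) in mock.summary() {
        println!("{method}: {count} calls, {bytes} arg bytes");
    }
}
```

Because the summary depends only on the recorded calls, the table it prints is deterministic and can be diffed between two versions of the sync code.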
upload_chunks previously issued one canister call per chunk, so a project of N small files made N round-trips. Replaced with a pack-and-upload pass that collects every chunk from every not-yet-uploaded encoding, then ships them in greedy first-fit-decreasing batches of up to MAX_CHUNK_SIZE total bytes.

Measured via assets-sync/tests/bench_sync.rs:

many_tiny 1000 × 1 KB: create_chunks 1000 → 1
many_small 100 × 5 KB: create_chunks 100 → 1
few_medium 10 × 2 MB: create_chunks 20 → 12
one_huge 1 × 50 MB: create_chunks 28 → 28 (already at MAX)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
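The packing pass described in this commit can be sketched as below. It is a simplified model under stated assumptions, not the PR's code: `MAX_CHUNK_SIZE` is assumed to be 1,900,000 bytes (the PR only says about 1.9 MB), KB and MB are read as KiB and MiB, only raw chunk bytes are counted (no encoding overhead), and the helper names `chunk_sizes`, `pack_batches`, and `calls_for` are hypothetical.

```rust
/// Assumed value; the PR text only says a batch is about 1.9 MB.
const MAX_CHUNK_SIZE: usize = 1_900_000;

/// Split one file of `len` bytes into chunk sizes of at most MAX_CHUNK_SIZE.
/// A zero-byte file still yields one (empty) chunk.
fn chunk_sizes(len: usize) -> Vec<usize> {
    if len == 0 {
        return vec![0];
    }
    let mut sizes = vec![MAX_CHUNK_SIZE; len / MAX_CHUNK_SIZE];
    if len % MAX_CHUNK_SIZE != 0 {
        sizes.push(len % MAX_CHUNK_SIZE);
    }
    sizes
}

/// Greedy packing: sort by size descending, then in each pass take every
/// chunk that still fits in the remaining budget. Returns, per batch, the
/// indices into `sizes` that would go into that call.
fn pack_batches(sizes: &[usize]) -> Vec<Vec<usize>> {
    // Precondition: no single chunk exceeds the batch limit.
    assert!(sizes.iter().all(|&s| s <= MAX_CHUNK_SIZE));

    let mut pending: Vec<usize> = (0..sizes.len()).collect();
    pending.sort_by(|&a, &b| sizes[b].cmp(&sizes[a]));

    let mut batches = Vec::new();
    while !pending.is_empty() {
        let mut budget = MAX_CHUNK_SIZE;
        let mut batch = Vec::new();
        let mut leftover = Vec::new();
        for idx in pending {
            if sizes[idx] <= budget {
                budget -= sizes[idx];
                batch.push(idx);
            } else {
                leftover.push(idx);
            }
        }
        batches.push(batch);
        pending = leftover;
    }
    batches
}

/// Number of batched calls for `file_count` files of `file_len` bytes each.
fn calls_for(file_count: usize, file_len: usize) -> usize {
    let sizes: Vec<usize> = (0..file_count)
        .flat_map(|_| chunk_sizes(file_len))
        .collect();
    pack_batches(&sizes).len()
}

fn main() {
    const KIB: usize = 1024;
    const MIB: usize = 1024 * KIB;
    println!("many_tiny:  {}", calls_for(1000, KIB));
    println!("many_small: {}", calls_for(100, 5 * KIB));
    println!("few_medium: {}", calls_for(10, 2 * MIB));
    println!("one_huge:   {}", calls_for(1, 50 * MIB));
}
```

Under these assumptions the sketch yields 1, 1, 12, and 28 calls for the four fixture shapes. The `few_medium` case shows why the count is 12 rather than 11: each 2 MiB file leaves a remainder of 197,152 bytes, and only nine of those fit in one 1,900,000-byte batch.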
Captures the optimizations surfaced by bench_sync.rs that we're not landing in this PR: last_chunk inlining, commit_batch splitting, get_asset_properties batching, and host-side concurrent calls via a new WIT batch import. Each entry notes expected impact (with bench numbers where applicable) and why it's deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary

Plugin-based assets sync was much slower than `dfx deploy`. Profiling against synthetic fixtures showed the cause: every chunk became its own `create_chunks` canister call, so a project of 100 small files made 100 update round-trips even though all 100 chunks fit in a single ~1.9 MB ingress payload. This PR adds in-process call-pattern benchmarking, then collapses the upload phase via greedy chunk packing: the worst-case bench fixture (1000 × 1 KB files) drops from 1000 `create_chunks` calls to 1.

Three more optimizations surfaced during the work (`last_chunk` inlining, `commit_batch` splitting, host-side concurrent calls); they're written up in `assets-sync/tests/OPTIMIZATIONS.md` with expected impact and the reason each is deferred. None block this PR.

Design

Layer-1 in-process bench, not e2e timing. The bottleneck we're optimizing is the call pattern (how many round-trips, of what size), not raw CPU. A `BenchMock` recording `(method, arg_bytes)` per call (`assets-sync/tests/bench_sync.rs`) gives a stable, deterministic table that's faster to diff across changes than a real-replica wall-clock run. Layer 2 (replica wall-clock against the e2e harness) is the eventual safety net before shipping, not the day-to-day iteration tool. Tests are `#[ignore]`'d so they don't slow the regular suite; run with `cargo test --release --test bench_sync -- --ignored --nocapture --test-threads=1`.

First-fit-decreasing packing. `pack_and_upload_chunks` in `assets-sync/src/sync.rs` collects every chunk from every not-yet-uploaded encoding across all assets into one pending list, sorts by size descending, then in each pass takes every chunk that still fits under `MAX_CHUNK_SIZE`. A full MAX-sized chunk fills its own call; many small chunks pack tightly together. Same algorithm the SDK's `ic-asset` `ChunkUploader` uses, ported to the sync model.

Route ids back via `(asset_key, encoding, chunk_index)`. Each `PendingChunk` records where its eventual canister id should land: `enc.chunk_ids[chunk_index]` is pre-sized at collection time. This decouples upload order from the chunk order the canister expects in `SetAssetContent`, so the packer can sort freely without breaking serving order.

WASM-component constraint shaped the lever choice. The plugin compiles to `wasm32-wasip2`, which is single-threaded: no `rayon`, no async runtime. So the SDK's "fan out 50 concurrent uploads" lever isn't reachable from inside the component. Reducing the number of calls (this PR) and packing more into each is what's available without changing the plugin/host WIT interface. Adding a `canister_call_batch` host import for real concurrency is a separate, larger change documented under #4 in `OPTIMIZATIONS.md`.

Existing `upload_chunks` helper deleted, not extended. The previous code had a per-encoding `upload_chunks` that issued one call per chunk; the new pass operates across all encodings at once, so there's nothing for `upload_chunks` to do. Its six unit tests are replaced with seven focused tests of the new packer.

`encoding_suffix` kept. Still used by `prepare_asset`'s "already in place" log line, unchanged.

Test plan

`cargo test -p assets-sync`: 172 lib tests green. The seven new `pack_*` unit tests cover: a full MAX-sized chunk gets its own call, an oversized encoding splits at MAX boundaries, many small chunks collapse into one call, FFD packs the largest chunk alone then fills the rest, ids route correctly to the right encoding slot across multi-chunk + single-chunk assets, zero-byte encodings still get one `chunk_id`, and `already_in_place` encodings are skipped (no calls made).

Bench delta on every fixture:

| Fixture | `create_chunks` BEFORE | `create_chunks` AFTER |
| --- | --- | --- |
| many_tiny (1000 × 1 KB) | 1000 | 1 |
| many_small (100 × 5 KB) | 100 | 1 |
| few_medium (10 × 2 MB) | 20 | 12 |
| one_huge (1 × 50 MB) | 28 | 28 (already at MAX) |

(… `last_chunk` inlining helps here)

`cargo fmt` + `cargo clippy`: clean (commit `1a208a3`).

🤖 Generated with Claude Code
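The id routing described under Design, where each pending chunk carries `(asset_key, encoding, chunk_index)` so that upload order can differ from serving order, can be illustrated with a small sketch. The field names follow the PR text; the concrete types, the `ChunkIds` map, and the `route_ids` function are assumptions made for illustration and are not taken from `sync.rs`.

```rust
use std::collections::HashMap;

/// Hypothetical shape: field names follow the PR text, types are assumed.
struct PendingChunk {
    asset_key: String,
    encoding: String,
    chunk_index: usize,
}

/// Chunk id slots per (asset_key, encoding), pre-sized at collection time.
type ChunkIds = HashMap<(String, String), Vec<Option<u64>>>;

/// Write each returned id into the slot its chunk was collected from.
/// Panics if a slot vector was not pre-sized for `chunk_index`.
fn route_ids(slots: &mut ChunkIds, uploaded: &[(PendingChunk, u64)]) {
    for (chunk, id) in uploaded {
        let key = (chunk.asset_key.clone(), chunk.encoding.clone());
        if let Some(ids) = slots.get_mut(&key) {
            ids[chunk.chunk_index] = Some(*id);
        }
    }
}

fn pending(asset_key: &str, encoding: &str, chunk_index: usize) -> PendingChunk {
    PendingChunk {
        asset_key: asset_key.to_string(),
        encoding: encoding.to_string(),
        chunk_index,
    }
}

fn main() {
    // Pre-size the slots: a two-chunk asset and a single-chunk asset.
    let mut slots: ChunkIds = HashMap::new();
    slots.insert(("/a.js".into(), "identity".into()), vec![None; 2]);
    slots.insert(("/b.css".into(), "identity".into()), vec![None; 1]);

    // Ids arrive in upload order, which the packer is free to scramble.
    let uploaded = vec![
        (pending("/a.js", "identity", 1), 100),
        (pending("/b.css", "identity", 0), 101),
        (pending("/a.js", "identity", 0), 102),
    ];
    route_ids(&mut slots, &uploaded);

    // Serving order is restored by position, not by upload order.
    println!("{:?}", slots[&("/a.js".to_string(), "identity".to_string())]);
}
```

Here `/a.js` ends up with ids `[102, 100]` in chunk order even though chunk 1 was uploaded first, which is the property that lets the packer sort by size.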