perf(assets-sync): batch chunks to collapse per-file create_chunks calls #65

Merged: lwshang merged 4 commits into main from lwshang/perf on Jun 1, 2026

Collaborator

@lwshang lwshang commented May 28, 2026

Summary

Plugin-based assets sync was much slower than dfx deploy. Profiling against synthetic fixtures showed the cause: every chunk became its own create_chunks canister call, so a project of 100 small files made 100 update round-trips even though all 100 chunks fit in a single ~1.9 MB ingress payload. This PR adds in-process call-pattern benchmarking, then collapses the upload phase via greedy chunk packing — the worst-case bench fixture (1000 × 1 KB files) drops from 1000 create_chunks calls to 1.

Three more optimizations surfaced during the work (last_chunk inlining, commit_batch splitting, host-side concurrent calls); they're written up in assets-sync/tests/OPTIMIZATIONS.md with expected impact and the reason each is deferred. None block this PR.
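
The call-pattern benchmarking mentioned above boils down to counting calls and argument bytes per method instead of timing a replica. A minimal sketch of that idea follows, assuming a hypothetical `CallRecorder` type; the name and shape are illustrative and not taken from the test file in this PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the canister-call boundary: instead of talking
/// to a replica, it records what would have been sent.
#[derive(Default)]
pub struct CallRecorder {
    calls: Vec<(String, usize)>, // (method, arg_bytes)
}

impl CallRecorder {
    /// Record one call: the method name and the size of its encoded argument.
    pub fn call(&mut self, method: &str, arg: &[u8]) {
        self.calls.push((method.to_string(), arg.len()));
    }

    /// Per-method (call count, total arg bytes), sorted by method name so the
    /// table is stable between runs and easy to diff.
    pub fn summary(&self) -> BTreeMap<&str, (usize, usize)> {
        let mut table: BTreeMap<&str, (usize, usize)> = BTreeMap::new();
        for (method, bytes) in &self.calls {
            let row = table.entry(method.as_str()).or_insert((0, 0));
            row.0 += 1;
            row.1 += *bytes;
        }
        table
    }
}
```

Because the recorder only sees method names and byte counts, its output does not depend on machine load or replica latency, which is what makes the resulting table deterministic.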

Design

  • Layer-1 in-process bench, not e2e timing. The bottleneck we're optimizing is the call pattern — how many round-trips, of what size — not raw CPU. A BenchMock recording (method, arg_bytes) per call (assets-sync/tests/bench_sync.rs) gives a stable, deterministic table that's faster to diff across changes than a real-replica wall-clock run. Layer 2 (replica wall-clock against the e2e harness) is the eventual safety net before shipping, not the day-to-day iteration tool. Tests are #[ignore]'d so they don't slow the regular suite; run with cargo test --release --test bench_sync -- --ignored --nocapture --test-threads=1.

  • First-fit-decreasing packing. pack_and_upload_chunks in assets-sync/src/sync.rs collects every chunk from every not-yet-uploaded encoding across all assets into one pending list, sorts by size descending, then in each pass takes every chunk that still fits under MAX_CHUNK_SIZE. A full MAX-sized chunk fills its own call; many small chunks pack tightly together. Same algorithm the SDK's ic-asset ChunkUploader uses, ported to the sync model.

  • Route ids back via (asset_key, encoding, chunk_index). Each PendingChunk records where its eventual canister id should land — enc.chunk_ids[chunk_index] is pre-sized at collection time. This decouples upload order from the chunk order the canister expects in SetAssetContent, so the packer can sort freely without breaking serving order.

  • WASM-component constraint shaped the lever choice. The plugin compiles to wasm32-wasip2, which is single-threaded: no rayon, no async runtime. So the SDK's "fan out 50 concurrent uploads" lever isn't reachable from inside the component. Reducing the number of calls (this PR) and packing more into each is what's available without changing the plugin/host WIT interface. Adding a canister_call_batch host import for real concurrency is a separate, larger change documented under #4 in OPTIMIZATIONS.md.

  • Existing upload_chunks helper deleted, not extended. The previous code had a per-encoding upload_chunks that issued one call per chunk; the new pass operates across all encodings at once, so there's nothing for upload_chunks to do. Its six unit tests are replaced with seven focused tests of the new packer.

  • encoding_suffix kept. Still used by prepare_asset's "already in place" log line — unchanged.
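
The packing and id-routing bullets above can be sketched together. This is a simplified illustration, not the code in assets-sync/src/sync.rs: the `ChunkIds` map, the `create_chunks` closure standing in for the canister call, the function name `pack_and_upload`, and the exact value of `MAX_CHUNK_SIZE` (the PR only says roughly 1.9 MB) are all assumptions.

```rust
use std::collections::HashMap;

/// Assumed limit; the PR only says a payload is about 1.9 MB.
const MAX_CHUNK_SIZE: usize = 1_900_000;

/// A chunk waiting to be uploaded, tagged with the slot its id belongs in.
struct PendingChunk {
    asset_key: String,
    encoding: String,
    chunk_index: usize,
    data: Vec<u8>,
}

/// Pre-sized id slots per (asset_key, encoding); `None` until uploaded.
type ChunkIds = HashMap<(String, String), Vec<Option<u64>>>;

/// Pack `pending` first-fit-decreasing and hand each batch to `create_chunks`,
/// which stands in for the canister call and returns one id per chunk sent.
/// Returns the number of calls made.
fn pack_and_upload(
    mut pending: Vec<PendingChunk>,
    ids: &mut ChunkIds,
    mut create_chunks: impl FnMut(&[&[u8]]) -> Vec<u64>,
) -> usize {
    // Largest first, so big chunks claim a call before small ones fill gaps.
    pending.sort_by(|a, b| b.data.len().cmp(&a.data.len()));
    let mut calls = 0;
    while !pending.is_empty() {
        let mut batch = Vec::new();
        let mut rest = Vec::new();
        let mut used = 0;
        for chunk in pending {
            // The first chunk is always taken, so an oversized one cannot stall the loop.
            if batch.is_empty() || used + chunk.data.len() <= MAX_CHUNK_SIZE {
                used += chunk.data.len();
                batch.push(chunk);
            } else {
                rest.push(chunk);
            }
        }
        let payload: Vec<&[u8]> = batch.iter().map(|c| c.data.as_slice()).collect();
        let new_ids = create_chunks(&payload);
        assert_eq!(new_ids.len(), batch.len(), "one id per uploaded chunk");
        calls += 1;
        // Route every id back to its slot, whatever order it was uploaded in.
        for (chunk, id) in batch.iter().zip(new_ids) {
            let slots = ids
                .get_mut(&(chunk.asset_key.clone(), chunk.encoding.clone()))
                .expect("slots are pre-sized at collection time");
            slots[chunk.chunk_index] = Some(id);
        }
        pending = rest;
    }
    calls
}
```

Since each chunk carries its own destination, the sort can reorder uploads freely: the ids still end up in per-encoding order, which is the order the canister needs when the content is set.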

Test plan

  • cargo test -p assets-sync — 172 lib tests green. The seven new pack_* unit tests cover: a full MAX-sized chunk gets its own call, an oversized encoding splits at MAX boundaries, many small chunks collapse into one call, FFD packs the largest chunk alone then fills the rest, ids route correctly to the right encoding slot across multi-chunk + single-chunk assets, zero-byte encodings still get one chunk_id, and already_in_place encodings are skipped (no calls made).

  • Bench delta on every fixture:

    | fixture | create_chunks before | create_chunks after |
    | --- | --- | --- |
    | many_tiny (1000 × 1 KB) | 1000 | 1 |
    | many_small (100 × 5 KB) | 100 | 1 |
    | few_medium (10 × 2 MB) | 20 | 12 |
    | one_huge (1 × 50 MB) | 28 | 28 (chunks already at MAX; only last_chunk inlining helps here) |
  • cargo fmt + cargo clippy — clean (commit 1a208a3).
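
The bench numbers can be sanity-checked with a sizes-only simulation. The sketch below makes three assumptions the PR does not state: `MAX_CHUNK_SIZE` is exactly 1,900,000 bytes, KB and MB are binary (1024-based), and only raw chunk bytes count toward the limit (no encoding overhead). The function names are hypothetical. Under those assumptions it yields the same before/after counts as the bench fixtures.

```rust
/// Assumed values: the PR says "~1.9 MB" and does not say whether KB/MB are binary.
const MAX_CHUNK_SIZE: usize = 1_900_000;
const KB: usize = 1024;
const MB: usize = 1024 * 1024;

/// Split one file into chunk sizes at MAX_CHUNK_SIZE boundaries.
/// A zero-byte file still yields one (empty) chunk.
fn chunk_sizes(file_len: usize) -> Vec<usize> {
    if file_len == 0 {
        return vec![0];
    }
    let mut sizes = vec![MAX_CHUNK_SIZE; file_len / MAX_CHUNK_SIZE];
    if file_len % MAX_CHUNK_SIZE != 0 {
        sizes.push(file_len % MAX_CHUNK_SIZE);
    }
    sizes
}

/// (calls before, calls after) for `count` files of `file_len` bytes each.
/// Before: one call per chunk. After: first-fit-decreasing batches.
fn call_counts(count: usize, file_len: usize) -> (usize, usize) {
    let mut chunks: Vec<usize> = (0..count).flat_map(|_| chunk_sizes(file_len)).collect();
    let before = chunks.len();
    chunks.sort_unstable_by(|a, b| b.cmp(a)); // largest first
    let mut after = 0;
    while !chunks.is_empty() {
        let mut used = 0;
        let mut first = true;
        // Keep only the chunks that did not fit into this call.
        chunks.retain(|&size| {
            if first || used + size <= MAX_CHUNK_SIZE {
                used += size;
                first = false;
                false
            } else {
                true
            }
        });
        after += 1;
    }
    (before, after)
}
```

For example, `call_counts(10, 2 * MB)` models few_medium: each file splits into one full chunk plus a 197,152-byte tail, the ten full chunks each take their own call, and the ten tails need two more because all ten together exceed the limit by a small margin.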

🤖 Generated with Claude Code

@lwshang lwshang force-pushed the lwshang/assets_toml branch from 50b2714 to 2aee1e4 June 1, 2026 19:43
lwshang and others added 4 commits June 1, 2026 15:49
In-process `cargo test`-driven bench that measures the canister-call
pattern `sync()` emits against synthetic fixtures (many_tiny, many_small,
few_medium, one_huge). Run with `--release --ignored --nocapture
--test-threads=1` to print per-method call counts and arg bytes — used
to establish a baseline for chunk-batching and last_chunk-inlining work
without needing a real replica.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

upload_chunks previously issued one canister call per chunk, so a
project of N small files made N round-trips. Replaced with a
pack-and-upload pass that collects every chunk from every
not-yet-uploaded encoding, then ships them in greedy
first-fit-decreasing batches of up to MAX_CHUNK_SIZE total bytes.

Measured via assets-sync/tests/bench_sync.rs:
  many_tiny  1000 × 1 KB:  create_chunks 1000 → 1
  many_small  100 × 5 KB:  create_chunks  100 → 1
  few_medium   10 × 2 MB:  create_chunks   20 → 12
  one_huge     1 × 50 MB:  create_chunks   28 → 28  (already at MAX)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Captures the optimizations surfaced by bench_sync.rs that we're not
landing in this PR: last_chunk inlining, commit_batch splitting,
get_asset_properties batching, and host-side concurrent calls via a
new WIT batch import. Each entry notes expected impact (with bench
numbers where applicable) and why it's deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@lwshang lwshang marked this pull request as ready for review June 1, 2026 19:49
@lwshang lwshang requested a review from a team as a code owner June 1, 2026 19:49
@lwshang lwshang changed the base branch from lwshang/assets_toml to main June 1, 2026 19:49
@lwshang lwshang enabled auto-merge (squash) June 1, 2026 19:49
@lwshang lwshang merged commit 96d2778 into main Jun 1, 2026
11 checks passed
@lwshang lwshang deleted the lwshang/perf branch June 1, 2026 20:38