perf(assets-sync): batch chunks to collapse per-file create_chunks calls by lwshang · Pull Request #65 · dfinity/certified-assets

lwshang · 2026-05-28T23:11:05Z

Summary

Plugin-based assets sync was much slower than dfx deploy. Profiling against synthetic fixtures showed the cause: every chunk became its own create_chunks canister call, so a project of 100 small files made 100 update round-trips even though all 100 chunks fit in a single ~1.9 MB ingress payload. This PR adds in-process call-pattern benchmarking, then collapses the upload phase via greedy chunk packing — the worst-case bench fixture (1000 × 1 KB files) drops from 1000 create_chunks calls to 1.

Four more optimizations surfaced during the work (last_chunk inlining, commit_batch splitting, get_asset_properties batching, host-side concurrent calls); they're written up in assets-sync/tests/OPTIMIZATIONS.md with expected impact and the reason each is deferred. None block this PR.

Design

Layer-1 in-process bench, not e2e timing. The bottleneck we're optimizing is the call pattern — how many round-trips, of what size — not raw CPU. A BenchMock recording (method, arg_bytes) per call (assets-sync/tests/bench_sync.rs) gives a stable, deterministic table that's faster to diff across changes than a real-replica wall-clock run. Layer 2 (replica wall-clock against the e2e harness) is the eventual safety net before shipping, not the day-to-day iteration tool. Tests are #[ignore]'d so they don't slow the regular suite; run with cargo test --release --test bench_sync -- --ignored --nocapture --test-threads=1.
First-fit-decreasing packing. pack_and_upload_chunks in assets-sync/src/sync.rs collects every chunk from every not-yet-uploaded encoding across all assets into one pending list, sorts by size descending, then in each pass takes every chunk that still fits under MAX_CHUNK_SIZE. A full MAX-sized chunk fills its own call; many small chunks pack tightly together. Same algorithm the SDK's ic-asset ChunkUploader uses, ported to the sync model.
Route ids back via (asset_key, encoding, chunk_index). Each PendingChunk records where its eventual canister id should land — enc.chunk_ids[chunk_index] is pre-sized at collection time. This decouples upload order from the chunk order the canister expects in SetAssetContent, so the packer can sort freely without breaking serving order.
WASM-component constraint shaped the lever choice. The plugin compiles to wasm32-wasip2, which is single-threaded: no rayon, no async runtime. So the SDK's "fan out 50 concurrent uploads" lever isn't reachable from inside the component. Reducing the number of calls (this PR) and packing more into each is what's available without changing the plugin/host WIT interface. Adding a canister_call_batch host import for real concurrency is a separate, larger change documented under #4 in OPTIMIZATIONS.md.
Existing upload_chunks helper deleted, not extended. The previous code had a per-encoding upload_chunks that issued one call per chunk; the new pass operates across all encodings at once, so there's nothing for upload_chunks to do. Its six unit tests are replaced with seven focused tests of the new packer.
encoding_suffix kept. Still used by prepare_asset's "already in place" log line — unchanged.
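
The packing and id-routing steps described above can be sketched as follows. This is a minimal illustration, not the code in assets-sync/src/sync.rs: the `PendingChunk` fields, the `Route` alias, the closure standing in for the canister call, and the value of `MAX_CHUNK_SIZE` (the PR only says "~1.9 MB") are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Assumed value; the PR only states that a payload holds about 1.9 MB.
const MAX_CHUNK_SIZE: usize = 1_900_000;

/// Hypothetical shape of a chunk waiting for upload, tagged with its destination.
struct PendingChunk {
    asset_key: String,
    encoding: String,
    chunk_index: usize,
    data: Vec<u8>,
}

/// (asset_key, encoding, chunk_index): where a returned chunk id must land.
type Route = (String, String, usize);

/// Packs chunks first-fit-decreasing and calls `create_chunks` once per batch.
/// `create_chunks` stands in for the canister call and returns one id per chunk.
fn pack_and_upload<F>(mut pending: Vec<PendingChunk>, mut create_chunks: F) -> HashMap<Route, u64>
where
    F: FnMut(&[&[u8]]) -> Vec<u64>,
{
    // Largest chunks first, so full-sized chunks claim their own call early.
    pending.sort_by(|a, b| b.data.len().cmp(&a.data.len()));
    let mut ids = HashMap::new();
    while !pending.is_empty() {
        let mut batch = Vec::new();
        let mut rest = Vec::new();
        let mut used = 0usize;
        for chunk in pending {
            // The first chunk of a pass is always taken, so every pass makes progress.
            if batch.is_empty() || used + chunk.data.len() <= MAX_CHUNK_SIZE {
                used += chunk.data.len();
                batch.push(chunk);
            } else {
                rest.push(chunk);
            }
        }
        let payload: Vec<&[u8]> = batch.iter().map(|c| c.data.as_slice()).collect();
        let new_ids = create_chunks(&payload);
        assert_eq!(new_ids.len(), batch.len());
        // Route each id by its recorded destination, independent of upload order.
        for (chunk, id) in batch.into_iter().zip(new_ids) {
            ids.insert((chunk.asset_key, chunk.encoding, chunk.chunk_index), id);
        }
        pending = rest;
    }
    ids
}

/// Example run: one full-sized chunk plus three small ones, with a counting mock.
fn example() -> (usize, HashMap<Route, u64>) {
    let mk = |key: &str, idx: usize, len: usize| PendingChunk {
        asset_key: key.to_string(),
        encoding: "identity".to_string(),
        chunk_index: idx,
        data: vec![0u8; len],
    };
    let pending = vec![
        mk("/a.js", 0, 1_000),
        mk("/big.bin", 0, MAX_CHUNK_SIZE),
        mk("/big.bin", 1, 500),
        mk("/b.css", 0, 2_000),
    ];
    let mut calls = 0usize;
    let mut next_id = 0u64;
    let ids = pack_and_upload(pending, |batch| {
        calls += 1;
        batch.iter().map(|_| { next_id += 1; next_id }).collect()
    });
    (calls, ids)
}
```

In this sketch the full-sized chunk of `/big.bin` takes the first call alone and the three small chunks share the second, yet both `/big.bin` ids still land at their own `chunk_index`, which is the property that lets the packer reorder freely.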

Test plan

cargo test -p assets-sync — 172 lib tests green. The seven new pack_* unit tests cover: a full MAX-sized chunk gets its own call, an oversized encoding splits at MAX boundaries, many small chunks collapse into one call, FFD packs the largest chunk alone then fills the rest, ids route correctly to the right encoding slot across multi-chunk + single-chunk assets, zero-byte encodings still get one chunk_id, and already_in_place encodings are skipped (no calls made).

Bench delta on every fixture:

| fixture | `create_chunks` before | `create_chunks` after |
| --- | --- | --- |
| many_tiny (1000 × 1 KB) | 1000 | 1 |
| many_small (100 × 5 KB) | 100 | 1 |
| few_medium (10 × 2 MB) | 20 | 12 |
| one_huge (1 × 50 MB) | 28 | 28 (chunks already at MAX; only `last_chunk` inlining helps here) |
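
The call counts in the table can be checked with a little arithmetic. The sketch below is not the BenchMock from bench_sync.rs; it only counts calls from file sizes. It assumes `MAX_CHUNK_SIZE` is 1,900,000 bytes and that the fixture sizes use binary units (1 KB = 1024 bytes), neither of which the PR states explicitly; the function names are hypothetical.

```rust
/// Assumed value; the PR only states that a payload holds about 1.9 MB.
const MAX_CHUNK_SIZE: usize = 1_900_000;

/// Splits a file of `len` bytes into chunk sizes of at most `max` bytes.
/// A zero-byte file still yields one empty chunk.
fn chunk_sizes(len: usize, max: usize) -> Vec<usize> {
    if len == 0 {
        return vec![0];
    }
    let mut sizes = vec![max; len / max];
    if len % max != 0 {
        sizes.push(len % max);
    }
    sizes
}

/// Calls needed when every chunk is its own create_chunks call.
fn calls_before(files: &[usize], max: usize) -> usize {
    files.iter().map(|&len| chunk_sizes(len, max).len()).sum()
}

/// Calls needed when chunks are packed first-fit-decreasing across all files.
fn calls_after(files: &[usize], max: usize) -> usize {
    let mut sizes: Vec<usize> = files.iter().flat_map(|&len| chunk_sizes(len, max)).collect();
    sizes.sort_unstable_by(|a, b| b.cmp(a));
    let mut calls = 0;
    while !sizes.is_empty() {
        let mut used = 0;
        let mut taken_any = false;
        // One pass: remove every chunk that still fits; keep the rest for later passes.
        sizes.retain(|&s| {
            if !taken_any || used + s <= max {
                used += s;
                taken_any = true;
                false
            } else {
                true
            }
        });
        calls += 1;
    }
    calls
}

/// (fixture name, calls before, calls after) for the four bench fixtures.
fn fixture_table() -> Vec<(&'static str, usize, usize)> {
    const KIB: usize = 1024;
    const MIB: usize = 1024 * 1024;
    let fixtures: Vec<(&'static str, Vec<usize>)> = vec![
        ("many_tiny", vec![KIB; 1000]),
        ("many_small", vec![5 * KIB; 100]),
        ("few_medium", vec![2 * MIB; 10]),
        ("one_huge", vec![50 * MIB; 1]),
    ];
    fixtures
        .into_iter()
        .map(|(name, files)| {
            let before = calls_before(&files, MAX_CHUNK_SIZE);
            let after = calls_after(&files, MAX_CHUNK_SIZE);
            (name, before, after)
        })
        .collect()
}
```

Under those assumptions, each 2 MB file in few_medium splits into one full chunk and a 197,152-byte tail. The ten full chunks need ten calls, nine tails fit in an eleventh, and the last tail takes a twelfth, which matches the 20 → 12 row.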

cargo fmt + cargo clippy — clean (commit 1a208a3).

🤖 Generated with Claude Code

In-process `cargo test`-driven bench that measures the canister-call pattern `sync()` emits against synthetic fixtures (many_tiny, many_small, few_medium, one_huge). Run with `--release --ignored --nocapture --test-threads=1` to print per-method call counts and arg bytes; used to establish a baseline for chunk-batching and last_chunk-inlining work without needing a real replica.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

upload_chunks previously issued one canister call per chunk, so a project of N small files made N round-trips. Replaced with a pack-and-upload pass that collects every chunk from every not-yet-uploaded encoding, then ships them in greedy first-fit-decreasing batches of up to MAX_CHUNK_SIZE total bytes.

Measured via assets-sync/tests/bench_sync.rs:

  many_tiny  1000 × 1 KB: create_chunks 1000 → 1
  many_small 100 × 5 KB:  create_chunks 100 → 1
  few_medium 10 × 2 MB:   create_chunks 20 → 12
  one_huge   1 × 50 MB:   create_chunks 28 → 28 (already at MAX)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Captures the optimizations surfaced by bench_sync.rs that we're not landing in this PR: last_chunk inlining, commit_batch splitting, get_asset_properties batching, and host-side concurrent calls via a new WIT batch import. Each entry notes expected impact (with bench numbers where applicable) and why it's deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lwshang mentioned this pull request May 28, 2026

infra: switch frontend to certified-assets snapshot wasm + sync plugin dfinity/developer-docs#280

Merged

6 tasks

raymondk approved these changes Jun 1, 2026

lwshang force-pushed the lwshang/assets_toml branch from 50b2714 to 2aee1e4 Compare June 1, 2026 19:43

lwshang and others added 4 commits June 1, 2026 15:49

chore: fmt & clippy

e5e7700

lwshang marked this pull request as ready for review June 1, 2026 19:49

lwshang requested a review from a team as a code owner June 1, 2026 19:49

lwshang changed the base branch from lwshang/assets_toml to main June 1, 2026 19:49

lwshang force-pushed the lwshang/perf branch from 1a208a3 to e5e7700 Compare June 1, 2026 19:49

lwshang enabled auto-merge (squash) June 1, 2026 19:49

lwshang merged commit 96d2778 into main Jun 1, 2026
11 checks passed

lwshang deleted the lwshang/perf branch June 1, 2026 20:38
