
fix(assets-sync): split commit_batch to fit ingress message limit #68

Merged
lwshang merged 1 commit into main from lwshang/fix_payload_too_large on Jun 1, 2026


lwshang (Collaborator) commented May 29, 2026

Problem

Large deploys hit the IC's per-message ingress size limit on commit_batch. A real example from a downstream project: 1713 files with a multi-kilobyte Content-Security-Policy declared in _headers, producing 4843 batch operations. The boundary node rejects the call with:

status 413 Payload Too Large
Payload is too large: maximum body size is 4194304 bytes.

The 4 MiB number is the local boundary node's HTTP body cap; the real limit on production application subnets is 2 MiB per ingress message (MAX_INGRESS_BYTES_PER_MESSAGE_APP_SUBNET in dfinity/ic/rs/limits/src/lib.rs). Most of the payload bytes come from the per-asset headers that _headers resolves to (a 1.5 KiB CSP × 1713 CreateAsset ops ≈ 2.5 MiB on its own, already above the 2 MiB cap).
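The header-size estimate can be checked directly. A minimal Rust sketch using only the figures quoted in this PR; the constant and function names are hypothetical (not taken from dfinity/ic or assets-sync), and it assumes every op inlines the same header block:

```rust
// Hypothetical constants mirroring the figures quoted in this PR.
pub const KIB: u64 = 1024;
pub const MIB: u64 = 1024 * KIB;

/// 2 MiB per-message ingress cap on application subnets (value as quoted above).
pub const APP_SUBNET_INGRESS_CAP: u64 = 2 * MIB;
/// Body cap reported by the local boundary node's 413 response.
pub const LOCAL_BOUNDARY_NODE_BODY_CAP: u64 = 4_194_304;

/// Header bytes inlined across all ops, assuming every op carries the
/// same header block (a simplification: real header sets can vary per asset).
pub fn inlined_header_bytes(header_bytes_per_op: u64, ops: u64) -> u64 {
    header_bytes_per_op * ops
}

/// Convert a byte count to MiB for display.
pub fn to_mib(bytes: u64) -> f64 {
    bytes as f64 / MIB as f64
}
```

For example, `inlined_header_bytes(1536, 1713)` gives 2,631,168 bytes, roughly 2.51 MiB: header bytes alone exceed the 2 MiB application-subnet cap before any per-op overhead is counted.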

Fix

Split commit_batch into multiple ingress calls when the operation set would exceed the limit.

Per-group caps:

  • 500 operations — bounds certified-tree work per call and limits the blast radius of a mid-deploy failure.
  • 1.5 MiB of inlined header bytes — leaves ~500 KiB of headroom under the 2 MiB cap for fixed per-op overhead (keys, chunk_ids, sha256s, variant tags, request envelope). Header bytes are the only variable-sized per-op field and are where real-world overruns come from.
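A minimal sketch of how the two per-group caps could be enforced, assuming a greedy, order-preserving packer. `Op`, `split_ops`, and the constant names are hypothetical stand-ins, not the assets-sync API; only inlined header bytes are tracked, following the reasoning that they are the only variable-sized per-op field.

```rust
/// Hypothetical stand-in for a batch operation; only the size of its
/// inlined headers matters for grouping.
#[derive(Debug, Clone, PartialEq)]
pub struct Op {
    pub key: String,
    /// Bytes of header names + values inlined into this op (0 if none).
    pub header_bytes: usize,
}

pub const MAX_OPS_PER_GROUP: usize = 500;
pub const MAX_HEADER_BYTES_PER_GROUP: usize = 1536 * 1024; // 1.5 MiB

/// Greedily pack ops, in their original order, into groups that respect
/// both caps. An op whose headers alone exceed the byte cap still gets a
/// group of its own, since a single op cannot be split further.
pub fn split_ops(ops: Vec<Op>, max_ops: usize, max_header_bytes: usize) -> Vec<Vec<Op>> {
    let mut groups: Vec<Vec<Op>> = Vec::new();
    let mut current: Vec<Op> = Vec::new();
    let mut current_bytes = 0usize;
    for op in ops {
        // Close the current group if adding this op would break either cap.
        let would_overflow = current.len() == max_ops
            || current_bytes + op.header_bytes > max_header_bytes;
        if would_overflow && !current.is_empty() {
            groups.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current_bytes += op.header_bytes;
        current.push(op);
    }
    if !current.is_empty() {
        groups.push(current);
    }
    groups
}
```

With 1713 ops carrying 1.5 KiB of headers each, the 500-op cap binds first and the result is groups of 500, 500, 500 and 213. The packer never reorders, which matters if later ops in a batch can depend on earlier ones (for example, deleting and then recreating the same key).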

Intermediate splits use a placeholder batch_id = 0; a trailing empty-ops call on the real batch_id releases the canister-side batch entry. The canister's commit_batch does not validate batch existence and consumes chunk_ids regardless of which batch created them, so chunks uploaded once under the real batch_id remain reachable across all splits — only create_batch GCs orphaned chunks.

Small deploys still use a single commit_batch with the real batch_id; the placeholder dance only kicks in when splitting is actually needed.
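The call sequence described above can be sketched as a pure planning function. This is a hedged illustration, not the merged code: `CommitCall` and `plan_commits` are hypothetical names, batch ids are modeled as `u64` for simplicity (the canister's actual id type may differ), and it assumes every non-empty group is sent under the placeholder id before the trailing release call.

```rust
/// Hypothetical description of one `commit_batch` ingress call.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CommitCall {
    pub batch_id: u64,
    pub op_count: usize,
}

/// Placeholder batch id used for split commits.
pub const PLACEHOLDER_BATCH_ID: u64 = 0;

/// Plan the `commit_batch` calls for ops that were already grouped
/// (`group_sizes[i]` = number of ops in group i).
pub fn plan_commits(real_batch_id: u64, group_sizes: &[usize]) -> Vec<CommitCall> {
    if group_sizes.len() <= 1 {
        // No split needed: one call on the real batch id.
        // (Assumption: an empty op set is handled the same way.)
        let op_count: usize = group_sizes.iter().sum();
        return vec![CommitCall { batch_id: real_batch_id, op_count }];
    }
    // Split: every group goes out under the placeholder id...
    let mut calls: Vec<CommitCall> = group_sizes
        .iter()
        .map(|&op_count| CommitCall { batch_id: PLACEHOLDER_BATCH_ID, op_count })
        .collect();
    // ...then a trailing empty-ops call on the real id releases the batch entry.
    calls.push(CommitCall { batch_id: real_batch_id, op_count: 0 });
    calls
}
```

Keeping the plan separate from the actual agent calls lets the placeholder sequence be unit-tested without a running canister.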

This is the same shape of fix that dfx's ic-asset/src/sync.rs uses for this problem.

Trade-off

Splitting forfeits cross-batch atomicity. A failure mid-deploy leaves the canister with the operations from previously-successful calls applied; the next sync run diffs against the canister and resumes.

Test plan

  • cargo test -p assets-sync --lib — 177 tests pass, 10 new
  • cargo build -p plugin --target wasm32-wasip2 --release — clean build
  • Manual: re-run icp deploy against the downstream project that hit the original 413 and confirm it now goes through

🤖 Generated with Claude Code

lwshang force-pushed the lwshang/drop_assets_toml branch from 719f27a to b092222 on June 1, 2026 19:54
Large deploys (e.g. ~1700 files with a multi-kilobyte
`Content-Security-Policy` declared in `_headers`) build a single
`commit_batch` payload that exceeds the IC's 2 MiB per-message ingress
limit on application subnets, surfacing as a 413 Payload Too Large
from the boundary node.

Split `commit_batch` into multiple ingress calls when the operation set
would exceed the limit. Per-group caps:
- 500 operations — bounds certified-tree work per call and limits the
  blast radius of a mid-deploy failure.
- 1.5 MiB of inlined header bytes — leaves ~500 KiB of headroom under
  the 2 MiB cap for fixed per-op overhead (keys, chunk_ids, sha256s,
  variant tags, request envelope). Header bytes are the only
  variable-sized per-op field and are where real-world overruns come
  from.

Intermediate splits use a placeholder `batch_id = 0`; a trailing
empty-ops call on the real batch_id releases the canister-side batch
entry. The canister's `commit_batch` does not validate batch existence
and consumes chunk_ids regardless of which batch created them, so the
chunks uploaded once under the real batch_id remain reachable across
all splits — only `create_batch` GCs orphaned chunks.

Small deploys still use a single `commit_batch` with the real batch_id;
the placeholder dance only kicks in when splitting is actually needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lwshang marked this pull request as ready for review June 1, 2026 20:00
lwshang requested a review from a team as a code owner June 1, 2026 20:00
lwshang changed the base branch from lwshang/drop_assets_toml to main June 1, 2026 20:00
lwshang force-pushed the lwshang/fix_payload_too_large branch from ddc9077 to 0cc61e0 on June 1, 2026 20:00
lwshang enabled auto-merge (squash) June 1, 2026 20:00
lwshang merged commit 5e6d3eb into main Jun 1, 2026
11 checks passed
lwshang deleted the lwshang/fix_payload_too_large branch June 1, 2026 20:38
2 participants