Commit 6ef9252

docs: large Wasm guide (#99)
## Summary

- Why Wasm modules grow large (dependencies, embedded data, debug symbols)
- gzip compression: recipe-based (shrink + compress in icp.yaml) and manual ic-wasm + gzip
- Wasm chunk store: upload_chunk/install_chunked_code, automatic icp-cli chunking, cycle costs
- Wasm64: wasm64-unknown-unknown target (6 GiB), build.sh and icp.yaml config from backend_wasm64 example
- WebAssembly SIMD: auto-vectorization via .cargo/config.toml, per-function via target_feature, intrinsics
- Troubleshooting: four common failure modes

## Sync recommendation

`informed by dfinity/portal — docs/building-apps/canister-management/compile.mdx, simd.mdx`
1 parent dd2d74c commit 6ef9252

1 file changed

Lines changed: 245 additions & 12 deletions

---
title: "Large Wasm Modules"
description: "Deploy canisters that exceed the 2 MiB Wasm limit using chunk store and compression"
sidebar:
  order: 9
---

ICP enforces a 2 MiB message size limit that applies to Wasm modules uploaded via `install_code`. Canisters with complex business logic, embedded ML models, or large dependency trees often exceed this threshold. There are two complementary approaches: reduce the module size with compression and dead-code stripping, or bypass the limit entirely by uploading the module in chunks.

This guide covers both approaches, explains Wasm64 for canisters that need extended memory, and introduces WebAssembly SIMD for computationally intensive workloads.

## Why Wasm modules grow large

A compiled Wasm binary grows for several reasons:

- **Dense dependency trees**: Rust canisters that pull in many crates accumulate dead code that the compiler cannot always eliminate.
- **Embedded data**: ML model weights, large lookup tables, or static assets compiled into the binary.
- **Complex business logic**: feature-rich canisters with many update and query methods.
- **Debug symbols**: by default, Rust release builds include name sections and other debug metadata.

Before reaching for the chunk store, consider whether [canister optimization](optimization.md) can reduce the binary enough to fit under 2 MiB.

## Approach 1: gzip compression

ICP's management canister understands gzip-compressed Wasm modules. When the `wasm_module` field of `install_code` starts with the gzip magic bytes `[0x1f, 0x8b, 0x08]`, the system decompresses it automatically before installation.

Gzip compression typically reduces Wasm binary size significantly, which is often enough to bring a large module under the 2 MiB threshold.
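The rule above boils down to a prefix check on the module bytes. Here is a minimal sketch of the same check for your own deploy tooling. `classify_module` and `ModuleKind` are hypothetical names, not part of any ICP library. The extra check for the raw Wasm header (`\0asm`) is added for illustration.

```rust
/// gzip magic bytes plus the DEFLATE method byte, the prefix `install_code` looks for.
const GZIP_PREFIX: [u8; 3] = [0x1f, 0x8b, 0x08];
/// Every uncompressed WebAssembly binary starts with "\0asm".
const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6d];

/// Hypothetical classification of a module file before upload.
#[derive(Debug, PartialEq)]
pub enum ModuleKind {
    Gzipped,
    RawWasm,
    Unknown,
}

pub fn classify_module(bytes: &[u8]) -> ModuleKind {
    if bytes.starts_with(&GZIP_PREFIX) {
        ModuleKind::Gzipped
    } else if bytes.starts_with(&WASM_MAGIC) {
        ModuleKind::RawWasm
    } else {
        ModuleKind::Unknown
    }
}
```

The `gzip` command always writes `0x1f 0x8b` followed by `0x08` (the DEFLATE method byte), so a module compressed with it classifies as `Gzipped`.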
28+
29+
### Using a recipe
30+
31+
The Rust and prebuilt recipes expose a `compress` flag that gzip-compresses the output as the final build step:
32+
33+
```yaml
34+
canisters:
35+
- name: backend
36+
recipe:
37+
type: "@dfinity/rust@v3.2.0"
38+
configuration:
39+
package: backend
40+
shrink: true
41+
compress: true
42+
```
43+
44+
Setting `shrink: true` first removes unused functions and debug info while preserving function names for readable backtraces, then `compress: true` gzip-compresses the result. Using both together gives the largest size reduction.
45+
46+
### Using a custom build script
47+
48+
If you are not using a recipe, you can compress manually in your build steps:
49+
50+
```yaml
51+
canisters:
52+
- name: backend
53+
build:
54+
steps:
55+
- type: script
56+
commands:
57+
- cargo build --target wasm32-unknown-unknown --release
58+
- cp target/wasm32-unknown-unknown/release/backend.wasm "$ICP_WASM_OUTPUT_PATH"
59+
- ic-wasm "$ICP_WASM_OUTPUT_PATH" -o "$ICP_WASM_OUTPUT_PATH" shrink --keep-name-section
60+
- gzip --no-name "$ICP_WASM_OUTPUT_PATH"
61+
- mv "${ICP_WASM_OUTPUT_PATH}.gz" "$ICP_WASM_OUTPUT_PATH"
62+
```
63+
64+
The `--keep-name-section` flag preserves function names for readable backtraces while still removing dead code. Omit it if you do not need stack traces.
65+
66+
## Approach 2: the Wasm chunk store
67+
68+
When compression alone is not enough, the Wasm chunk store lets you upload modules larger than 2 MiB by splitting them into chunks, then assembling and installing them in one atomic operation.
69+
70+
### How the chunk store works
71+
72+
1. **Upload chunks** — Call `upload_chunk` on the management canister to store up to 1 MiB chunks in the target canister's chunk store. Each call returns the SHA-256 hash of the stored chunk.
73+
2. **Assemble and install** — Call `install_chunked_code` with the ordered list of chunk hashes. The system concatenates the chunks, verifies the aggregate hash matches `wasm_module_hash`, and installs the result as if you had called `install_code` directly.
74+
75+
The chunk store is bounded: each chunk is at most 1 MiB, and there is a maximum number of chunks per store (`CHUNK_STORE_SIZE`, defined in the IC interface spec — see the [management canister reference](../../reference/management-canister.md) for the exact value). You can inspect stored chunks with `stored_chunks` and clear the store with `clear_chunk_store`.
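On the upload side, this protocol reduces to two steps: slice the module, in order, into pieces of at most 1 MiB, and check that they fit in the store. Below is a minimal std-only sketch. `plan_chunks` and `ChunkPlanError` are hypothetical names. `max_chunks` stands in for `CHUNK_STORE_SIZE`, whose value is not stated here. Hashing each chunk with SHA-256 would need a crate such as `sha2` and is left out.

```rust
/// Maximum size of a single chunk accepted by `upload_chunk` (1 MiB).
pub const MAX_CHUNK_BYTES: usize = 1024 * 1024;

#[derive(Debug, PartialEq)]
pub enum ChunkPlanError {
    /// The module needs more chunks than the store can hold.
    TooManyChunks { needed: usize, max: usize },
}

/// Split a module into upload-ready chunks, preserving their order.
///
/// Order matters: `install_chunked_code` concatenates chunks in the order
/// their hashes appear in the hash list.
pub fn plan_chunks(module: &[u8], max_chunks: usize) -> Result<Vec<&[u8]>, ChunkPlanError> {
    // `chunks` yields full 1 MiB slices plus one shorter final slice, if any.
    let chunks: Vec<&[u8]> = module.chunks(MAX_CHUNK_BYTES).collect();
    if chunks.len() > max_chunks {
        return Err(ChunkPlanError::TooManyChunks { needed: chunks.len(), max: max_chunks });
    }
    Ok(chunks)
}
```

Concatenating the returned slices reproduces the original module byte for byte, which is the property the `wasm_module_hash` check relies on.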

### icp-cli handles this automatically

When you run `icp deploy` or `icp canister install` with a Wasm module larger than 2 MiB, icp-cli automatically uses the chunk store; no configuration is required. The tool splits the module, uploads each chunk, and calls `install_chunked_code` behind the scenes. <!-- TODO: verify automatic chunking behavior against icp-cli release notes -->

```bash
icp deploy
```

### Combining compression with the chunk store

You can combine gzip compression with the chunk store. A compressed module that is still larger than 2 MiB is still split into chunks, but it needs fewer of them. That means fewer upload calls and lower cycle costs. Enable both `shrink` and `compress` in your recipe, and let icp-cli decide whether chunking is needed.
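The trade-off above is simple arithmetic on module sizes. This sketch decides between a single `install_code` call and a chunked upload. `UploadPlan` and `plan_upload` are hypothetical names. The sketch treats 2 MiB as the cutoff for the module alone, which is an approximation: the limit applies to the whole message, so the init argument and encoding overhead also count.

```rust
const MESSAGE_LIMIT: usize = 2 * 1024 * 1024; // 2 MiB
const CHUNK: usize = 1024 * 1024; // 1 MiB

#[derive(Debug, PartialEq)]
pub enum UploadPlan {
    /// Small enough to send in one `install_code` message.
    Direct,
    /// Needs the chunk store; carries the number of `upload_chunk` calls.
    Chunked { chunks: usize },
}

pub fn plan_upload(module_len: usize) -> UploadPlan {
    if module_len <= MESSAGE_LIMIT {
        UploadPlan::Direct
    } else {
        // Ceiling division: a partial final chunk still needs its own call.
        UploadPlan::Chunked { chunks: (module_len + CHUNK - 1) / CHUNK }
    }
}
```

Under these rules, a 9 MiB module needs 9 chunks. The same module compressed to 3.5 MiB needs 4. These sizes are illustrative, not measurements.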

### Cycle costs

Storing each chunk costs cycles proportional to 1 MiB of storage, even if the chunk is smaller. Uploaded chunks remain in the chunk store, and keep costing storage cycles, until the store is cleared. If your tooling does not clear it for you, call `clear_chunk_store` once installation succeeds. Also call it after a failed or interrupted attempt, before retrying.
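Because each chunk is billed as a full 1 MiB, the billed storage can exceed the module's actual size. Here is a small sketch of that arithmetic. The `billed_chunk_bytes` name is hypothetical. Converting bytes to cycles is omitted because the per-byte storage price is not given here.

```rust
const CHUNK: u64 = 1024 * 1024; // every stored chunk is billed as 1 MiB

/// Bytes of chunk-store storage billed for uploading `module_len` bytes
/// in chunks of at most 1 MiB.
pub fn billed_chunk_bytes(module_len: u64) -> u64 {
    let chunks = (module_len + CHUNK - 1) / CHUNK; // ceiling division
    chunks * CHUNK
}
```

A module one byte over 5 MiB therefore occupies six chunks and is billed for 6 MiB of storage.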

## Wasm64: 64-bit memory addressing

Standard ICP canisters use the `wasm32-unknown-unknown` target, which limits the WebAssembly heap to 4 GiB of addressable memory. Some canisters need more, for example those holding large in-memory datasets or running inference on large models. For these, ICP supports the `wasm64-unknown-unknown` target with up to 6 GiB of addressable heap memory (an ICP platform limit).

Wasm64 is a separate concern from the chunk store. You might use one, the other, or both: the chunk store addresses the 2 MiB upload limit, while Wasm64 addresses the runtime memory limit.
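The 4 GiB figure follows from 32-bit addressing, and WebAssembly memory is sized in 64 KiB pages. This sketch works through that arithmetic. The 6 GiB constant restates the platform limit quoted above, and `required_target` and `pages_for` are hypothetical helpers.

```rust
const WASM_PAGE: u64 = 64 * 1024; // WebAssembly page size: 64 KiB
const WASM32_MAX_BYTES: u64 = 1 << 32; // 32-bit offsets reach 4 GiB
const ICP_WASM64_MAX_BYTES: u64 = 6 * 1024 * 1024 * 1024; // 6 GiB platform limit

/// Which target a heap of `heap_bytes` needs, if either can hold it.
pub fn required_target(heap_bytes: u64) -> Option<&'static str> {
    if heap_bytes <= WASM32_MAX_BYTES {
        Some("wasm32-unknown-unknown")
    } else if heap_bytes <= ICP_WASM64_MAX_BYTES {
        Some("wasm64-unknown-unknown")
    } else {
        None // exceeds the heap limit on both targets
    }
}

/// Number of 64 KiB pages needed to hold `bytes`.
pub fn pages_for(bytes: u64) -> u64 {
    (bytes + WASM_PAGE - 1) / WASM_PAGE
}
```

A wasm32 heap therefore tops out at 65,536 pages, and the 6 GiB Wasm64 limit corresponds to 98,304 pages.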

### Building a Wasm64 canister

Wasm64 requires the Rust nightly toolchain and the `build-std` unstable feature. The standard library must be compiled for the `wasm64-unknown-unknown` target rather than pulled from a precompiled artifact.

Create a `build.sh` script in your project directory:

```bash
#!/bin/bash
# Stop at the first failing command so a broken build is never copied or deployed
set -euo pipefail

# Ensure nightly toolchain and rust-src are available
rustup toolchain install nightly
rustup component add rust-src --toolchain nightly

# Build for wasm64
cargo +nightly build \
  -Z build-std=std,panic_abort \
  --target wasm64-unknown-unknown \
  --release \
  -p backend

cp target/wasm64-unknown-unknown/release/backend.wasm target/backend.wasm
candid-extractor target/backend.wasm > backend/backend.did
```

Make the script executable with `chmod +x build.sh`, since `icp.yaml` invokes it as `./build.sh`. Then reference the script in `icp.yaml`:

```yaml
canisters:
  - name: backend
    build:
      steps:
        - type: script
          commands:
            - ./build.sh
            - cp target/backend.wasm "$ICP_WASM_OUTPUT_PATH"
            - ic-wasm "$ICP_WASM_OUTPUT_PATH" -o "${ICP_WASM_OUTPUT_PATH}" metadata "candid:service" -f 'backend/backend.did' -v public --keep-name-section
```

The final `ic-wasm ... metadata` step embeds the extracted Candid interface in the module as public `candid:service` metadata.

The canister code itself does not require changes. The same Rust CDK code works on both `wasm32` and `wasm64`:

```rust
#[ic_cdk::query]
fn greet(name: String) -> String {
    format!("Hello, {}!", name)
}

ic_cdk::export_candid!();
```

See the [backend_wasm64 example](https://github.com/dfinity/examples/tree/master/rust/backend_wasm64) for a complete working project.

### Memory limits and Wasm64

Wasm64 canisters benefit from the `wasm_memory_limit` canister setting. It caps WebAssembly heap usage and so guards against runaway allocations:

```yaml
canisters:
  - name: backend
    build:
      steps:
        - type: script
          commands:
            - ./build.sh
            - cp target/backend.wasm "$ICP_WASM_OUTPUT_PATH"
    settings:
      wasm_memory_limit: 4gib
```
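When choosing a `wasm_memory_limit`, it helps to convert the value to bytes and compare it with the 6 GiB Wasm64 ceiling. This sketch assumes that a suffix such as `gib` denotes binary units (1 GiB = 2^30 bytes). `parse_size` and `within_platform_limit` are hypothetical illustrations, not icp-cli's own parser.

```rust
const ICP_WASM64_MAX_BYTES: u64 = 6 * 1024 * 1024 * 1024; // 6 GiB platform limit

/// Illustrative parser for sizes such as "4gib" or "512mib", assuming
/// binary units. Returns None for malformed input or on overflow.
pub fn parse_size(s: &str) -> Option<u64> {
    let s = s.trim().to_ascii_lowercase();
    let (digits, multiplier): (&str, u64) = if let Some(n) = s.strip_suffix("gib") {
        (n, 1 << 30)
    } else if let Some(n) = s.strip_suffix("mib") {
        (n, 1 << 20)
    } else if let Some(n) = s.strip_suffix("kib") {
        (n, 1 << 10)
    } else {
        (s.as_str(), 1) // plain byte count
    };
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}

/// A limit at or above the platform ceiling would never be the one that binds.
pub fn within_platform_limit(limit_bytes: u64) -> bool {
    limit_bytes < ICP_WASM64_MAX_BYTES
}
```

Under this assumption, `4gib` is 4,294,967,296 bytes, which leaves 2 GiB of headroom below the Wasm64 ceiling.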

## WebAssembly SIMD

WebAssembly SIMD (Single Instruction, Multiple Data) is a set of more than 200 vector instructions defined in the WebAssembly core specification. SIMD lets a single instruction operate on multiple data elements in parallel. WebAssembly SIMD works on 128-bit vectors, for example four 32-bit floats at once, which can significantly accelerate compute-heavy workloads.

SIMD is available on every ICP node and does not require any special canister configuration beyond enabling the target feature in your build.

### When SIMD helps

SIMD provides the largest gains for workloads with regular, data-parallel structure:

- **AI/ML inference**: matrix multiplications, activation functions, convolutions
- **Image processing**: pixel transforms, filtering, encoding/decoding
- **Cryptographic operations**: hash computation, field arithmetic
- **Scientific computing**: numerical simulations, signal processing

For "classical" canister operations (reward distribution, token accounting, query logic), the gains are smaller but still measurable.

### Loop auto-vectorization

The simplest way to benefit from SIMD is to enable the `simd128` target feature and let the Rust compiler auto-vectorize loops. This single build setting often provides a significant speedup without rewriting any code.

Enable SIMD globally for your entire workspace by creating `.cargo/config.toml`:

```toml
[build]
target = ["wasm32-unknown-unknown"]

[target.wasm32-unknown-unknown]
rustflags = ["-C", "target-feature=+simd128"]
```

Or enable it only for a specific function:

```rust
#[target_feature(enable = "simd128")]
#[ic_cdk::query]
fn compute_heavy_operation() -> u64 {
    // The compiler auto-vectorizes eligible loops in this function
    // ...
    0
}
```

Auto-vectorization works best with tight numeric loops over contiguous arrays. The actual speedup depends on the algorithm, the compiler, and the input data.
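As an example, the loop below is the kind of tight, branch-free loop over contiguous data that LLVM can typically vectorize when `simd128` is enabled. It computes an `axpy` update, `y[i] += a * x[i]`, over two slices. Treat it as a sketch: whether it is actually vectorized depends on the compiler version and flags, so confirm with instruction counts. The function name is illustrative.

```rust
/// y[i] += a * x[i] for every index both slices share.
///
/// Zipping the iterators (instead of indexing) keeps bounds checks out of
/// the loop body, which helps the auto-vectorizer.
pub fn axpy(a: f32, x: &[f32], y: &mut [f32]) {
    for (yi, &xi) in y.iter_mut().zip(x.iter()) {
        *yi += a * xi;
    }
}
```

Element-wise updates like this vectorize more readily than floating-point sums. Vectorizing a sum reorders the additions, and the compiler does not do that for floats by default because it can change the result.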

### SIMD intrinsics

For maximum performance, you can use SIMD intrinsics directly. This gives full control over which vector instructions execute, at the cost of writing more complex code.

The `wasm32` platform exposes SIMD intrinsics through the `core::arch::wasm32` module (available when `simd128` is enabled). For a complete working example comparing naive, optimized, auto-vectorized, and SIMD intrinsic implementations of matrix multiplication, see the [WebAssembly SIMD example](https://github.com/dfinity/examples/tree/master/rust/simd) in the examples repository.
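To show what intrinsic code looks like, here is a sketch of a dot product over `f32` slices that processes four lanes at a time with `core::arch::wasm32` intrinsics. The `cfg` gates reflect one possible way to structure it, not a required pattern. The intrinsic path compiles only for `wasm32` with `simd128` enabled globally (for example via `.cargo/config.toml`). A scalar fallback keeps the code buildable and testable on the host. The function name `dot` is illustrative.

```rust
/// Dot product of two equal-length slices (SIMD path).
///
/// Multiplies and accumulates four f32 lanes per step using
/// `core::arch::wasm32` intrinsics, then handles the leftover elements.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    use core::arch::wasm32::{f32x4, f32x4_add, f32x4_extract_lane, f32x4_mul, f32x4_splat};

    assert_eq!(a.len(), b.len());
    let mut acc = f32x4_splat(0.0);
    let (a4, b4) = (a.chunks_exact(4), b.chunks_exact(4));
    let (a_rest, b_rest) = (a4.remainder(), b4.remainder());
    for (ca, cb) in a4.zip(b4) {
        let va = f32x4(ca[0], ca[1], ca[2], ca[3]);
        let vb = f32x4(cb[0], cb[1], cb[2], cb[3]);
        acc = f32x4_add(acc, f32x4_mul(va, vb));
    }
    // Horizontal sum of the four lanes, then the elements that did not fill a vector.
    let mut sum = f32x4_extract_lane::<0>(acc)
        + f32x4_extract_lane::<1>(acc)
        + f32x4_extract_lane::<2>(acc)
        + f32x4_extract_lane::<3>(acc);
    for (x, y) in a_rest.iter().zip(b_rest) {
        sum += x * y;
    }
    sum
}

/// Scalar fallback compiled on every other target or feature set.
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

The SIMD path adds the products in a different order from the scalar loop, so the two versions can differ in the last bits of floating-point precision.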

### Measuring SIMD performance

Use the `ic0.performance_counter` system API to count Wasm instructions before and after a computation. In the Rust CDK, `ic_cdk::api::instruction_counter()` returns this count for the current message execution (performance counter type 0):

```rust
#[ic_cdk::query]
fn benchmark_operation() -> u64 {
    let before = ic_cdk::api::instruction_counter();
    // ... your computation ...
    ic_cdk::api::instruction_counter() - before
}
```

Compare instruction counts with and without SIMD to measure the speedup. Lower instruction counts mean lower cycle costs and faster execution. The [`canbench`](https://github.com/dfinity/canbench) framework provides a more structured benchmarking workflow for tracking performance over time.
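Turning two counts into a speedup figure is simple arithmetic. Below is a hypothetical host-side helper, not part of `ic_cdk` or `canbench`. You might use it to compare the numbers `benchmark_operation` returns from a scalar build and a `simd128` build.

```rust
/// Ratio of scalar to SIMD instruction counts; 2.0 means the SIMD build
/// executed half as many instructions. Returns None if either count is 0.
pub fn speedup(scalar_instructions: u64, simd_instructions: u64) -> Option<f64> {
    if scalar_instructions == 0 || simd_instructions == 0 {
        return None;
    }
    Some(scalar_instructions as f64 / simd_instructions as f64)
}

/// Percentage of instructions saved by the SIMD build (negative if it got worse).
pub fn percent_saved(scalar_instructions: u64, simd_instructions: u64) -> Option<f64> {
    if scalar_instructions == 0 {
        return None;
    }
    Some(100.0 * (1.0 - simd_instructions as f64 / scalar_instructions as f64))
}
```

Execution is charged per instruction on top of fixed per-message fees, so the saved percentage roughly tracks the saving on execution cycles.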

## Troubleshooting

**"Wasm module too large" error during install**: The module exceeds 2 MiB. Verify that icp-cli is up to date (automatic chunk store support was added in v0.2.x). If you use a manual install flow, switch to the `install_chunked_code` management canister API.

**"Wasm chunk store error" during install**: The canister may lack sufficient cycles to store chunks (each chunk is billed as 1 MiB of storage). Top up the canister's cycles balance before retrying. If chunks from a previous failed attempt are occupying the store, call `clear_chunk_store` first.

**Wasm64 build fails with missing target**: The `nightly` toolchain and `rust-src` component must both be installed. Run:

```bash
rustup toolchain install nightly
rustup component add rust-src --toolchain nightly
```

**SIMD instructions have no measurable effect**: Some loops cannot be auto-vectorized. Check that the loop body is tight, operates on a contiguous slice, and does not contain branches or function calls that prevent vectorization. Profile with `ic_cdk::api::instruction_counter` to confirm the function is a bottleneck before investing in SIMD intrinsics.
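One common blocker is an early exit. In the sketch below, the first version stops at the first match, which gives the loop a data-dependent exit. Such exits have traditionally been hard for LLVM's loop vectorizer. The second version always scans the whole slice and combines the comparisons without branching, a shape the compiler is more likely to vectorize. The function names are illustrative. The full scan does more work when a match appears early, so measure both with the instruction counter rather than assuming the rewrite wins.

```rust
/// Early-exit version: returns as soon as a value exceeds `threshold`.
pub fn any_above_early_exit(values: &[f32], threshold: f32) -> bool {
    for &v in values {
        if v > threshold {
            return true;
        }
    }
    false
}

/// Full-scan version: combines every comparison with a bitwise OR, so the
/// loop has no data-dependent exit.
pub fn any_above_full_scan(values: &[f32], threshold: f32) -> bool {
    values.iter().fold(false, |found, &v| found | (v > threshold))
}
```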

## Next steps

- [Canister optimization](optimization.md): reduce Wasm size before reaching for the chunk store
- [Execution errors reference](../../reference/execution-errors.md): Wasm size and chunk store error codes
- [Canister lifecycle](lifecycle.md): deployment modes and install options

<!-- Upstream: informed by dfinity/portal docs/building-apps/developing-canisters/compile.mdx; dfinity/portal docs/building-apps/network-features/simd.mdx; dfinity/examples rust/backend_wasm64; dfinity/portal docs/references/ic-interface-spec.md -->
