|
1 | 1 | --- |
2 | 2 | title: "Large Wasm Modules" |
3 | | -description: "Deploy canisters that exceed the 2MB Wasm limit using chunk store and compression" |
| 3 | +description: "Deploy canisters that exceed the 2 MiB Wasm limit using chunk store and compression" |
4 | 4 | sidebar: |
5 | 5 | order: 9 |
6 | 6 | --- |
7 | 7 |
|
8 | | -TODO: Write content for this page. |
| 8 | +ICP enforces a 2 MiB message size limit that applies to Wasm modules uploaded via `install_code`. Canisters with complex business logic, embedded ML models, or large dependency trees often exceed this threshold. There are two complementary approaches: reduce the module size with compression and dead-code stripping, or bypass the limit entirely by uploading the module in chunks. |
9 | 9 |
|
10 | | -<!-- Content Brief --> |
11 | | -Deploy canisters with Wasm modules larger than the 2MB limit. Cover the Wasm chunk store for splitting large modules, gzip compression for reducing size, the ic-wasm tool for stripping and optimizing, and Wasm64 support for 64-bit memory. Explain when and why you might need large modules (ML models, complex business logic). Include a section on WebAssembly SIMD — 200+ vector instructions for parallel computation that accelerate AI/ML inference, image processing, cryptographic operations, and other math-heavy workloads. SIMD is available on every ICP node. |
| 10 | +This guide covers both approaches, explains Wasm64 for canisters that need extended memory, and introduces WebAssembly SIMD for computationally intensive workloads. |
12 | 11 |
|
13 | | -<!-- Source Material --> |
14 | | -- Portal: building-apps/developing-canisters/compile.mdx (large Wasm section) |
15 | | -- Examples: backend_wasm64 (Rust) |
16 | | -- icp-cli: --wasm-chunk-store flag |
| 12 | +## Why Wasm modules grow large |
17 | 13 |
|
18 | | -<!-- Cross-Links --> |
19 | | -- guides/canister-management/optimization -- reducing Wasm size to avoid this entirely |
20 | | -- reference/execution-errors -- Wasm size errors |
21 | | -- guides/canister-management/lifecycle -- deployment with chunk store |
| 14 | +A compiled Wasm binary grows for several reasons: |
| 15 | + |
| 16 | +- **Dense dependency trees** — Rust canisters that pull in many crates accumulate dead code that the compiler cannot always eliminate. |
| 17 | +- **Embedded data** — ML model weights, large lookup tables, or static assets compiled into the binary. |
| 18 | +- **Complex business logic** — feature-rich canisters with many update and query methods. |
| 19 | +- **Debug symbols** — by default, Rust release builds include name sections and other debug metadata. |
| 20 | + |
| 21 | +Before reaching for the chunk store, consider whether [canister optimization](optimization.md) can reduce the binary enough to fit under 2 MiB. |
| 22 | + |
| 23 | +## Approach 1: gzip compression |
| 24 | + |
| 25 | +ICP's management canister understands gzip-compressed Wasm modules. When the `wasm_module` field of `install_code` starts with the gzip magic bytes `[0x1f, 0x8b, 0x08]`, the system decompresses it automatically before installation. |
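The check described above can be sketched in a few lines of Rust. This is only an illustration of the magic-byte test, not the system's actual implementation; the names `GZIP_MAGIC` and `is_gzip_wasm` are hypothetical and do not come from any ICP library.

```rust
/// Leading bytes of a gzip stream: the 0x1f 0x8b magic number followed by
/// 0x08, the DEFLATE compression method.
const GZIP_MAGIC: [u8; 3] = [0x1f, 0x8b, 0x08];

/// Returns true when `module` starts with the gzip magic bytes.
/// (Hypothetical helper for illustration only.)
fn is_gzip_wasm(module: &[u8]) -> bool {
    module.starts_with(&GZIP_MAGIC)
}
```

A build script could run a check like this on its output file to confirm that the compression step actually ran before deploying. An uncompressed Wasm binary starts with `\0asm` instead, so the function returns `false` for it.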
| 26 | + |
| 27 | +Gzip compression typically reduces Wasm binary size significantly, which is often enough to bring a large module under the 2 MiB threshold. |
| 28 | + |
| 29 | +### Using a recipe |
| 30 | + |
| 31 | +The Rust and prebuilt recipes expose a `compress` flag that gzip-compresses the output as the final build step: |
| 32 | + |
| 33 | +```yaml |
| 34 | +canisters: |
| 35 | + - name: backend |
| 36 | + recipe: |
| 37 | + type: "@dfinity/rust@v3.2.0" |
| 38 | + configuration: |
| 39 | + package: backend |
| 40 | + shrink: true |
| 41 | + compress: true |
| 42 | +``` |
| 43 | +
|
| 44 | +Setting `shrink: true` first removes unused functions and debug info while preserving function names for readable backtraces, then `compress: true` gzip-compresses the result. Using both together gives the largest size reduction. |
| 45 | + |
| 46 | +### Using a custom build script |
| 47 | + |
| 48 | +If you are not using a recipe, you can compress manually in your build steps: |
| 49 | + |
| 50 | +```yaml |
| 51 | +canisters: |
| 52 | + - name: backend |
| 53 | + build: |
| 54 | + steps: |
| 55 | + - type: script |
| 56 | + commands: |
| 57 | + - cargo build --target wasm32-unknown-unknown --release |
| 58 | + - cp target/wasm32-unknown-unknown/release/backend.wasm "$ICP_WASM_OUTPUT_PATH" |
| 59 | + - ic-wasm "$ICP_WASM_OUTPUT_PATH" -o "$ICP_WASM_OUTPUT_PATH" shrink --keep-name-section |
| 60 | + - gzip --no-name "$ICP_WASM_OUTPUT_PATH" |
| 61 | + - mv "${ICP_WASM_OUTPUT_PATH}.gz" "$ICP_WASM_OUTPUT_PATH" |
| 62 | +``` |
| 63 | + |
| 64 | +The `--keep-name-section` flag preserves function names for readable backtraces while still removing dead code. Omit it if you do not need stack traces. |
| 65 | + |
| 66 | +## Approach 2: the Wasm chunk store |
| 67 | + |
| 68 | +When compression alone is not enough, the Wasm chunk store lets you upload modules larger than 2 MiB by splitting them into chunks, then assembling and installing them in one atomic operation. |
| 69 | + |
| 70 | +### How the chunk store works |
| 71 | + |
| 72 | +1. **Upload chunks** — Call `upload_chunk` on the management canister to store up to 1 MiB chunks in the target canister's chunk store. Each call returns the SHA-256 hash of the stored chunk. |
| 73 | +2. **Assemble and install** — Call `install_chunked_code` with the ordered list of chunk hashes. The system concatenates the chunks, verifies the aggregate hash matches `wasm_module_hash`, and installs the result as if you had called `install_code` directly. |
| 74 | + |
| 75 | +The chunk store is bounded: each chunk is at most 1 MiB, and there is a maximum number of chunks per store (`CHUNK_STORE_SIZE`, defined in the IC interface spec — see the [management canister reference](../../reference/management-canister.md) for the exact value). You can inspect stored chunks with `stored_chunks` and clear the store with `clear_chunk_store`. |
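As a rough sketch of the client-side splitting step, assuming the 1 MiB chunk limit stated above: the helper name `split_into_chunks` is hypothetical, and the actual `upload_chunk` calls and SHA-256 hashing are left out because they need a management canister client and a hashing crate.

```rust
/// Maximum size of a single chunk (1 MiB, per the limit described above).
const MAX_CHUNK_BYTES: usize = 1024 * 1024;

/// Splits a Wasm module into ordered slices of at most 1 MiB each.
/// Concatenating the slices in order reproduces the original module.
/// (Hypothetical helper; uploading and hashing are not shown.)
fn split_into_chunks(module: &[u8]) -> Vec<&[u8]> {
    module.chunks(MAX_CHUNK_BYTES).collect()
}
```

Each returned slice would be passed to one `upload_chunk` call, and the hashes returned by those calls would be handed to `install_chunked_code` in the same order. Only the final chunk can be shorter than 1 MiB.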
| 76 | + |
| 77 | +### icp-cli handles this automatically |
| 78 | + |
| 79 | +When you run `icp deploy` or `icp canister install` with a Wasm module larger than 2 MiB, icp-cli automatically uses the chunk store — no configuration required. The tool splits the module, uploads each chunk, and calls `install_chunked_code` behind the scenes. <!-- TODO: verify automatic chunking behavior against icp-cli release notes --> |
| 80 | + |
| 81 | +```bash |
| 82 | +icp deploy |
| 83 | +``` |
| 84 | + |
| 85 | +### Combining compression with the chunk store |
| 86 | + |
| 87 | +You can combine gzip compression with the chunk store. A compressed module that still exceeds 2 MiB is split into chunks as before, but it needs fewer of them, which means fewer upload calls and lower cycle costs. Enable both `shrink` and `compress` in your recipe, and let icp-cli decide whether chunking is needed. |
| 88 | + |
| 89 | +### Cycle costs |
| 90 | + |
| 91 | +Storing each chunk costs cycles proportional to 1 MiB of storage (even if the chunk is smaller). Chunks are temporary storage: they are consumed during `install_chunked_code` and do not accumulate after installation. If an installation attempt fails or is interrupted, call `clear_chunk_store` to reclaim the storage cycles before retrying. |
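The rounding rule above can be worked through with a little arithmetic. The sketch below only computes chunk counts and billed storage in bytes; it does not model cycle prices, and the function names are hypothetical.

```rust
const MIB: u64 = 1024 * 1024;

/// Number of chunks needed for a module of `module_bytes` bytes,
/// assuming every chunk except possibly the last is a full 1 MiB.
fn chunk_count(module_bytes: u64) -> u64 {
    (module_bytes + MIB - 1) / MIB // ceiling division
}

/// Storage the chunk store is charged for: one full MiB per chunk,
/// even when the last chunk is smaller.
fn billed_bytes(module_bytes: u64) -> u64 {
    chunk_count(module_bytes) * MIB
}
```

For instance, a 2.5 MiB module needs three chunks and is therefore charged as 3 MiB of storage. Running the same calculation on the compressed and uncompressed sizes shows how much compression saves before the upload starts.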
| 92 | + |
| 93 | +## Wasm64: 64-bit memory addressing |
| 94 | + |
| 95 | +Standard ICP canisters use the `wasm32-unknown-unknown` target, which limits addressable memory to 4 GiB. For canisters that need more — for example, those holding large in-memory datasets or running inference on large models — ICP supports the `wasm64-unknown-unknown` target with up to 6 GiB of addressable heap memory (an ICP platform limit). |
| 96 | + |
| 97 | +Wasm64 is a separate concern from the chunk store. You might use one, the other, or both: the chunk store addresses the 2 MiB upload limit, while Wasm64 addresses the runtime memory limit. |
| 98 | + |
| 99 | +### Building a Wasm64 canister |
| 100 | + |
| 101 | +Wasm64 requires the Rust nightly toolchain and the `build-std` unstable feature, because the standard library must be compiled for the `wasm64-unknown-unknown` target rather than pulled from a precompiled artifact. |
| 102 | + |
| 103 | +Create a `build.sh` script in your project directory: |
| 104 | + |
| 105 | +```bash |
| 106 | +#!/bin/bash |
| 107 | +
|
| 108 | +# Ensure nightly toolchain and rust-src are available |
| 109 | +rustup toolchain install nightly |
| 110 | +rustup component add rust-src --toolchain nightly |
| 111 | +
|
| 112 | +# Build for wasm64 |
| 113 | +cargo +nightly build \ |
| 114 | + -Z build-std=std,panic_abort \ |
| 115 | + --target wasm64-unknown-unknown \ |
| 116 | + --release \ |
| 117 | + -p backend |
| 118 | +
|
| 119 | +cp target/wasm64-unknown-unknown/release/backend.wasm target/backend.wasm |
| 120 | +candid-extractor target/backend.wasm > backend/backend.did |
| 121 | +``` |
| 122 | + |
| 123 | +Then reference the script in `icp.yaml`: |
| 124 | + |
| 125 | +```yaml |
| 126 | +canisters: |
| 127 | + - name: backend |
| 128 | + build: |
| 129 | + steps: |
| 130 | + - type: script |
| 131 | + commands: |
| 132 | + - ./build.sh |
| 133 | + - cp target/backend.wasm "$ICP_WASM_OUTPUT_PATH" |
| 134 | + - ic-wasm "$ICP_WASM_OUTPUT_PATH" -o "${ICP_WASM_OUTPUT_PATH}" metadata "candid:service" -f 'backend/backend.did' -v public --keep-name-section |
| 135 | +``` |
| 136 | + |
| 137 | +The canister code itself does not require changes — the same Rust CDK code works on both `wasm32` and `wasm64`: |
| 138 | + |
| 139 | +```rust |
| 140 | +#[ic_cdk::query] |
| 141 | +fn greet(name: String) -> String { |
| 142 | + format!("Hello, {}!", name) |
| 143 | +} |
| 144 | +
|
| 145 | +ic_cdk::export_candid!(); |
| 146 | +``` |
| 147 | + |
| 148 | +See the [backend_wasm64 example](https://github.com/dfinity/examples/tree/master/rust/backend_wasm64) for a complete working project. |
| 149 | + |
| 150 | +### Memory limits and Wasm64 |
| 151 | + |
| 152 | +Wasm64 canisters benefit from the `wasm_memory_limit` canister setting to cap WebAssembly heap usage, preventing runaway allocations: |
| 153 | + |
| 154 | +```yaml |
| 155 | +canisters: |
| 156 | + - name: backend |
| 157 | + build: |
| 158 | + steps: |
| 159 | + - type: script |
| 160 | + commands: |
| 161 | + - ./build.sh |
| 162 | + - cp target/backend.wasm "$ICP_WASM_OUTPUT_PATH" |
| 163 | + settings: |
| 164 | + wasm_memory_limit: 4gib |
| 165 | +``` |
| 166 | + |
| 167 | +## WebAssembly SIMD |
| 168 | + |
| 169 | +WebAssembly SIMD (Single Instruction, Multiple Data) is a set of more than 200 vector instructions defined in the WebAssembly core specification. SIMD allows a single instruction to operate on multiple data elements in parallel, which can significantly accelerate compute-heavy workloads. |
| 170 | + |
| 171 | +SIMD is available on every ICP node and does not require any special canister configuration beyond enabling the target feature in your build. |
| 172 | + |
| 173 | +### When SIMD helps |
| 174 | + |
| 175 | +SIMD provides the largest gains for workloads with regular, data-parallel structure: |
| 176 | + |
| 177 | +- **AI/ML inference** — matrix multiplications, activation functions, convolutions |
| 178 | +- **Image processing** — pixel transforms, filtering, encoding/decoding |
| 179 | +- **Cryptographic operations** — hash computation, field arithmetic |
| 180 | +- **Scientific computing** — numerical simulations, signal processing |
| 181 | + |
| 182 | +For "classical" canister operations — reward distribution, token accounting, query logic — the gains are smaller but still measurable. |
| 183 | + |
| 184 | +### Loop auto-vectorization |
| 185 | + |
| 186 | +The simplest way to benefit from SIMD is to enable the `simd128` target feature and let the Rust compiler auto-vectorize loops. This is a small configuration change that often provides a significant speedup without rewriting any code. |
| 187 | + |
| 188 | +Enable SIMD globally for your entire workspace by creating `.cargo/config.toml`: |
| 189 | + |
| 190 | +```toml |
| 191 | +[build] |
| 192 | +target = ["wasm32-unknown-unknown"] |
| 193 | +
|
| 194 | +[target.wasm32-unknown-unknown] |
| 195 | +rustflags = ["-C", "target-feature=+simd128"] |
| 196 | +``` |
| 197 | + |
| 198 | +Or enable it only for a specific function: |
| 199 | + |
| 200 | +```rust |
| 201 | +#[target_feature(enable = "simd128")] |
| 202 | +#[ic_cdk::query] |
| 203 | +fn compute_heavy_operation() -> u64 { |
| 204 | + // The compiler auto-vectorizes eligible loops in this function |
| 205 | + // ... |
| 206 | + 0 |
| 207 | +} |
| 208 | +``` |
| 209 | + |
| 210 | +Auto-vectorization works best with tight numeric loops over contiguous arrays. The actual speedup depends on the algorithm, the compiler, and the input data. |
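To illustrate the kind of loop that tends to vectorize well, here is a minimal sketch of a scaled vector addition over two contiguous slices. The function name `saxpy` is just a conventional label for this operation. Whether the compiler actually emits SIMD instructions for it depends on the compiler version, optimization level, and target features, so treat it as a candidate and not a guarantee.

```rust
/// Computes y[i] = a * x[i] + y[i] for every element.
/// Both slices must have the same length.
fn saxpy(a: f32, x: &[f32], y: &mut [f32]) {
    assert_eq!(x.len(), y.len());
    // Iterating with zip over contiguous slices keeps the loop body free of
    // bounds checks, branches, and function calls.
    for (yi, xi) in y.iter_mut().zip(x.iter()) {
        *yi += a * *xi;
    }
}
```

The loop body contains only arithmetic on adjacent elements, which is the shape the paragraph above describes. Loops that index through a lookup table or branch on each element are much less likely to be vectorized.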
| 211 | + |
| 212 | +### SIMD intrinsics |
| 213 | + |
| 214 | +For maximum performance, you can use SIMD intrinsics directly. This gives full control over which vector instructions execute, at the cost of writing more complex code. |
| 215 | + |
| 216 | +The `wasm32` platform exposes SIMD intrinsics through the `core::arch::wasm32` module (available when `simd128` is enabled). For a complete working example comparing naive, optimized, auto-vectorized, and SIMD intrinsic implementations of matrix multiplication, see the [WebAssembly SIMD example](https://github.com/dfinity/examples/tree/master/rust/simd) in the examples repository. |
| 217 | + |
| 218 | +### Measuring SIMD performance |
| 219 | + |
| 220 | +Use the `ic0.performance_counter` system API, which the Rust CDK exposes as `ic_cdk::api::instruction_counter`, to count Wasm instructions before and after a computation: |
| 221 | + |
| 222 | +```rust |
| 223 | +#[ic_cdk::query] |
| 224 | +fn benchmark_operation() -> u64 { |
| 225 | + let before = ic_cdk::api::instruction_counter(); |
| 226 | + // ... your computation ... |
| 227 | + ic_cdk::api::instruction_counter() - before |
| 228 | +} |
| 229 | +``` |
| 230 | + |
| 231 | +Compare instruction counts with and without SIMD to measure the speedup. Lower instruction counts mean lower cycle costs and faster execution. The [`canbench`](https://github.com/dfinity/canbench) framework provides a more structured benchmarking workflow for tracking performance over time. |
| 232 | + |
| 233 | +## Troubleshooting |
| 234 | + |
| 235 | +**"Wasm module too large" error during install** — The module exceeds 2 MiB. Verify that icp-cli is up to date (automatic chunk store support was added in v0.2.x). If using a manual install flow, switch to the `install_chunked_code` management canister API. |
| 236 | + |
| 237 | +**"Wasm chunk store error" during install** — The canister may lack sufficient cycles to store chunks (each 1 MiB chunk incurs a storage cost). Top up the canister's cycles balance before retrying. If chunks from a previous failed attempt are occupying the store, call `clear_chunk_store` first. |
| 238 | + |
| 239 | +**Wasm64 build fails with missing target** — The `nightly` toolchain and `rust-src` component must both be installed. Run: |
| 240 | + |
| 241 | +```bash |
| 242 | +rustup toolchain install nightly |
| 243 | +rustup component add rust-src --toolchain nightly |
| 244 | +``` |
| 245 | + |
| 246 | +**SIMD instructions have no measurable effect** — Some loops cannot be auto-vectorized. Check that the loop body is tight, operates on a contiguous slice, and does not contain branches or function calls that prevent vectorization. Profile with `ic_cdk::api::instruction_counter` to confirm the function is a bottleneck before investing in SIMD intrinsics. |
| 247 | + |
| 248 | +## Next steps |
| 249 | + |
| 250 | +- [Canister optimization](optimization.md) — reduce Wasm size before reaching for the chunk store |
| 251 | +- [Execution errors reference](../../reference/execution-errors.md) — Wasm size and chunk store error codes |
| 252 | +- [Canister lifecycle](lifecycle.md) — deployment modes and install options |
| 253 | + |
| 254 | +<!-- Upstream: informed by dfinity/portal docs/building-apps/developing-canisters/compile.mdx; dfinity/portal docs/building-apps/network-features/simd.mdx; dfinity/examples rust/backend_wasm64; dfinity/portal docs/references/ic-interface-spec.md --> |