
fix(op-service/eth): stabilize blob fuzz targets #20940

Merged
ajsutton merged 5 commits into develop from aj/fix/flake-fuzz-encode-decode-blob
May 22, 2026

@ajsutton (Contributor) commented May 20, 2026

Claude: Authored by Claude on Adrian's behalf.

Three correctness fixes in the blob fuzz path that surfaced while chasing a CI flake:

  • blob.go: format len(data) (%d) instead of the full payload (%v) in the ErrBlobInputTooLarge error. The previous form produced a ~260 KiB hex string per oversize input.
  • FuzzEncodeDecodeBlob: truncate d to MaxBlobDataSize so every input exercises the round-trip.
  • FuzzDetectNonBijectivity: derive the bit to flip from sha256(d) instead of a shared *rand.Rand, because Go fuzz targets must be deterministic.

The original CI flake occurred in fuzz-golang-op-service running on medium.gen2 (2 vCPU) via the W4 pilot in #20899; the runner has since been resized back to xlarge.gen2 in #20931. This PR is rebased onto develop to pick up that resize and verifies that the remaining test changes pass on the correctly sized runner.

Closes #20935
Closes #20936

@ajsutton force-pushed the aj/fix/flake-fuzz-encode-decode-blob branch 3 times, most recently from ea637f0 to f1a0edb, May 20, 2026 23:58
@ajsutton marked this pull request as ready for review May 21, 2026 00:00
@ajsutton requested a review from a team as a code owner May 21, 2026 00:00
@ajsutton changed the title fix(op-service/eth): stabilize FuzzEncodeDecodeBlob to fix(op-service/eth): stabilize FuzzEncodeDecodeBlob and FuzzDetectNonBijectivity May 21, 2026
@ajsutton changed the title fix(op-service/eth): stabilize FuzzEncodeDecodeBlob and FuzzDetectNonBijectivity to DIAGNOSTIC: investigate fuzz-golang-op-service EOF after Gen2 migration May 21, 2026
@ajsutton force-pushed the aj/fix/flake-fuzz-encode-decode-blob branch from 982f703 to d99c8e5, May 21, 2026 01:13
@ajsutton changed the title DIAGNOSTIC: investigate fuzz-golang-op-service EOF after Gen2 migration to fix(op-service): cap fuzz GOMAXPROCS=8 and stabilize blob fuzz targets May 21, 2026
ajsutton added 4 commits May 21, 2026 12:49
Blob.FromData formatted the entire input payload into the
ErrBlobInputTooLarge error message via %v on eth.Data (hexutil.Bytes).
For a single oversize input that produced a ~260KiB error string. Inside
FuzzEncodeDecodeBlob the oversize input flowed straight into
require.NoError, so each oversize execution allocated and formatted that
260KiB message; under coverage-instrumented minimization that work
multiplies across iterations until the harness reports
"fuzzing process hung or terminated unexpectedly while minimizing: EOF"
and throughput collapses to 0 execs/sec for the remainder of fuzztime.

Two minimal changes:

1. Format len(data) instead of data in the error in FromData. Add a
   regression assertion in TestTooLongDataEncoding that the error
   message stays under 1KiB.
2. In FuzzEncodeDecodeBlob, treat ErrBlobInputTooLarge as documented
   out-of-scope and return early, instead of asserting NoError. This
   removes the formatting amplification path entirely and preserves
   full input-size coverage (no caps).
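
The early-return pattern in change 2 can be sketched as a plain function that an f.Fuzz callback would call once per input. Everything here is hypothetical: toyEncode/toyDecode are a length-prefixed stand-in codec, not the real blob encoding, and MaxBlobDataSize/ErrBlobInputTooLarge are local stand-ins. Only the errors.Is check mirrors the behavior described above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// Local stand-ins for the op-service/eth names (values assumed).
const MaxBlobDataSize = (4*31+3)*1024 - 4

var ErrBlobInputTooLarge = errors.New("blob input too large")

// toyEncode is a hypothetical codec: a 4-byte big-endian length prefix + data.
func toyEncode(data []byte) ([]byte, error) {
	if len(data) > MaxBlobDataSize {
		return nil, fmt.Errorf("%w: len=%d", ErrBlobInputTooLarge, len(data))
	}
	out := make([]byte, 4+len(data))
	binary.BigEndian.PutUint32(out, uint32(len(data)))
	copy(out[4:], data)
	return out, nil
}

// toyDecode reverses toyEncode, rejecting malformed encodings.
func toyDecode(enc []byte) ([]byte, error) {
	if len(enc) < 4 {
		return nil, errors.New("encoding shorter than length prefix")
	}
	n := int(binary.BigEndian.Uint32(enc))
	if n > len(enc)-4 {
		return nil, errors.New("length prefix exceeds encoding")
	}
	return enc[4 : 4+n], nil
}

// roundTrip is the deterministic body a fuzz target would run for one input.
// Oversize inputs are treated as out of scope and return nil early instead
// of reaching an assertion that would format the error.
func roundTrip(data []byte) error {
	enc, err := toyEncode(data)
	if errors.Is(err, ErrBlobInputTooLarge) {
		return nil
	}
	if err != nil {
		return err
	}
	dec, err := toyDecode(enc)
	if err != nil {
		return err
	}
	if !bytes.Equal(dec, data) {
		return errors.New("round-trip mismatch")
	}
	return nil
}

func main() {
	fmt.Println(roundTrip([]byte("hello")))                 // <nil>
	fmt.Println(roundTrip(make([]byte, MaxBlobDataSize+1))) // <nil> (skipped)
}
```

Inside a fuzz test, the callback would then reduce to calling roundTrip(d) and passing any non-nil error to t.Fatal.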

The flake is intermittent because Go's fuzzing mutator only
sporadically produces inputs above MaxBlobDataSize; runs that never hit
the oversize branch never trigger the stall.

Refs #20935

The fuzz function captured a shared *rand.Rand outside the f.Fuzz closure
and used it to pick which bit to flip on each iteration. Go's fuzz engine
requires fuzz targets to be deterministic for the same input: it re-runs
inputs to verify reproducibility, to gather baseline coverage, and during
minimization. A shared rand source meant the same input produced different
behavior on re-execution, which caused worker subprocesses to stall during
minimization and ultimately exit with EOF (#20936).

Derive the bit to flip from a SHA-256 hash of the input instead, so the target
is a pure function of its input. The shared Blob buffer outside the closure
is fine: within one worker subprocess the fuzz callback runs sequentially.

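
A minimal sketch of deriving the bit position from the input, assuming the first eight bytes of sha256.Sum256(d) are reduced modulo a bit range. flipBitIndex and flipBit are hypothetical helpers, and nbits is a parameter here because the exact reduction the PR uses is not stated. Since the index depends only on d, re-running an input during baseline coverage or minimization flips the same bit every time.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// flipBitIndex maps data to a bit position in [0, nbits). It uses no shared
// state, so the same data always yields the same index. nbits must be > 0.
func flipBitIndex(data []byte, nbits int) int {
	sum := sha256.Sum256(data)
	// Interpret the first 8 digest bytes as an unsigned integer.
	v := binary.BigEndian.Uint64(sum[:8])
	return int(v % uint64(nbits))
}

// flipBit toggles bit idx of buf in place (bit 0 is the LSB of buf[0]).
func flipBit(buf []byte, idx int) {
	buf[idx/8] ^= 1 << (idx % 8)
}

func main() {
	data := []byte("fuzz input")
	buf := make([]byte, 32)
	idx := flipBitIndex(data, len(buf)*8)
	fmt.Println(idx == flipBitIndex(data, len(buf)*8)) // true: deterministic
	flipBit(buf, idx)
	fmt.Printf("byte %d is now %08b\n", idx/8, buf[idx/8])
}
```

Unlike a *rand.Rand captured outside the closure, this carries no state between calls, so the order in which a worker executes inputs cannot change the outcome for any one input.
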
CircleCI Gen2 docker images expose the full host core count to Go's
runtime, so the fuzz harness was spawning 32 workers on an xlarge.gen2
runner (8 vCPU). The resulting 4x CPU oversubscription collapsed
throughput from ~7000 execs/sec/worker to <10 and broke the Go fuzz
coordinator-worker liveness heartbeats, surfacing as

  fuzzing process hung or terminated unexpectedly while minimizing: EOF

at fuzztime. Pinning GOMAXPROCS=8 restores stable throughput
(~450-550 execs/sec total over 8 workers, full 60s) and clears the
flake.

@ajsutton force-pushed the aj/fix/flake-fuzz-encode-decode-blob branch from d99c8e5 to 1dea76e, May 21, 2026 02:51
@ajsutton changed the title fix(op-service): cap fuzz GOMAXPROCS=8 and stabilize blob fuzz targets to fix(op-service/eth): stabilize blob fuzz targets May 21, 2026
@hdcesario-op (Contributor) left a comment


Requesting one targeted fix in the remaining blob fuzz target.

Comment thread: op-service/eth/blob_test.go

Address review feedback on FuzzDetectNonBijectivity: truncate input to
MaxBlobDataSize before hashing so bit selection is derived from the
encoded data, and check NotEqual only on successful decode.
@hdcesario-op (Contributor) left a comment


LGTM

@hdcesario-op enabled auto-merge May 21, 2026 15:18
@hdcesario-op disabled auto-merge May 21, 2026 15:18
@hdcesario-op added this pull request to the merge queue May 21, 2026
github-merge-queue (bot) removed this pull request from the merge queue due to failed status checks May 21, 2026
@ajsutton added this pull request to the merge queue May 21, 2026
github-merge-queue (bot) removed this pull request from the merge queue due to failed status checks May 21, 2026
@ajsutton added this pull request to the merge queue May 22, 2026
Merged via the queue into develop with commit dac3a11 May 22, 2026
185 checks passed
@ajsutton deleted the aj/fix/flake-fuzz-encode-decode-blob branch May 22, 2026 03:06

Development

Successfully merging this pull request may close these issues:

  • Flaky test: FuzzDetectNonBijectivity (op-service/eth)
  • Flaky test: FuzzEncodeDecodeBlob (op-service/eth)

2 participants