JAXBench: Fix no-op exploit in 19 KernelBench baselines: zero-init weights -> random #50

Open
charleshong3 wants to merge 1 commit into AI-Hypercomputer:main from charleshong3:fix-kernelbench-noop-zeros-weights

@charleshong3

KernelBench-derived baselines in the 18k-50k range initialized their weight/bias tensors to `jnp.zeros` in `create_inputs`. With zero weights, `x @ W (+ b)` is identically zero and independent of the input, so the reference output is a trivial constant (all-zero, or a fixed activation thereof). Any kernel returning that constant -- including a no-op that skips the matmul/conv entirely -- passes `np.allclose`, so these benchmarks could report large, meaningless speedups without computing the operator.
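
The failure mode can be reproduced in a few lines. This is a minimal sketch, not code from the affected baselines: `reference_linear`, `noop_kernel`, and the shapes are hypothetical stand-ins, and `np.allclose` is called with its default tolerances, which may differ from the harness settings.

```python
import jax
import jax.numpy as jnp
import numpy as np


def reference_linear(x, w, b):
    # Hypothetical stand-in for a baseline workload: linear layer + ReLU.
    return jax.nn.relu(x @ w + b)


def noop_kernel(x, w, b):
    # A "kernel" that skips the matmul and returns zeros of the output shape.
    return jnp.zeros((x.shape[0], w.shape[1]), dtype=x.dtype)


key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (8, 16), dtype=jnp.float32)

# Zero-initialized parameters, as in the pre-fix create_inputs.
w_zero = jnp.zeros((16, 4), dtype=jnp.float32)
b_zero = jnp.zeros((4,), dtype=jnp.float32)

expected = np.asarray(reference_linear(x, w_zero, b_zero))
actual = np.asarray(noop_kernel(x, w_zero, b_zero))

# The reference is all-zero regardless of x, so the no-op is accepted.
noop_passes = bool(np.allclose(actual, expected))
print("no-op passes correctness check:", noop_passes)
```

Because the reference output never depends on `x`, the check cannot distinguish a real kernel from one that does no work.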

This replaces the zero-init weights/biases with small-normal random values (scale ~0.02, so that the output depends on the input, the values are representable in bf16, and nothing overflows). Only `create_inputs` is changed; the workload/op is untouched. After the fix, a no-op (all-zero output) fails correctness on all 19.
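
A rough sketch of what such a `create_inputs` change might look like follows. The signature, shapes, and the `reference_linear` workload are assumptions made for illustration and are not taken from the PR diff; only the ~0.02 scale and the bf16 dtype come from the description above.

```python
import jax
import jax.numpy as jnp
import numpy as np


def create_inputs(key, batch=8, in_features=16, out_features=4,
                  dtype=jnp.bfloat16):
    # Hypothetical signature; the real baselines may differ.
    kx, kw, kb = jax.random.split(key, 3)
    x = jax.random.normal(kx, (batch, in_features), dtype=jnp.float32)
    # Small-normal weights/biases (scale ~0.02) instead of jnp.zeros.
    w = 0.02 * jax.random.normal(kw, (in_features, out_features),
                                 dtype=jnp.float32)
    b = 0.02 * jax.random.normal(kb, (out_features,), dtype=jnp.float32)
    return x.astype(dtype), w.astype(dtype), b.astype(dtype)


def reference_linear(x, w, b):
    # Hypothetical stand-in for the unchanged workload.
    return x @ w + b


x, w, b = create_inputs(jax.random.PRNGKey(0))
expected = np.asarray(reference_linear(x, w, b).astype(jnp.float32))

# An all-zero "no-op" output of the same shape.
noop_output = np.zeros_like(expected)

# With non-zero weights the reference depends on x, so the no-op is rejected
# (np.allclose defaults are used here; the harness tolerance is an assumption).
noop_passes = bool(np.allclose(noop_output, expected))
print("no-op passes correctness check:", noop_passes)
```

Sampling in float32 and casting afterwards is a choice made for this sketch so that the scaling is applied before rounding to bf16.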

Scope: 19 of the affected baselines are fully fixed by non-zero weights. Five others whose output is intrinsically small regardless of weights -- the softmax-terminated 38k/43k/50k (row outputs ~1/N) and the structurally degenerate 25k (GroupNorm->Mean) and 42k (Max-Subtract-GELU) -- are NOT addressed here; they need a tolerance or operator change and are left to a follow-up. Megablox (11p) has a distinct input-underflow variant fixed separately.
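
The softmax case can be illustrated with a small sketch. The shapes and the loose tolerance `atol=1e-2` are hypothetical and are not the harness's actual settings; the point is only that row outputs near 1/N can fall under an absolute tolerance even when the weights are random.

```python
import jax
import jax.numpy as jnp
import numpy as np

N = 4096  # hypothetical softmax width
kx, kw = jax.random.split(jax.random.PRNGKey(0))
x = jax.random.normal(kx, (4, 16), dtype=jnp.float32)
w = 0.02 * jax.random.normal(kw, (16, N), dtype=jnp.float32)  # non-zero weights

# Softmax-terminated workload: each row sums to 1, so entries are ~1/N.
expected = np.asarray(jax.nn.softmax(x @ w, axis=-1))
noop_output = np.zeros_like(expected)

# Assumed loose absolute tolerance: every entry is below it, so zeros pass.
passes_loose = bool(np.allclose(noop_output, expected, atol=1e-2))
# NumPy's default tolerances (rtol=1e-5, atol=1e-8) reject the all-zero output.
passes_default = bool(np.allclose(noop_output, expected))

print("largest softmax entry:", float(expected.max()), "vs 1/N =", 1.0 / N)
print("all-zero output passes with atol=1e-2:", passes_loose)
print("all-zero output passes with defaults:", passes_default)
```

This is why non-zero weights alone do not help these baselines: the magnitude of the output, not its dependence on the input, is what lets a zero output through.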

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

google-cla Bot commented Jun 10, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.
