Skip to content

feat(test-fill): align fill-stateful with gas-benchmarks implementation#2923

Open
LouisTsai-Csie wants to merge 5 commits into
ethereum:forks/amsterdamfrom
LouisTsai-Csie:refactor-stateful-filling
Open

feat(test-fill): align fill-stateful with gas-benchmarks implementation#2923
LouisTsai-Csie wants to merge 5 commits into
ethereum:forks/amsterdamfrom
LouisTsai-Csie:refactor-stateful-filling

Conversation

@LouisTsai-Csie
Copy link
Copy Markdown
Collaborator

@LouisTsai-Csie LouisTsai-Csie commented May 27, 2026

🗒️ Description

Add a few features & refactors to the stateful filling implementation.

Commit d198221: Replace debug_setHead with FCU approach.

debug_setHead only supports in Geth, Besu and Erigon. Nethermind implements debug_resetHead that takes block hash as input. Reth has the method registered but not implemented (Link), Nimbus comments out the method (Link), no implementation either.

Consider some features (e.g., built-in opcode tracing) is now only supported in Nethermind master branch, this PR changes the debug_setHead to FCU version, in order to set the canonical chain.

Commit 0252bec: Add a new CLI flag --gas-bump-blocks.

For perf-devnet-3 snapshot, we need to bump the block gas limit to ~1T block gas limit. This is done by sending empty blocks in gas-benchmarks. User could use --gas-bump-blocks to assign number of empty block to increase the block gas limit.

In gas-benchmarks, we send 5000 empty blocks to bump the block gas limit for perf-devnet-3. For Jochmnet, we do not need to bump the gas limit, as the block gas limit is already high enough (~1 gigagas).

Commit 05f3c16: Add deterministic sender pooling.

For test_ether_transfers_onchain_receivers benchmark, to prevent the sender account being cached, we pre-fund the account via CL withdrawal. We inject 15K withdrawal into funding phase, starting with this pk range:

SENDER_BASE_KEY = int.from_bytes(
    keccak256(b"gas-repricings-private-key"), "big"
)

Commit 54f43ad: Batch tx receipt request

Batch tx receipt RPC request, to avoid large amount of RPC request at the same time. This was a bottleneck in the past when using gas-benchmarks.

Commit 8477bfe: refactor fill_stateful.py

Split the hive setup version from the original fill_stateful.py file, to keep the file small and focus on the filler feature.

These senders are then consumed by test_ether_transfers_onchain_receivers, this aligns changes in gas-benchmarks PR NethermindEth/gas-benchmarks#146

Verification

Verify the test via Hive mode, works successfully.

🔗 Related Issues or PRs

Issue #2910

✅ Checklist

  • All: Ran fast static checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    just static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Cute Animal Picture

Put a link to a cute animal picture inside the parenthesis-->

@LouisTsai-Csie LouisTsai-Csie self-assigned this May 27, 2026
@LouisTsai-Csie LouisTsai-Csie added A-test-fill Area: execution_testing.cli.pytest_commands.plugins.filler C-feat Category: an improvement or new feature labels May 27, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.43%. Comparing base (1596cf6) to head (8477bfe).
⚠️ Report is 4 commits behind head on forks/amsterdam.

Additional details and impacted files
@@                 Coverage Diff                 @@
##           forks/amsterdam    #2923      +/-   ##
===================================================
- Coverage            90.44%   90.43%   -0.01%     
===================================================
  Files                  535      535              
  Lines                32439    32430       -9     
  Branches              3012     3012              
===================================================
- Hits                 29338    29329       -9     
  Misses                2573     2573              
  Partials               528      528              
Flag Coverage Δ
unittests 90.43% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@LouisTsai-Csie LouisTsai-Csie requested a review from marioevz May 27, 2026 13:30
Copy link
Copy Markdown
Member

@marioevz marioevz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this, I added a couple of comments which I feel need to be resolved before merging.

Comment on lines +391 to +394
head_block = self.get_block_by_hash(head_block_hash)
assert head_block is not None, (
f"cannot reset head to unknown block {head_block_hash}"
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is unnecessary, since the client itself will make the same complaint if it cannot find the block.
It's just one less RPC call to make IMO.

Comment on lines +395 to +398
head_fork = self.fork.fork_at(
block_number=HexNumber(head_block["number"]),
timestamp=HexNumber(head_block["timestamp"]),
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now the reason behind the previous call, but perhaps the solution is that we should never expect TransitionFork here and assume that ChainBuilderEthRPC is always going to be initialized with Fork, otherwise error out.

Comment on lines +399 to +402
fcu_version = head_fork.engine_forkchoice_updated_version()
assert fcu_version is not None, (
"Fork does not support engine forkchoice_updated"
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do this once during initialization of ChainBuilderEthRPC, save fcu_version as a field of the chain builder instance, and then just use it as self.fcu_version.
This is also under the assumption that fork is always Fork and never TransitionFork.

Comment on lines 557 to 558
@pytest.fixture(autouse=True, scope="function")
def _reset_chain_between_tests(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe out of scope for this PR, but the way this fixture is auto-used feels off.
I think a better structure would be to have a session_client_backend fixture that is session-scoped, and a client_backend fixture that is function-scoped.

The session_client_backend fixture initializes the object for the entire session, most of what _session_pre_run and client_backend fixtures do now.

The new client_backend fixture prepares and cleans up the ClientBackend object before and after every test, so what _reset_chain_between_tests currently does.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See example of session_t8n and t8n session and function scoped fixtures that use the same object internally:

return
try:
debug_rpc.set_head(start_hex)
eth_rpc.set_canonical_head(Hash(expected_hash))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call feels like we are conflating two different objects to achieve a purpose.

We could have client_backend.reset_to_start_block or similar because client_backend already knows start_block so it can do everything by itself.

This also makes the code more maintainable because if we ever modify the client_backend structure, you only have to modify methods inside of the self-contained ClientBackend class.

Comment on lines +117 to +128
group.addoption(
"--gas-bump-blocks",
action="store",
dest="gas_bump_blocks",
default=5000,
type=int,
help=(
"Empty blocks at start to increase block gas limit."
"Each block increase ≤ parent_gas_limit/1024."
"Default: 5000 blocks."
),
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the right idea but the wrong abstraction.

A user normally only knows the gas limit their test needs, not how many empty blocks produce it. With this parameter they still have to (a) find the starting block gas limit of the snapshot, and (b) compute the number of blocks needed to ramp up to their target. That's our math to do, not theirs.

Two suggestions:

  • For the MVP, make the parameter --block-gas-limit (the target), and internally derive how many empty blocks are needed to reach it from the start block's gas limit. The ≤ parent_gas_limit/1024 per-block constraint is exactly the kind of detail we should be hiding behind this.
  • Longer term, and probably out of scope here: even that parameter may be unnecessary. We already have a default Environment with a default gas_limit, so we could just target that. A test that deviates inherently needs a different setup, and therefore a different number of gas-bump-blocks anyway.

# Placed outside pre-allocation to ensure accounts remain uncached.
SENDER_BASE_KEY = (
0x1111111111111111111111111111111111111111111111111111111111111111
SENDER_BASE_KEY = int.from_bytes(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern here is that this sidesteps the standard pre.fund_eoa path. That abstraction exists so we can change how funding works under the hood without touching tests — and this is the one test that opts out, which means every framework change now has to account for this special case separately. Case in point: This PR adds a whole new parameter just to deal with this (--sender-pool-size).

It also makes the test order-dependent: yield_distinct_sender() hands out senders based on execution order, so the accounts a given test gets aren't stable.

I'd rather we solve the underlying thing generally. If the goal is funding via withdrawals to keep accounts uncached, that's something pre.fund_eoa could do for all accounts — at which point this becomes automatic and the special case (and the new parameter) goes away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-test-fill Area: execution_testing.cli.pytest_commands.plugins.filler C-feat Category: an improvement or new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants