
feat(e2e): benchmark mempool and memiavl#481

Open
traviolus wants to merge 7 commits into main from feat/e2e-benchmark

Conversation

@traviolus
Contributor

Description

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title, you can find examples of the prefixes below:
  • confirmed ! in the type prefix if API or client breaking change
  • targeted the correct branch
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • included the necessary unit and integration tests
  • updated the relevant documentation or specification, including comments for documenting Go code
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic, API design and naming, documentation is accurate, tests and test coverage

@traviolus traviolus requested a review from a team as a code owner February 25, 2026 09:47
@github-actions

github-actions bot commented Feb 25, 2026

Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai

coderabbitai bot commented Feb 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds an end-to-end benchmarking suite: Makefile target, README, benchmark tests, config, load generators, collectors, reporters, and cluster API extensions to capture mempool, block, throughput, and latency metrics across baseline, mempool-only, and combined variants.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Build Infrastructure: `Makefile` | Added `benchmark-e2e` target and included it in the `.PHONY` list. |
| Benchmark README: `integration-tests/e2e/benchmark/README.md` | New comprehensive benchmark guide describing topology, metrics, run instructions, workflows, and JSON output schema. |
| Benchmark Tests: `integration-tests/e2e/benchmark/benchmark_test.go` | New e2e benchmark tests (multiple scenarios: baseline, comparisons, mempool, move exec, queue promotion, gossip) gated by the `benchmark` build tag. |
| Benchmark Configuration: `integration-tests/e2e/benchmark/config.go` | Added `Variant` type, `BenchConfig` struct, defaults, factory helpers, and the `GetGasLimit` and `TotalTx` helpers. |
| Load Generation: `integration-tests/e2e/benchmark/load.go` | Added load generators and helpers: `BurstLoad`, `SequentialLoad`, `OutOfOrderLoad`, `SingleNodeLoad`, `MoveExecBurstLoad`, `Warmup`, and types `TxSubmission`, `LoadResult`. |
| Collection & Analysis: `integration-tests/e2e/benchmark/collector.go` | Added `MempoolPoller`, `CollectResults`, `WaitForAllIncluded`, `CollectInitialMetas`, `BenchResult`, `BlockStat`, and latency/TPS aggregation utilities. |
| Reporting: `integration-tests/e2e/benchmark/report.go` | Added result persistence (`WriteResult`), loaders (`LoadResults`, `LoadBaselineResults*`), and comparison printers (`PrintComparisonTable`, `PrintImprovementTable`) with delta formatting. |
| Cluster API & Types: `integration-tests/e2e/cluster.go`, `integration-tests/e2e/cluster/cluster.go` | Re-exported `BlockResult`; added `ClusterOptions.MemIAVL`; new `Cluster` methods `NodeRPCPort`, `LatestHeight`, `UnconfirmedTxCount`, `QueryBlock` (decodes block, base64-decodes txs, hashes), and `SendMoveExecuteJSONWithGas`. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as Test
    participant Cluster as Cluster
    participant Load as LoadGenerator
    participant Poller as MempoolPoller
    participant Collector as Collector
    participant Reporter as Reporter

    Test->>Cluster: Setup cluster (BenchConfig, MemIAVL)
    Test->>Cluster: Collect initial account metas
    Test->>Test: Warmup transactions
    Test->>Cluster: Refresh metas, record start height
    Test->>Poller: Start mempool polling
    Test->>Load: Execute load (Burst/Sequential/OutOfOrder)
    Load->>Cluster: Submit transactions
    Load-->>Test: Return LoadResult
    Test->>Cluster: WaitForAllIncluded (drain + finality)
    Test->>Poller: Stop polling (get peak)
    Test->>Collector: CollectResults (query blocks, match submissions)
    Collector->>Cluster: QueryBlock (fetch & decode txs, timestamps)
    Collector-->>Test: Return BenchResult (TPS, latencies, BlockStats)
    Test->>Reporter: WriteResult / PrintComparisonTable
    Reporter-->>Test: Persist JSON & logs
```

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped to start the mempool race,
I chased each nonce and every trace,
I counted TPS and latencies small,
I peeked at peaks and logged them all,
A tiny rabbit saved results in place.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 78.57%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title clearly summarizes the main change: adding end-to-end benchmarks for mempool and MemIAVL functionality. |
| Description check | ✅ Passed | The description uses the standard template but lacks specific details about the changes; however, it is not unrelated to the changeset. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 12

🧹 Nitpick comments (5)
Makefile (1)

247-249: Parameterize the benchmark test filter and keep module mode read-only.

This target is useful as-is, but adding BENCH_RUN and -mod=readonly makes it safer and easier to run focused suites in CI/dev workflows.

♻️ Proposed improvement

```diff
+BENCH_RUN ?= TestBenchmark
+
 benchmark-e2e:
-	cd integration-tests/e2e && go test -v -tags benchmark -run TestBenchmark -timeout 30m -count=1 ./benchmark/
+	cd integration-tests/e2e && go test -mod=readonly -v -tags benchmark -run '$(BENCH_RUN)' -timeout 30m -count=1 ./benchmark/
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 247 - 249, Update the benchmark-e2e Makefile target to
accept a BENCH_RUN filter and run Go in module readonly mode: change the command
invoked by the benchmark-e2e target (currently invoking go test -v -tags
benchmark -run TestBenchmark -timeout 30m -count=1 ./benchmark/) to use the
BENCH_RUN variable (e.g., -run "$(BENCH_RUN:-TestBenchmark)") and add the
-mod=readonly flag so the go test invocation becomes go test -mod=readonly -v
-tags benchmark -run "$(BENCH_RUN)" -timeout 30m -count=1 ./benchmark/; ensure
the target name benchmark-e2e and variable BENCH_RUN are used exactly as
referenced.
integration-tests/e2e/benchmark/collector.go (1)

199-199: Make the finality wait context-aware.

This fixed sleep ignores cancellation and can keep tests waiting after timeout/cancel paths.

♻️ Proposed improvement

```diff
-	// Wait one more block to ensure finality
-	time.Sleep(3 * time.Second)
+	// Wait one more block to ensure finality
+	select {
+	case <-ctx.Done():
+		return 0, ctx.Err()
+	case <-time.After(3 * time.Second):
+	}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/collector.go` at line 199, Replace the fixed
sleep with a context-aware wait: instead of time.Sleep(3 * time.Second) use a
select that waits on time.After(3*time.Second) and ctx.Done() (where ctx is the
request/test context in scope) so the routine aborts promptly when cancelled;
ensure the surrounding function (e.g., the finality wait routine) returns or
propagates ctx.Err() on cancellation rather than blocking until the sleep
completes.
integration-tests/e2e/cluster/cluster.go (1)

294-303: Add a request-level timeout for block RPC calls.

With benchmark flows using long-lived contexts, a stalled RPC can block result collection indefinitely.

♻️ Proposed improvement

```diff
 	url := fmt.Sprintf("http://127.0.0.1:%d/block?height=%d", n.Ports.RPC, height)
-	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
+	reqCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
+	defer cancel()
+	req, err := http.NewRequestWithContext(reqCtx, http.MethodGet, url, nil)
 	if err != nil {
 		return BlockResult{}, err
 	}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 294 - 303, The block
RPC call currently uses the long-lived ctx and can hang; wrap the request in a
short request-level timeout by creating a derived context, e.g. ctxWithTimeout,
cancel := context.WithTimeout(ctx, rpcRequestTimeout) (choose a sensible
duration constant like rpcRequestTimeout), defer cancel(), and use
ctxWithTimeout in http.NewRequestWithContext instead of ctx; ensure you
reference url, req creation, and the http.DefaultClient.Do(req) call so the
request will fail fast on timeout rather than blocking result collection.
integration-tests/e2e/benchmark/report.go (1)

18-20: Avoid overwriting previous benchmark runs with identical labels.

Using label-only filenames drops historical runs and makes trend comparisons harder.

♻️ Proposed improvement

```diff
 import (
 	"encoding/json"
 	"fmt"
 	"os"
 	"path/filepath"
 	"strings"
 	"testing"
+	"time"
 )
@@
 	safe := strings.NewReplacer("/", "_", " ", "_", "+", "_").Replace(result.Config.Label)
-	path := filepath.Join(dir, safe+".json")
+	stamp := time.Now().UTC().Format("20060102T150405Z")
+	path := filepath.Join(dir, fmt.Sprintf("%s_%s.json", safe, stamp))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/report.go` around lines 18 - 20, The current
filename uses only result.Config.Label (via safe :=
strings.NewReplacer(...).Replace(result.Config.Label) and path :=
filepath.Join(dir, safe+".json")), which overwrites previous runs with the same
label; change the filename to include a unique suffix (e.g., a timestamp from
time.Now().UTC().Format with seconds or milliseconds, or a run ID/UUID) so each
run produces a distinct file (e.g., combine safe + "_" + timestamp + ".json")
and update the code that reads/writes using the new path variable accordingly.
integration-tests/e2e/benchmark/load.go (1)

185-197: Duplicate slice initialisation in sequencePattern.

The []uint64{base+2, base, base+1} literal is initialised twice — once for the count <= 3 branch and once before the count > 3 loop. Unifying them removes the duplication.

♻️ Proposed refactor

```diff
 func sequencePattern(base uint64, count int) []uint64 {
-	if count <= 3 {
-		seqs := []uint64{base + 2, base, base + 1}
-		return seqs[:count]
-	}
-
 	seqs := []uint64{base + 2, base, base + 1}
+	if count <= 3 {
+		return seqs[:count]
+	}
 	for i := 3; i < count; i++ {
 		seqs = append(seqs, base+uint64(i))
 	}
-
 	return seqs
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/load.go` around lines 185 - 197, The function
sequencePattern duplicates the literal []uint64{base + 2, base, base + 1} in
both branches; to fix, initialize seqs once (seqs := []uint64{base + 2, base,
base + 1}), then if count <= 3 return seqs[:count], otherwise run the existing
for loop (for i := 3; i < count; i++ { seqs = append(seqs, base+uint64(i)) })
and return seqs; this removes the duplicated initialization while preserving
behavior.
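Folding the unified initialization above together with the contiguous small-count fix raised in the inline comments, the helper could look like this runnable sketch (the name matches the PR's `sequencePattern`, but the body is illustrative, not the repo's exact code):

```go
package main

import "fmt"

// sequencePattern returns the account sequences to submit: a contiguous run
// for counts below 3 (so no sequence number is skipped), and the out-of-order
// head [base+2, base, base+1] followed by ascending sequences otherwise.
func sequencePattern(base uint64, count int) []uint64 {
	if count < 3 {
		seqs := make([]uint64, 0, count)
		for i := 0; i < count; i++ {
			seqs = append(seqs, base+uint64(i))
		}
		return seqs
	}
	seqs := []uint64{base + 2, base, base + 1}
	for i := 3; i < count; i++ {
		seqs = append(seqs, base+uint64(i))
	}
	return seqs
}

func main() {
	fmt.Println(sequencePattern(10, 2)) // contiguous: [10 11]
	fmt.Println(sequencePattern(10, 5)) // out-of-order head: [12 10 11 13 14]
}
```

A small count now yields `[base]` or `[base, base+1]` rather than a gapped `[base+2]`, so every submitted sequence can eventually be included.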
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/e2e/benchmark/benchmark_test.go`:
- Around line 253-256: The benchmark creates a MempoolPoller per node but each
MempoolPoller already computes the max across all nodes, causing duplicated
metrics and the saved result uses a hardcoded peak_mempool_size=0; fix by
instantiating a single MempoolPoller via NewMempoolPoller(ctx, cluster,
mempoolPollInterval) (reuse that one for the whole test instead of creating one
per node), and when persisting results read the actual max value from the poller
(e.g., poller.Max or the appropriate accessor on MempoolPoller) into
peak_mempool_size rather than 0; apply the same change for the analogous block
around the other occurrence (lines 266-271).

In `@integration-tests/e2e/benchmark/collector.go`:
- Around line 157-164: The throughput window currently uses the first and last
entries of blockStats which includes empty blocks; instead scan blockStats to
find the first and last entries that actually contain benchmark transactions
(e.g., where a transaction count field such as TxCount > 0 or a Txs slice length
> 0), assign their Time values to first/last, compute blockDuration =
last.Sub(first).Seconds(), and only set totalDuration when blockDuration > 0;
update the code around blockStats, first, last, blockDuration and totalDuration
accordingly so the TPS window excludes empty blocks.

In `@integration-tests/e2e/benchmark/load.go`:
- Line 48: Check and guard cfg.NodeCount before performing the modulo in both
BurstLoad and OutOfOrderLoad: if cfg.NodeCount is zero (or otherwise invalid)
return an error or set a safe default before entering the loop so the expression
i % cfg.NodeCount cannot panic; update the entry checks in BurstLoad and add the
same pre-loop guard to OutOfOrderLoad (the code computing viaNode using i %
cfg.NodeCount) to ensure safe execution.
- Around line 13-19: TxSubmission currently lacks an application-level response
code and non-zero res.Code checks are ignored; add a Code int (or uint32) field
to the TxSubmission struct and populate it wherever responses are processed
(e.g., in BurstLoad where res.Code is checked) so callers can distinguish
success vs app-level failures; update the response handling logic in BurstLoad
to assign res.Code into the TxSubmission entries and apply the same change to
OutOfOrderLoad and SingleNodeLoad so all load paths record the code
consistently.
- Line 135: SingleNodeLoad currently takes targetNode int and passes it straight
to SendBankTxWithSequence without bounds checking; add an entry validation in
SingleNodeLoad to ensure 0 <= targetNode < cfg.NodeCount and handle invalid
values by returning an early LoadResult/error (or logging and aborting the load)
instead of calling SendBankTxWithSequence; apply the same validation to the
other similar usage site referenced in the diff so no out-of-range targetNode
can reach SendBankTxWithSequence and corrupt the benchmark.
- Around line 44-73: The per-account goroutine loops in BurstLoad (and similarly
in OutOfOrderLoad and SingleNodeLoad) do not observe ctx cancellation and will
continue dispatching txs after the context is done; update each loop that
iterates up to cfg.TxPerAccount (the goroutine that calls
cluster.SendBankTxWithSequence and appends to result.Submissions) to check
ctx.Done() at the top of each iteration (e.g., use a non-blocking select on
ctx.Done() or check ctx.Err()) and break/return if cancelled so the goroutine
stops sending work and allows wg.Wait() to complete promptly.
- Around line 185-197: sequencePattern currently emits gaps when count < 3 (e.g.
returns [base+2] for count==1), which leaves some sequence numbers never
submitted and transactions stuck; fix by changing sequencePattern to return a
contiguous sequence for small counts (for count <= 3 or specifically count < 3)
— i.e., when count < 3 return []uint64{base, base+1, ...} truncated to count,
otherwise keep the existing out-of-order logic; alternatively (or in addition)
validate TxPerAccount in OutOfOrderLoad and enforce a minimum of 3 before
calling sequencePattern to avoid calling it with invalid counts. Ensure you
update the function sequencePattern and/or the caller
OutOfOrderLoad/TxPerAccount validation accordingly.
- Around line 200-212: The Warmup function currently ignores the result of
SendBankTxWithSequence and doesn't update the metas map, which hides
transport/app errors and leaves stale on-chain sequence numbers; modify Warmup
to capture and check the return/error from cluster.SendBankTxWithSequence, log
any failures (including context like account name and node), and on success
increment the corresponding metas[name].Sequence (or refresh metas[name] from
chain) so subsequent calls use the updated sequence; also add a brief comment in
Warmup documenting the sequence-side effect and that callers must use the
updated metas or re-fetch account metadata.

In `@integration-tests/e2e/benchmark/README.md`:
- Around line 87-92: Update the README to use one consistent default results
directory everywhere: replace occurrences of `results/` with
`benchmark/results/` (or vice versa based on project convention) so all
references (e.g., the "Results land in `benchmark/results/` as JSON." line and
the "After baseline results exist in `results/`:" line) point to the same path,
and run a quick grep to ensure no remaining mismatched occurrences remain in the
document.
- Around line 107-112: Add language identifiers (e.g., "text") to the fenced
code blocks in the README so Markdown lint (MD040) stops warning: update each
triple-backtick block that contains the table or code samples (the backtick
fences around the tabular benchmark output and the code snippets under the
"benchmark/" section) to start with ```text instead of ```; ensure all
occurrences noted in the comment (the table blocks and the code listing blocks)
are changed consistently.
- Around line 79-81: The E2E_INITIAD_BIN environment variable is currently
applied only to the cd command so go test may not see it; change how the env var
is set so go test receives it (either prefix the go test invocation with
E2E_INITIAD_BIN=./build/initiad or export E2E_INITIAD_BIN before running cd and
go test) to ensure the TestBenchmarkBaseline run uses the intended binary
(reference the E2E_INITIAD_BIN variable and the go test -run
TestBenchmarkBaseline invocation).

In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 332-336: The loop over decoded.Result.Block.Data.Txs currently
ignores base64 decode failures (using continue) which silently under-reports
txs; replace the silent continue with a fail-fast error return that includes
block context and tx index: when base64Decode(txBase64) returns decErr,
construct and return an error (or propagate decErr) that includes
decoded.Result.Block.Header.Height (or block identifier from decoded), the tx
index within the loop, and the original decErr so callers can surface the
failure and metrics remain correct; update any surrounding function signature to
return an error if needed (identify code at the loop over
decoded.Result.Block.Data.Txs and the base64Decode call).
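The fail-fast decode suggested above can be sketched as follows; `decodeBlockTxs` and its parameters are illustrative, not the repo's exact API:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeBlockTxs decodes the base64-encoded txs from a /block response.
// Instead of silently skipping undecodable entries (which under-reports txs),
// it returns an error carrying the block height and tx index.
func decodeBlockTxs(height int64, txsBase64 []string) ([][]byte, error) {
	txs := make([][]byte, 0, len(txsBase64))
	for i, s := range txsBase64 {
		raw, err := base64.StdEncoding.DecodeString(s)
		if err != nil {
			return nil, fmt.Errorf("block %d: decode tx %d: %w", height, i, err)
		}
		txs = append(txs, raw)
	}
	return txs, nil
}

func main() {
	ok, err := decodeBlockTxs(42, []string{"aGVsbG8="})
	fmt.Println(string(ok[0]), err) // hello <nil>
	if _, err := decodeBlockTxs(42, []string{"%%%"}); err != nil {
		fmt.Println("fail-fast:", err)
	}
}
```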

---

Nitpick comments:
In `@integration-tests/e2e/benchmark/collector.go`:
- Line 199: Replace the fixed sleep with a context-aware wait: instead of
time.Sleep(3 * time.Second) use a select that waits on time.After(3*time.Second)
and ctx.Done() (where ctx is the request/test context in scope) so the routine
aborts promptly when cancelled; ensure the surrounding function (e.g., the
finality wait routine) returns or propagates ctx.Err() on cancellation rather
than blocking until the sleep completes.

In `@integration-tests/e2e/benchmark/load.go`:
- Around line 185-197: The function sequencePattern duplicates the literal
[]uint64{base + 2, base, base + 1} in both branches; to fix, initialize seqs
once (seqs := []uint64{base + 2, base, base + 1}), then if count <= 3 return
seqs[:count], otherwise run the existing for loop (for i := 3; i < count; i++ {
seqs = append(seqs, base+uint64(i)) }) and return seqs; this removes the
duplicated initialization while preserving behavior.

In `@integration-tests/e2e/benchmark/report.go`:
- Around line 18-20: The current filename uses only result.Config.Label (via
safe := strings.NewReplacer(...).Replace(result.Config.Label) and path :=
filepath.Join(dir, safe+".json")), which overwrites previous runs with the same
label; change the filename to include a unique suffix (e.g., a timestamp from
time.Now().UTC().Format with seconds or milliseconds, or a run ID/UUID) so each
run produces a distinct file (e.g., combine safe + "_" + timestamp + ".json")
and update the code that reads/writes using the new path variable accordingly.

In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 294-303: The block RPC call currently uses the long-lived ctx and
can hang; wrap the request in a short request-level timeout by creating a
derived context, e.g. ctxWithTimeout, cancel := context.WithTimeout(ctx,
rpcRequestTimeout) (choose a sensible duration constant like rpcRequestTimeout),
defer cancel(), and use ctxWithTimeout in http.NewRequestWithContext instead of
ctx; ensure you reference url, req creation, and the http.DefaultClient.Do(req)
call so the request will fail fast on timeout rather than blocking result
collection.

In `@Makefile`:
- Around line 247-249: Update the benchmark-e2e Makefile target to accept a
BENCH_RUN filter and run Go in module readonly mode: change the command invoked
by the benchmark-e2e target (currently invoking go test -v -tags benchmark -run
TestBenchmark -timeout 30m -count=1 ./benchmark/) to use the BENCH_RUN variable
(e.g., -run "$(BENCH_RUN:-TestBenchmark)") and add the -mod=readonly flag so the
go test invocation becomes go test -mod=readonly -v -tags benchmark -run
"$(BENCH_RUN)" -timeout 30m -count=1 ./benchmark/; ensure the target name
benchmark-e2e and variable BENCH_RUN are used exactly as referenced.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 82be464 and de2baaf.

📒 Files selected for processing (9)
  • Makefile
  • integration-tests/e2e/benchmark/README.md
  • integration-tests/e2e/benchmark/benchmark_test.go
  • integration-tests/e2e/benchmark/collector.go
  • integration-tests/e2e/benchmark/config.go
  • integration-tests/e2e/benchmark/load.go
  • integration-tests/e2e/benchmark/report.go
  • integration-tests/e2e/cluster.go
  • integration-tests/e2e/cluster/cluster.go

@coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (3)
integration-tests/e2e/benchmark/README.md (2)

37-38: ⚠️ Potential issue | 🟡 Minor

Use one consistent default results directory path throughout the doc.

results/ and benchmark/results/ are both documented as defaults, which is ambiguous for users and CI scripts.

✏️ Suggested cleanup

```diff
-- The JSON results persist in `results/`. Subsequent `TestBenchmarkFullComparison` runs load them automatically.
+- The JSON results persist in `benchmark/results/`. Subsequent `TestBenchmarkFullComparison` runs load them automatically.
@@
-| `BENCHMARK_RESULTS_DIR` | `results/` | Output directory for JSON results |
+| `BENCHMARK_RESULTS_DIR` | `benchmark/results/` | Output directory for JSON results |
```

Also applies to: 87-92, 189-189

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/README.md` around lines 37 - 38, The README
documents two different default results directories ("results/" and
"benchmark/results/"), causing ambiguity for users and CI; pick one canonical
default path and update all mentions to it (including the sentence referencing
TestBenchmarkFullComparison and the other occurrences noted around lines 87-92
and 189) so the doc consistently uses the same directory throughout; ensure any
examples, scripts, and the TestBenchmarkFullComparison description reflect that
single chosen path.

107-112: ⚠️ Potential issue | 🟡 Minor

Add language identifiers to fenced blocks to satisfy MD040.

These blocks should declare `text` (or another explicit language) on their opening fences to clear markdownlint warnings.

✏️ Suggested cleanup

```diff
-```
+```text
 Config                    | Variant      |   TPS | vs base |  P50ms | vs base |  P95ms | vs base |  P99ms | vs base | Peak Mempool
 ...
@@
-```
+```text
 Config | Variant | TPS | P50ms | P95ms | P99ms | Max ms | Peak Mempool
 ...
@@
-```
+```text
 benchmark/
   config.go          Variant definitions, BenchConfig, preset constructors
 ...
```

Also applies to: 130-134, 155-163

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/README.md` around lines 107 - 112, The markdown fenced code blocks triggering MD040 should include an explicit language identifier (e.g., `text`) instead of a bare fence; update the shown table block and the other fenced blocks (the blocks around the snippets at lines 130-134 and 155-163) by adding the identifier to their opening fences so markdownlint MD040 no longer warns and the content renders correctly.

integration-tests/e2e/benchmark/collector.go (1)

131-137: ⚠️ Potential issue | 🟠 Major

TPS window is still derived from block non-emptiness, not benchmark tx inclusion.

At Line 133/160, `TxCount` tracks all txs in the block, so duration can still include blocks with no benchmark txs. The window should use first/last block times where matched benchmark txs were actually found.

 

🐛 Proposed fix
```diff
 	var (
 		blockStats []BlockStat
 		latencies  []float64
 		included   int
+		firstIncludedTime time.Time
+		lastIncludedTime  time.Time
 	)
@@
-		for _, txHash := range block.TxHashes {
+		includedInBlock := 0
+		for _, txHash := range block.TxHashes {
 			sub, ok := submitMap[txHash]
 			if !ok {
 				continue
 			}
 			included++
+			includedInBlock++
 			latencyMs := float64(block.BlockTime.Sub(sub.SubmitTime).Milliseconds())
 			if latencyMs < 0 {
 				latencyMs = 0
 			}
 			latencies = append(latencies, latencyMs)
 		}
+		if includedInBlock > 0 {
+			if firstIncludedTime.IsZero() {
+				firstIncludedTime = block.BlockTime
+			}
+			lastIncludedTime = block.BlockTime
+		}
 	}
@@
-	// use span from first to last block that actually included benchmark txs
-	var firstTxTime, lastTxTime time.Time
-	for _, bs := range blockStats {
-		if bs.TxCount == 0 {
-			continue
-		}
-		if firstTxTime.IsZero() {
-			firstTxTime = bs.Time
-		}
-		lastTxTime = bs.Time
-	}
-	if !firstTxTime.IsZero() {
-		blockDuration := lastTxTime.Sub(firstTxTime).Seconds()
+	if !firstIncludedTime.IsZero() {
+		blockDuration := lastIncludedTime.Sub(firstIncludedTime).Seconds()
 		if blockDuration > 0 {
 			totalDuration = blockDuration
 		}
 	}
```


Also applies to: 157-173

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/collector.go` around lines 131 - 137, BlockStat currently uses TxCount: len(block.TxHashes) and block times which causes the TPS window to include blocks that contain no benchmark transactions; update the BlockStat population logic (where blockStats is appended) to count only matched benchmark transactions (e.g., filter block.TxHashes by the benchmark tx set and set TxCount to that filtered length) and also record the timestamp of the first/last matched benchmark tx (or a flag/field indicating presence) so that the overall window computation (the later logic around computing start/end times and TPS between lines ~157-173) uses the timestamps of the first and last blocks that actually contained benchmark txs rather than any non-empty block times. Ensure references to BlockStat, TxCount, blockStats and the window computation code are updated accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @integration-tests/e2e/benchmark/collector.go:

  • Around line 48-64: NewMempoolPoller may create a time.Ticker with a
    non-positive interval which panics; in the constructor (NewMempoolPoller) ensure
    p.interval is a positive duration before starting the goroutine—if the provided
    interval is <= 0, replace it with a sensible default (e.g., time.Second or
    another minimum > 0) so that MempoolPoller.run can safely create a
    time.NewTicker(p.interval) without panicking.
  • Around line 206-208: The post-mempool wait currently uses a blocking
    time.Sleep(3 * time.Second) which ignores ctx cancellation; replace it with a
    context-aware timer: create a time.NewTimer(3*time.Second) and use select to
    wait on either timer.C or ctx.Done(), stop the timer when appropriate, and if
    ctx.Done() fires propagate/return the context error instead of proceeding to
    call cluster.LatestHeight(ctx, 0).

In @integration-tests/e2e/cluster/cluster.go:

  • Around line 886-890: The code only sets memiavl.enable to "true" when
    c.opts.MemIAVL is true, leaving the false case to upstream defaults; update the
    logic around the conditional that calls setTOMLValue(appPath, "memiavl",
    "enable", ...) so that it always writes an explicit value: write "true" when
    c.opts.MemIAVL is true and write "false" when it's false (i.e., call
    setTOMLValue with "false" in the else branch), ensuring the memiavl.enable key
    is deterministically set regardless of upstream config.
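The deterministic memiavl.enable write can be sketched as below; `setTOMLValue` is a map-backed stand-in for the cluster helper of the same name (its real signature may differ), and `applyMemIAVL` is a hypothetical wrapper:

```go
package main

import (
	"fmt"
	"strconv"
)

// setTOMLValue stands in for the cluster helper that writes a key into a
// TOML config file; here it records into a map for demonstration.
func setTOMLValue(kv map[string]string, section, key, value string) {
	kv[section+"."+key] = value
}

// applyMemIAVL always writes an explicit "true" or "false", so the key is
// deterministic regardless of upstream defaults.
func applyMemIAVL(kv map[string]string, enabled bool) {
	setTOMLValue(kv, "memiavl", "enable", strconv.FormatBool(enabled))
}

func main() {
	cfg := map[string]string{}
	applyMemIAVL(cfg, false)
	fmt.Println(cfg["memiavl.enable"]) // false, written explicitly
}
```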

Duplicate comments:
In @integration-tests/e2e/benchmark/collector.go:

  • Around line 131-137: BlockStat currently uses TxCount: len(block.TxHashes) and
    block times which causes the TPS window to include blocks that contain no
    benchmark transactions; update the BlockStat population logic (where blockStats
    is appended) to count only matched benchmark transactions (e.g., filter
    block.TxHashes by the benchmark tx set and set TxCount to that filtered length)
    and also record the timestamp of the first/last matched benchmark tx (or a
    flag/field indicating presence) so that the overall window computation (the
    later logic around computing start/end times and TPS between lines ~157-173)
    uses the timestamps of the first and last blocks that actually contained
    benchmark txs rather than any non-empty block times. Ensure references to
    BlockStat, TxCount, blockStats and the window computation code are updated
    accordingly.

In @integration-tests/e2e/benchmark/README.md:

  • Around line 37-38: The README documents two different default results
    directories ("results/" and "benchmark/results/"), causing ambiguity for users
    and CI; pick one canonical default path and update all mentions to it (including
    the sentence referencing TestBenchmarkFullComparison and the other occurrences
    noted around lines 87-92 and 189) so the doc consistently uses the same
    directory throughout; ensure any examples, scripts, and the
    TestBenchmarkFullComparison description reflect that single chosen path.
  • Around line 107-112: The markdown fenced code blocks triggering MD040 should
    include an explicit language identifier (e.g., text) instead of a bare ```
    fence; update the shown table block and the other fenced blocks (the blocks
    around the snippet at lines 130-134 and 155-163) by changing their opening
    fences from ``` to ```text (or another appropriate language) so markdownlint
    MD040 no longer warns and the content renders correctly.

</details>

---

<details>
<summary>ℹ️ Review info</summary>

**Configuration used**: Repository UI

**Review profile**: CHILL

**Plan**: Pro

**Cache: Disabled due to data retention organization setting**

**Knowledge base: Disabled due to data retention organization setting**

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between de2baaf1144434fe86ea58dc43a0d066cde3821c and e9fab6663dd0532eb038f3353c191049f39ae171.

</details>

<details>
<summary>📒 Files selected for processing (5)</summary>

* `integration-tests/e2e/benchmark/README.md`
* `integration-tests/e2e/benchmark/benchmark_test.go`
* `integration-tests/e2e/benchmark/collector.go`
* `integration-tests/e2e/benchmark/load.go`
* `integration-tests/e2e/cluster/cluster.go`

</details>

<details>
<summary>🚧 Files skipped from review as they are similar to previous changes (2)</summary>

* integration-tests/e2e/benchmark/benchmark_test.go
* integration-tests/e2e/benchmark/load.go

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
integration-tests/e2e/cluster/cluster.go (2)

886-892: Past review comment addressed correctly.

memiavl.enable is now always written explicitly, eliminating dependence on upstream defaults for both variants.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 886 - 892, The code
now always sets memiavl.enable explicitly; ensure the call to setTOMLValue
writes the string value derived from c.opts.MemIAVL by keeping memIAVLEnable and
the if block that sets it based on c.opts.MemIAVL, then pass memIAVLEnable to
setTOMLValue(appPath, "memiavl", "enable", memIAVLEnable) so memiavl.enable is
unambiguous regardless of upstream defaults (verify in the function where
memIAVLEnable, c.opts.MemIAVL, appPath and setTOMLValue are used).

332-336: Past review comment addressed correctly.

The fail-fast decode error with height and idx context directly addresses the previous concern about silent tx under-reporting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 332 - 336, The loop
already fails fast on decode errors and includes contextual info; no change
needed—keep the current base64Decode handling inside the for loop that returns
fmt.Errorf("failed to decode tx at height=%d index=%d: %w", height, idx, decErr)
so failures surface with height/idx context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 886-892: The code now always sets memiavl.enable explicitly;
ensure the call to setTOMLValue writes the string value derived from
c.opts.MemIAVL by keeping memIAVLEnable and the if block that sets it based on
c.opts.MemIAVL, then pass memIAVLEnable to setTOMLValue(appPath, "memiavl",
"enable", memIAVLEnable) so memiavl.enable is unambiguous regardless of upstream
defaults (verify in the function where memIAVLEnable, c.opts.MemIAVL, appPath
and setTOMLValue are used).
- Around line 332-336: The loop already fails fast on decode errors and includes
contextual info; no change needed—keep the current base64Decode handling inside
the for loop that returns fmt.Errorf("failed to decode tx at height=%d index=%d:
%w", height, idx, decErr) so failures surface with height/idx context.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between e9fab66 and 753bfca.

📒 Files selected for processing (2)
  • integration-tests/e2e/benchmark/collector.go
  • integration-tests/e2e/cluster/cluster.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • integration-tests/e2e/benchmark/collector.go


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
integration-tests/e2e/cluster/cluster.go (1)

886-890: ⚠️ Potential issue | 🟠 Major

Explicitly set memiavl.enable for both true and false paths.

At Line 886, only the true path is written. When c.opts.MemIAVL is false, config falls back to upstream defaults, which can skew baseline vs. variant comparisons.

🔧 Proposed fix
-		if c.opts.MemIAVL {
-			if err := setTOMLValue(appPath, "memiavl", "enable", "true"); err != nil {
-				return err
-			}
-		}
+		memIAVLEnable := "false"
+		if c.opts.MemIAVL {
+			memIAVLEnable = "true"
+		}
+		if err := setTOMLValue(appPath, "memiavl", "enable", memIAVLEnable); err != nil {
+			return err
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 886 - 890, The code
only sets memiavl.enable when c.opts.MemIAVL is true, causing defaults to leak
when false; update the conditional around c.opts.MemIAVL to explicitly call
setTOMLValue(appPath, "memiavl", "enable", "true") when true and call
setTOMLValue(appPath, "memiavl", "enable", "false") when false (preserve error
handling/returns), referencing c.opts.MemIAVL, setTOMLValue and appPath so the
configuration is deterministic for both branches.
integration-tests/e2e/benchmark/collector.go (1)

160-176: ⚠️ Potential issue | 🟠 Major

TPS window is still keyed off total block txs, not benchmark-matched txs.

Line 163 checks bs.TxCount, which includes non-benchmark traffic. That can shift first/last timestamps and distort TPS for this benchmark run.

🔧 Proposed fix
-	// use span from first to last block that actually included benchmark txs
-	var firstTxTime, lastTxTime time.Time
-	for _, bs := range blockStats {
-		if bs.TxCount == 0 {
-			continue
-		}
-		if firstTxTime.IsZero() {
-			firstTxTime = bs.Time
-		}
-		lastTxTime = bs.Time
-	}
+	// use span from first to last block that included matched benchmark txs
+	var firstTxTime, lastTxTime time.Time
+	for h := startHeight; h <= endHeight; h++ {
+		block, err := cluster.QueryBlock(ctx, 0, h)
+		if err != nil {
+			return BenchResult{}, fmt.Errorf("query block %d: %w", h, err)
+		}
+
+		matched := 0
+		for _, txHash := range block.TxHashes {
+			if _, ok := submitMap[txHash]; ok {
+				matched++
+			}
+		}
+		if matched == 0 {
+			continue
+		}
+		if firstTxTime.IsZero() {
+			firstTxTime = block.BlockTime
+		}
+		lastTxTime = block.BlockTime
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/collector.go` around lines 160 - 176, The TPS
window selection currently uses bs.TxCount (which includes non-benchmark
traffic) to find firstTxTime/lastTxTime; change the check to use the count of
benchmark-matched transactions for each block instead (e.g., compute
matchedCount from a field like bs.MatchedTxCount or by filtering bs.TxResults
for the benchmark marker/flag), skip blocks when matchedCount == 0, and use
those matched blocks to set firstTxTime/lastTxTime and compute
blockDuration/totalDuration (update the condition that referenced bs.TxCount and
any related logic in the loop that sets firstTxTime, lastTxTime, blockDuration
and totalDuration).
🧹 Nitpick comments (1)
integration-tests/e2e/benchmark/report.go (1)

118-137: Consider sorting loaded results for deterministic reporting output.

os.ReadDir ordering is filesystem-dependent; sorting results (or filenames) before returning will keep table ordering stable across environments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/report.go` around lines 118 - 137, The code
relies on os.ReadDir(dir) ordering which is filesystem-dependent; make the
output deterministic by sorting the entries slice by entry.Name() right after
calling os.ReadDir (before the for _, entry := range entries loop) using
sort.Slice or sort.SliceStable, so the subsequent reads/unmarshals produce a
stable results order (refer to the entries variable and the loop that reads
files into results).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/e2e/benchmark/report.go`:
- Around line 88-92: The check uses only r.Config.Variant so it hides deltas for
any row sharing that Variant; instead detect the actual baseline row itself.
Replace the condition at the block that sets tpsDelta/p50Delta/p95Delta so it
compares the selected row to the baseline (e.g., r == baseline) or compares the
full config identity (e.g., reflect.DeepEqual(r.Config, baseline.Config) or
compare a unique ID field) rather than r.Config.Variant; keep the rest of the
delta assignment logic unchanged and only set "-" when the row equals the
baseline.

---

Duplicate comments:
In `@integration-tests/e2e/benchmark/collector.go`:
- Around line 160-176: The TPS window selection currently uses bs.TxCount (which
includes non-benchmark traffic) to find firstTxTime/lastTxTime; change the check
to use the count of benchmark-matched transactions for each block instead (e.g.,
compute matchedCount from a field like bs.MatchedTxCount or by filtering
bs.TxResults for the benchmark marker/flag), skip blocks when matchedCount == 0,
and use those matched blocks to set firstTxTime/lastTxTime and compute
blockDuration/totalDuration (update the condition that referenced bs.TxCount and
any related logic in the loop that sets firstTxTime, lastTxTime, blockDuration
and totalDuration).

In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 886-890: The code only sets memiavl.enable when c.opts.MemIAVL is
true, causing defaults to leak when false; update the conditional around
c.opts.MemIAVL to explicitly call setTOMLValue(appPath, "memiavl", "enable",
"true") when true and call setTOMLValue(appPath, "memiavl", "enable", "false")
when false (preserve error handling/returns), referencing c.opts.MemIAVL,
setTOMLValue and appPath so the configuration is deterministic for both
branches.

---

Nitpick comments:
In `@integration-tests/e2e/benchmark/report.go`:
- Around line 118-137: The code relies on os.ReadDir(dir) ordering which is
filesystem-dependent; make the output deterministic by sorting the entries slice
by entry.Name() right after calling os.ReadDir (before the for _, entry := range
entries loop) using sort.Slice or sort.SliceStable, so the subsequent
reads/unmarshals produce a stable results order (refer to the entries variable
and the loop that reads files into results).

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 753bfca and 6bbb4a3.

📒 Files selected for processing (4)
  • integration-tests/e2e/benchmark/collector.go
  • integration-tests/e2e/benchmark/load.go
  • integration-tests/e2e/benchmark/report.go
  • integration-tests/e2e/cluster/cluster.go

Comment on lines +88 to +92
if r.Config.Variant == baseline.Config.Variant {
	tpsDelta = "-"
	p50Delta = "-"
	p95Delta = "-"
}

⚠️ Potential issue | 🟡 Minor

Baseline row detection should compare the selected row, not just Variant.

At Line 88, using r.Config.Variant == baseline.Config.Variant suppresses deltas for every baseline-variant row. Only the actual baseline row should show -.

🔧 Proposed fix
-	for _, r := range results {
+	for i := range results {
+		r := results[i]
 		tpsDelta := deltaStr(baseline.TxPerSecond, r.TxPerSecond)
 		p50Delta := deltaStr(baseline.P50LatencyMs, r.P50LatencyMs)
 		p95Delta := deltaStr(baseline.P95LatencyMs, r.P95LatencyMs)
 
-		if r.Config.Variant == baseline.Config.Variant {
+		if &results[i] == baseline {
 			tpsDelta = "-"
 			p50Delta = "-"
 			p95Delta = "-"
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/report.go` around lines 88 - 92, The check
uses only r.Config.Variant so it hides deltas for any row sharing that Variant;
instead detect the actual baseline row itself. Replace the condition at the
block that sets tpsDelta/p50Delta/p95Delta so it compares the selected row to
the baseline (e.g., r == baseline) or compares the full config identity (e.g.,
reflect.DeepEqual(r.Config, baseline.Config) or compare a unique ID field)
rather than r.Config.Variant; keep the rest of the delta assignment logic
unchanged and only set "-" when the row equals the baseline.

@codecov

codecov bot commented Feb 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 37.07%. Comparing base (82be464) to head (7f1f614).

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #481   +/-   ##
=======================================
  Coverage   37.07%   37.07%           
=======================================
  Files         311      311           
  Lines       29362    29362           
=======================================
  Hits        10885    10885           
  Misses      16742    16742           
  Partials     1735     1735           


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
integration-tests/e2e/cluster/cluster.go (1)

938-942: ⚠️ Potential issue | 🟠 Major

memiavl.enable is still only set when MemIAVL is true — baseline runs rely on upstream defaults.

The code only writes memiavl.enable=true when the flag is set, but never explicitly writes false. For benchmark correctness, both variants must be deterministically configured. If a node's default app.toml ships with memiavl.enable=true, baseline and mempool-only runs would silently use MemIAVL, invalidating comparisons.

Proposed fix
-		if c.opts.MemIAVL {
-			if err := setTOMLValue(appPath, "memiavl", "enable", "true"); err != nil {
-				return err
-			}
+		memIAVLEnable := "false"
+		if c.opts.MemIAVL {
+			memIAVLEnable = "true"
+		}
+		if err := setTOMLValue(appPath, "memiavl", "enable", memIAVLEnable); err != nil {
+			return err
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 938 - 942, The code
only sets memiavl.enable when c.opts.MemIAVL is true, so baseline runs may
inherit an upstream default; change the block so setTOMLValue(appPath,
"memiavl", "enable", ...) is always called with "true" if c.opts.MemIAVL else
"false" (i.e., unconditionally write the explicit setting), and propagate/return
any error from setTOMLValue; update the branch around c.opts.MemIAVL to call
setTOMLValue regardless of its value.
integration-tests/e2e/benchmark/report.go (1)

83-92: ⚠️ Potential issue | 🟡 Minor

Baseline row detection compares Variant, not identity — suppresses deltas for all rows sharing the baseline variant.

If multiple rows have VariantBaseline (e.g., loaded baseline + a re-run), all of them will show "-" instead of actual deltas. Compare the specific row, not just the variant.

Proposed fix
-	for _, r := range results {
+	for i := range results {
+		r := results[i]
 		tpsDelta := deltaStr(baseline.TxPerSecond, r.TxPerSecond)
 		p50Delta := deltaStr(baseline.P50LatencyMs, r.P50LatencyMs)
 		p95Delta := deltaStr(baseline.P95LatencyMs, r.P95LatencyMs)
 
-		if r.Config.Variant == baseline.Config.Variant {
+		if &results[i] == baseline {
 			tpsDelta = "-"
 			p50Delta = "-"
 			p95Delta = "-"
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/report.go` around lines 83 - 92, The code
currently suppresses deltas by comparing r.Config.Variant to
baseline.Config.Variant, which hides deltas for any row sharing the same
variant; change the baseline detection to compare the specific row identity
instead (e.g., compare the result object or its unique run ID) so only the exact
baseline row sets tpsDelta/p50Delta/p95Delta = "-". Update the conditional in
the loop that references results, baseline, and r.Config.Variant to use r ==
baseline or r.RunID/ResultID equality (whichever unique identifier exists) to
locate the baseline row.
🧹 Nitpick comments (5)
integration-tests/e2e/benchmark/load.go (2)

346-357: sequencePattern still produces invalid sequences for count < 3, but the caller guards against it.

The guard in OutOfOrderLoad (line 160-162) ensures count >= 3, so this is safe in practice. Adding a defensive comment or panic inside sequencePattern itself for count < 3 would prevent future misuse if someone calls it directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/load.go` around lines 346 - 357, The
sequencePattern function can return invalid slices when called with count < 3;
add a defensive check at the start of sequencePattern to validate count (e.g.,
panic or return an explicit error/empty slice) and document the requirement in a
comment; reference the function name sequencePattern and note the existing
caller OutOfOrderLoad enforces count >= 3 but sequencePattern should enforce or
clearly document this precondition to avoid future misuse.

30-88: Significant structural duplication across load functions.

BurstLoad, SequentialLoad, and SingleNodeLoad share ~80% of their body (NodeCount guard, mutex/wg setup, goroutine-per-account with ctx check, submission recording). The only meaningful differences are how viaNode is computed and whether submissions are sequential or concurrent within an account. Consider extracting a common runLoad helper parameterized by a viaNodeFn and concurrency mode to reduce maintenance burden.

Not blocking for this PR — the duplication is manageable at the current scale.

Also applies to: 94-152, 220-277

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/load.go` around lines 30 - 88, BurstLoad,
SequentialLoad, and SingleNodeLoad duplicate most logic (NodeCount guard,
mutex/wg setup, goroutine-per-account, ctx check, submission recording);
refactor by extracting a shared helper runLoad that takes a viaNodeFn (func(i
int) int) and a concurrency mode (sequential vs concurrent within an account),
then have BurstLoad/SequentialLoad/SingleNodeLoad call runLoad with the
appropriate viaNodeFn and mode; ensure runLoad performs the NodeCount check,
sets up mu/wg, iterates cluster.AccountNames(), computes seq using
meta.Sequence+uint64(i), calls cluster.SendBankTxWithSequence, and appends
TxSubmission/Errors to result so callers keep the same behavior and signatures.
integration-tests/e2e/cluster/cluster.go (2)

1223-1230: Helpers are fine, but base64Decode is a trivial one-liner wrapper.

base64Decode just calls base64.StdEncoding.DecodeString. Inlining the call at the one usage site in QueryBlock (line 333) would be simpler. Minor nit — not blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 1223 - 1230, The
helper base64Decode is a trivial one-liner wrapper around
base64.StdEncoding.DecodeString; remove the base64Decode function and replace
its single usage in QueryBlock with a direct call to
base64.StdEncoding.DecodeString, updating any error handling there accordingly
(ensure QueryBlock imports encoding/base64 is available and adjust variable
names if needed).

628-678: SendMoveExecuteJSONWithGas is largely duplicated from MoveExecuteJSONWithSequence.

The only difference is that gas estimation is skipped and gasLimit is passed directly. Consider extracting the shared CLI-invocation logic into a private helper to reduce the ~50 lines of duplication. Not blocking, but worth addressing to keep maintenance manageable as the CLI surface grows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/cluster/cluster.go` around lines 628 - 678,
SendMoveExecuteJSONWithGas duplicates most of MoveExecuteJSONWithSequence;
factor the shared CLI invocation, JSON marshalling, c.exec call,
parseTxResultFromOutput and logging into a private helper (e.g.,
execMoveExecuteJSON or sendMoveExecuteJSONCommon) that accepts parameters for
moduleAddress,moduleName,functionName,typeArgs,args,fromName,accountNumber,sequence,gasLimit,viaNode
and any flags like "--offline" vs estimation behavior; have both
SendMoveExecuteJSONWithGas and MoveExecuteJSONWithSequence call that helper
(preserving their public signatures) and only keep the differing gas-handling
logic in each caller. Ensure the new helper uses c.exec, parseTxResultFromOutput
and the same log format as in the original functions.
integration-tests/e2e/benchmark/README.md (1)

103-111: Fenced code block missing language identifier.

Static analysis (MD040) flags the code block at line 103. Adding text as the language identifier would satisfy the linter.

Proposed fix
-```
+```text
 benchmark/
   config.go          Variant definitions, BenchConfig, preset constructors
   ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/README.md` around lines 103 - 111, The fenced
code block starting with the benchmark/ listing is missing a language identifier
and triggers MD040; update the fence that opens the block (the triple backticks
before "benchmark/") to include the language identifier "text" (i.e., change ```
to ```text) so the linter accepts it while preserving the block contents (the
section that lists config.go, load.go, collector.go, report.go,
benchmark_test.go, results/).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration-tests/e2e/benchmark/benchmark_test.go`:
- Around line 31-47: setupCluster currently starts the cluster with
cluster.Start() but does not register a cleanup, so if require.NoError(t,
cluster.WaitForReady(...)) fails the test will Goexit and leave node processes
running; fix by calling t.Cleanup(func(){ cluster.Close(context.Background()) })
(or t.Cleanup(func(){ _ = cluster.Close(ctx) }) if context is available)
immediately after successful e2e.NewCluster and before calling
cluster.Start/WaitForReady, ensuring Close is always invoked on test teardown,
and then remove manual defer cluster.Close() calls from runBenchmark,
TestBenchmarkMemIAVLMoveExec, and TestBenchmarkGossipPropagation.

In `@integration-tests/e2e/benchmark/report.go`:
- Around line 12-28: WriteResult can produce a hidden ".json" file when
result.Config.Label is empty; after computing safe :=
strings.NewReplacer(...).Replace(result.Config.Label) add a guard that if safe
== "" you assign a deterministic fallback (e.g. "result-"+timestamp or
"result-"+UnixNano) to avoid collisions, then continue to build path using that
safe value; reference symbols: function WriteResult, variable safe, and
result.Config.Label to locate and update the logic.

---

Duplicate comments:
In `@integration-tests/e2e/benchmark/report.go`:
- Around line 83-92: The code currently suppresses deltas by comparing
r.Config.Variant to baseline.Config.Variant, which hides deltas for any row
sharing the same variant; change the baseline detection to compare the specific
row identity instead (e.g., compare the result object or its unique run ID) so
only the exact baseline row sets tpsDelta/p50Delta/p95Delta = "-". Update the
conditional in the loop that references results, baseline, and r.Config.Variant
to use r == baseline or r.RunID/ResultID equality (whichever unique identifier
exists) to locate the baseline row.

In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 938-942: The code only sets memiavl.enable when c.opts.MemIAVL is
true, so baseline runs may inherit an upstream default; change the block so
setTOMLValue(appPath, "memiavl", "enable", ...) is always called with "true" if
c.opts.MemIAVL else "false" (i.e., unconditionally write the explicit setting),
and propagate/return any error from setTOMLValue; update the branch around
c.opts.MemIAVL to call setTOMLValue regardless of its value.

---

Nitpick comments:
In `@integration-tests/e2e/benchmark/load.go`:
- Around line 346-357: The sequencePattern function can return invalid slices
when called with count < 3; add a defensive check at the start of
sequencePattern to validate count (e.g., panic or return an explicit error/empty
slice) and document the requirement in a comment; reference the function name
sequencePattern and note the existing caller OutOfOrderLoad enforces count >= 3
but sequencePattern should enforce or clearly document this precondition to
avoid future misuse.
- Around line 30-88: BurstLoad, SequentialLoad, and SingleNodeLoad duplicate
most logic (NodeCount guard, mutex/wg setup, goroutine-per-account, ctx check,
submission recording); refactor by extracting a shared helper runLoad that takes
a viaNodeFn (func(i int) int) and a concurrency mode (sequential vs concurrent
within an account), then have BurstLoad/SequentialLoad/SingleNodeLoad call
runLoad with the appropriate viaNodeFn and mode; ensure runLoad performs the
NodeCount check, sets up mu/wg, iterates cluster.AccountNames(), computes seq
using meta.Sequence+uint64(i), calls cluster.SendBankTxWithSequence, and appends
TxSubmission/Errors to result so callers keep the same behavior and signatures.

In `@integration-tests/e2e/benchmark/README.md`:
- Around line 103-111: The fenced code block starting with the benchmark/
listing is missing a language identifier and triggers MD040; update the fence
that opens the block (the triple backticks before "benchmark/") to include the
language identifier "text" (i.e., change ``` to ```text) so the linter accepts
it while preserving the block contents (the section that lists config.go,
load.go, collector.go, report.go, benchmark_test.go, results/).

In `@integration-tests/e2e/cluster/cluster.go`:
- Around line 1223-1230: The helper base64Decode is a trivial one-liner wrapper
around base64.StdEncoding.DecodeString; remove the base64Decode function and
replace its single usage in QueryBlock with a direct call to
base64.StdEncoding.DecodeString, updating any error handling there accordingly
(ensure QueryBlock imports encoding/base64 is available and adjust variable
names if needed).
- Around line 628-678: SendMoveExecuteJSONWithGas duplicates most of
MoveExecuteJSONWithSequence; factor the shared CLI invocation, JSON marshalling,
c.exec call, parseTxResultFromOutput and logging into a private helper (e.g.,
execMoveExecuteJSON or sendMoveExecuteJSONCommon) that accepts parameters for
moduleAddress,moduleName,functionName,typeArgs,args,fromName,accountNumber,sequence,gasLimit,viaNode
and any flags like "--offline" vs estimation behavior; have both
SendMoveExecuteJSONWithGas and MoveExecuteJSONWithSequence call that helper
(preserving their public signatures) and only keep the differing gas-handling
logic in each caller. Ensure the new helper uses c.exec, parseTxResultFromOutput
and the same log format as in the original functions.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 6bbb4a3 and 7f1f614.

📒 Files selected for processing (5)
  • integration-tests/e2e/benchmark/README.md
  • integration-tests/e2e/benchmark/benchmark_test.go
  • integration-tests/e2e/benchmark/load.go
  • integration-tests/e2e/benchmark/report.go
  • integration-tests/e2e/cluster/cluster.go

Comment on lines +31 to +47
func setupCluster(t *testing.T, ctx context.Context, cfg BenchConfig) *e2e.Cluster {
	t.Helper()

	cluster, err := e2e.NewCluster(ctx, t, e2e.ClusterOptions{
		NodeCount:    cfg.NodeCount,
		AccountCount: cfg.AccountCount,
		ChainID:      "bench-e2e",
		BinaryPath:   os.Getenv("E2E_INITIAD_BIN"),
		MemIAVL:      cfg.MemIAVL,
	})
	require.NoError(t, err)

	require.NoError(t, cluster.Start(ctx))
	require.NoError(t, cluster.WaitForReady(ctx, clusterReadyTimeout))

	return cluster
}


⚠️ Potential issue | 🟠 Major

Node processes leak if setupCluster fails after Start.

If WaitForReady fails (line 44), require.NoError triggers runtime.Goexit(). At that point cluster.Start() has already spawned node processes, but no Close() has been deferred — the caller's defer cluster.Close() hasn't been registered yet. This leaves orphan processes on the test host.

Proposed fix — register cleanup inside setupCluster
 func setupCluster(t *testing.T, ctx context.Context, cfg BenchConfig) *e2e.Cluster {
 	t.Helper()

 	cluster, err := e2e.NewCluster(ctx, t, e2e.ClusterOptions{
 		NodeCount:    cfg.NodeCount,
 		AccountCount: cfg.AccountCount,
 		ChainID:      "bench-e2e",
 		BinaryPath:   os.Getenv("E2E_INITIAD_BIN"),
 		MemIAVL:      cfg.MemIAVL,
 	})
 	require.NoError(t, err)
+	t.Cleanup(cluster.Close)

 	require.NoError(t, cluster.Start(ctx))
 	require.NoError(t, cluster.WaitForReady(ctx, clusterReadyTimeout))

 	return cluster
 }

Then remove the manual defer cluster.Close() calls in runBenchmark, TestBenchmarkMemIAVLMoveExec, and TestBenchmarkGossipPropagation since t.Cleanup will handle it.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
	func setupCluster(t *testing.T, ctx context.Context, cfg BenchConfig) *e2e.Cluster {
		t.Helper()

		cluster, err := e2e.NewCluster(ctx, t, e2e.ClusterOptions{
			NodeCount:    cfg.NodeCount,
			AccountCount: cfg.AccountCount,
			ChainID:      "bench-e2e",
			BinaryPath:   os.Getenv("E2E_INITIAD_BIN"),
			MemIAVL:      cfg.MemIAVL,
		})
		require.NoError(t, err)
		t.Cleanup(cluster.Close)

		require.NoError(t, cluster.Start(ctx))
		require.NoError(t, cluster.WaitForReady(ctx, clusterReadyTimeout))

		return cluster
	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/benchmark_test.go` around lines 31 - 47,
setupCluster currently starts the cluster with cluster.Start() but does not
register a cleanup, so if require.NoError(t, cluster.WaitForReady(...)) fails
the test will Goexit and leave node processes running; fix by calling
t.Cleanup(func(){ cluster.Close(context.Background()) }) (or t.Cleanup(func(){ _
= cluster.Close(ctx) }) if context is available) immediately after successful
e2e.NewCluster and before calling cluster.Start/WaitForReady, ensuring Close is
always invoked on test teardown, and then remove manual defer cluster.Close()
calls from runBenchmark, TestBenchmarkMemIAVLMoveExec, and
TestBenchmarkGossipPropagation.

Comment on lines +12 to +28
// WriteResult writes a BenchResult as JSON to the results directory.
func WriteResult(t *testing.T, result BenchResult, dir string) error {
if err := os.MkdirAll(dir, 0o755); err != nil {
return err
}

safe := strings.NewReplacer("/", "_", " ", "_", "+", "_").Replace(result.Config.Label)
path := filepath.Join(dir, safe+".json")

data, err := json.MarshalIndent(result, "", " ")
if err != nil {
return err
}

t.Logf("Writing benchmark result to %s", path)
return os.WriteFile(path, data, 0o600)
}
⚠️ Potential issue | 🟡 Minor

WriteResult could produce a file named .json if Label is empty.

If result.Config.Label is empty, safe will be empty and the output file will be .json — a hidden file on Unix that's easy to overlook and could collide across runs.

Proposed guard
 	safe := strings.NewReplacer("/", "_", " ", "_", "+", "_").Replace(result.Config.Label)
+	if safe == "" {
+		safe = "unnamed"
+	}
 	path := filepath.Join(dir, safe+".json")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration-tests/e2e/benchmark/report.go` around lines 12 - 28, WriteResult
can produce a hidden ".json" file when result.Config.Label is empty; after
computing safe := strings.NewReplacer(...).Replace(result.Config.Label) add a
guard that if safe == "" you assign a deterministic fallback (e.g.
"result-"+timestamp or "result-"+UnixNano) to avoid collisions, then continue to
build path using that safe value; reference symbols: function WriteResult,
variable safe, and result.Config.Label to locate and update the logic.
