Bench/leios #6494

Draft
fmaste wants to merge 5 commits into master from bench/leios

@fmaste fmaste commented Mar 19, 2026

Description

A ground-up transaction generator for Cardano benchmarking, built to support Leios without risking regressions in the historical baselines produced by tx-generator over several years of release benchmarks.

Rather than retrofitting new capabilities into the existing tx-generator where any behavioural change could silently invalidate years of baseline data, tx-centrifuge implements them from scratch behind a clean, pull-based architecture. The two generators coexist: tx-generator continues to produce comparable release benchmarks while tx-centrifuge targets the higher TPS rates and workload isolation that Leios requires.

Why a new tool

The tx-generator (TpsThrottle.hs, SubmissionClient.hs, Submission.hs) was designed for earlier Cardano iterations with lower TPS targets. Eleven specific limitations motivated the new design:

  1. Cumulative scheduling instead of per-tick sleep. The tx-generator's sendNTicks sleeps 1/TPS seconds between ticks. If the feeder falls behind (GC pause, jitter), those ticks are lost. At high TPS the per-tick sleep rounds to 0 or 1 microsecond, making rates above ~20k TPS unreliable. In the tx-centrifuge, the target time for token N is startTime + N * nanosPerToken. If the system falls behind, subsequent tokens are dispatched immediately until the schedule has caught up.

  2. One rate-limit slot per fetch, not req blocking takes. The tx-generator's consumeTxsBlocking loops req times, doing a blocking takeTMVar on each iteration, which serializes all workers through one variable. Here, blockingFetch claims exactly one slot per call. The client calls it once for the mandatory first tx, then drains the rest via nonBlockingFetch.

  3. Non-blocking fills up to the batch size. The tx-generator's consumeTxsNonBlocking ignores the req parameter and returns 0 or 1 ticks regardless. Here, the client calls nonBlockingFetch in a loop up to maxBatchSize times, filling as many as the rate limit allows.

  4. Closed-loop input recycling. The tx-generator consumes transactions from a pre-built stream. Once submitted, the funds are gone; runs must pre-generate all transactions. Here, consumed inputs are recycled back to the workload's input queue after each delivery, enabling indefinite-duration runs.

  5. Independent pipelines per workload. The tx-generator shares one MVar stream across all workers. A slow node blocks the stream for all others. Here, each workload has its own input queue, payload builder, and payload queue. Workloads are fully independent.

  6. Per-target fairness. The tx-generator has all workers competing for the same TMVar. Distribution depends on which thread wins the race. Here, per-target mode gives each target its own rate limiter at a configured TPS, fair by construction. Shared mode is also available when aggregate accuracy matters more than per-target balance.

  7. Delay outside the critical section. The tx-generator calls threadDelay inside the feeder loop, the timing-critical path. Here, the delay is computed inside STM (pure integer arithmetic) and applied in the worker thread outside the critical section.

  8. Monotonic clock. The tx-generator uses getCurrentTime (wall clock, subject to NTP adjustments). Here, all timing uses MonotonicRaw, immune to NTP slew and system clock steps.

  9. Integer arithmetic for timing. The tx-generator computes delays as realToFrac (1.0 / rate) with Double conversions. At high TPS, floating-point error accumulates. Here, nanosPerToken = round (1e9 / tps) is computed once as an Integer. All subsequent scheduling uses integer multiplication. Rounding error is at most 0.5 ns per token.

  10. Testable in isolation. The tx-generator's throttle depends on TxSource era, MVar StreamState, Trace, and the full cardano-api type hierarchy. Here, the pull-fiction sub-library has zero Cardano dependencies. The test suite validates TPS accuracy and per-target fairness at up to 100k TPS across 50 simulated targets using integer tokens and IORef counters.

  11. Non-blocking STM. The tx-generator uses a single TMVar (Maybe Int) and calls retry when the buffer is full, parking threads inside STM. Here, a TBQueue read with tryReadTBQueue (which never retries) ensures no thread parks inside STM. Critical sections are short; delays are applied outside the transaction.
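
The cumulative schedule from items 1 and 9 can be sketched in a few pure functions. This is a minimal illustration, not the code in this PR: only nanosPerToken and the formula startTime + N * nanosPerToken come from the description above; targetTime, delayFor, and the argument order are hypothetical.

```haskell
-- Sketch only: targetTime and delayFor are illustrative names, not the PR's API.

-- | Nanoseconds between consecutive tokens, rounded once to an Integer.
nanosPerToken :: Double -> Integer
nanosPerToken tps = round (1e9 / tps)

-- | Absolute target time (in ns) of token n on the cumulative schedule.
targetTime :: Integer -> Integer -> Integer -> Integer
targetTime startNs perToken n = startNs + n * perToken

-- | How long to wait before dispatching token n, given the current time.
-- A token whose target time has already passed gets a zero delay, which is
-- how a late feeder catches up instead of losing ticks.
delayFor :: Integer -> Integer -> Integer -> Integer -> Integer
delayFor startNs perToken n nowNs =
  max 0 (targetTime startNs perToken n - nowNs)
```

Because each target time is derived from the token index rather than from the previous sleep, a late wake-up shortens the next delay instead of shifting the whole schedule.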

Architecture

The project is split into two independent sub-libraries and one executable:

tx-centrifuge/
├── lib/
│   ├── pull-fiction/          # Domain-independent rate-limiting engine
│   │                          # Zero Cardano dependencies
│   │
│   └── tx-centrifuge/         # Cardano-specific layer
│                              # N2N protocols, tx building, fund loading
├── app/
│   └── Main.hs               # Wires both libraries together
│
├── test/
│   ├── pull-fiction/          # TPS accuracy & fairness tests (no node needed)
│   └── tx-centrifuge/         # Transaction building tests
│
└── bench/
    └── Bench.hs               # Criterion benchmarks

pull-fiction is the core engine. It provides rate-limited pipeline management (input queue, payload builder, bounded payload queue, closed-loop recycling), GCRA-based admission control, and workload orchestration. It is parameterised over abstract input and payload types -- it knows nothing about Cardano, transactions, or UTxOs. Dependencies: base, aeson, async, clock, containers, stm. No cardano-api, no ouroboros-*, no network.

tx-centrifuge-lib holds everything that touches cardano-api, ouroboros-network, and ouroboros-consensus: fund loading from JSON, Conway-era transaction assembly and signing, multiplexed N2N connections (TxSubmission2, ChainSync, BlockFetch, KeepAlive), transaction confirmation tracking via an observer pattern, and structured tracing via trace-dispatcher. It does not depend on pull-fiction -- the two sub-libraries are independent siblings.

app/Main.hs is the only component that depends on both libraries and on cardano-node. It loads config, creates the consensus protocol, resolves STM pipelines, spawns builders and workers, and connects to nodes.

Pipeline

inputQueue --> [payload builder] --> payloadQueue --> [worker/fetcher] --> node
    ^                                                    |
    +----------------- recycled inputs ------------------+

Each workload gets its own independent pipeline. Workers never push transactions to the node -- the node pulls via TxSubmission2 when its mempool has room. The generator's job is to keep the payload queue supplied and to pace delivery so aggregate TPS matches the configured ceiling.
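
A rough sketch of one such closed loop, assuming the stm package and recycling at pull time. The names buildOne and pullOne are hypothetical, and the sketch simplifies by recycling the consumed input itself, whereas a real workload would recycle the outputs of the built transaction.

```haskell
import Control.Concurrent.STM

-- Sketch only: buildOne and pullOne are illustrative names, not the PR's API.

-- | One builder step: take an input, build a payload, and enqueue the payload
-- together with the input it consumed. Blocks (retries) while the input queue
-- is empty or the bounded payload queue is full, which gives backpressure.
buildOne :: (i -> p) -> TQueue i -> TBQueue (i, p) -> STM ()
buildOne build inputQ payloadQ = do
  i <- readTQueue inputQ
  writeTBQueue payloadQ (i, build i)

-- | One worker step: take a payload if one is ready and recycle its input.
-- Uses tryReadTBQueue and an unbounded input queue, so it never retries.
pullOne :: TQueue i -> TBQueue (i, p) -> STM (Maybe p)
pullOne inputQ payloadQ = do
  next <- tryReadTBQueue payloadQ
  case next of
    Nothing -> pure Nothing
    Just (i, p) -> do
      writeTQueue inputQ i -- closes the loop: the input is available again
      pure (Just p)
```

Giving each workload its own pair of queues is what keeps a slow target from stalling the others.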

Rate limiter

The rate limiter uses GCRA (Generic Cell Rate Algorithm, ITU-T I.371). It is a receiver-side rate limiter (policer), not a sender-side traffic shaper: it admits or delays pull requests against a TPS ceiling.

  • Pre-computes nanosPerToken once as an integer; all scheduling is integer multiplication.
  • Never blocks inside STM; returns a delay that the caller sleeps outside the transaction.
  • Token slots are claimed atomically, providing FIFO-fair scheduling across workers.
  • Supports shared (aggregate ceiling) and per-target (independent ceiling per node) scopes.
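
The properties above can be illustrated with a small slot counter in STM. This is a sketch under assumptions, not the limiter in this PR: Limiter, claimSlot, and tryClaimSlot are hypothetical names, and the sketch shows only slot claiming on a cumulative schedule, not the full GCRA bookkeeping.

```haskell
import Control.Concurrent.STM

-- Sketch only: Limiter, claimSlot and tryClaimSlot are illustrative names.

-- | Limiter state: the index of the next unclaimed token slot.
newtype Limiter = Limiter (TVar Integer)

newLimiter :: IO Limiter
newLimiter = Limiter <$> newTVarIO 0

-- | Claim the next slot and return the delay (ns) the caller should sleep
-- after the transaction commits. Pure integer arithmetic, never retries.
claimSlot :: Limiter -> Integer -> Integer -> Integer -> STM Integer
claimSlot (Limiter next) startNs perToken nowNs = do
  n <- readTVar next
  writeTVar next (n + 1)
  pure (max 0 (startNs + n * perToken - nowNs))

-- | Non-blocking variant: claim a slot only if it is already due.
tryClaimSlot :: Limiter -> Integer -> Integer -> Integer -> STM Bool
tryClaimSlot (Limiter next) startNs perToken nowNs = do
  n <- readTVar next
  if startNs + n * perToken <= nowNs
    then writeTVar next (n + 1) >> pure True
    else pure False
```

In this sketch, shared scope would mean one Limiter used by all workers, and per-target scope one Limiter per target; slots are handed out in the order the transactions commit.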

Fairness matters because the benchmarking cluster uses per-target metrics. Skewed distribution produces misleading results.

Recycling strategies

Three strategies for returning spent UTxO outputs to the input queue:

| Strategy | When outputs recycle | Trade-off |
|---|---|---|
| on_build | Immediately after tx construction | Highest throughput; assumes downstream success |
| on_pull | When a worker fetches the tx from the payload queue | Safe default; inputs lost only if worker killed mid-delivery |
| on_confirm | After observer confirms tx on-chain at configured depth | Safest; handles mempool eviction; requires observer connection |
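
As an illustration, the three strategies map naturally onto a small sum type. Only the strategy names and the observer requirement come from the description above; the type and function names are hypothetical.

```haskell
-- Sketch only: the type and functions are illustrative, not the PR's API.

data RecycleStrategy = OnBuild | OnPull | OnConfirm
  deriving (Eq, Show, Enum, Bounded)

-- | The name of each strategy as used in this description.
strategyName :: RecycleStrategy -> String
strategyName OnBuild   = "on_build"
strategyName OnPull    = "on_pull"
strategyName OnConfirm = "on_confirm"

-- | Parse a strategy name; unknown names are rejected rather than defaulted.
parseStrategy :: String -> Maybe RecycleStrategy
parseStrategy s = lookup s [(strategyName x, x) | x <- [minBound .. maxBound]]

-- | Only on_confirm depends on the observer connection.
needsObserver :: RecycleStrategy -> Bool
needsObserver = (== OnConfirm)
```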

Configuration

Configuration is JSON-based with cascading defaults: builder, rate_limit, max_batch_size, and on_exhaustion can be set at the top level, per workload, or per target, and the most specific setting wins. Configuration passes through three stages: Raw (JSON parsing, no validation) -> Validated (pure validation, hidden constructors) -> Runtime (live STM resources). Fund files are compatible with the output of cardano-cli conway create-testnet-data --utxo-keys.
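
The "most specific wins" rule can be sketched with Maybe and (<|>). The Overrides record, its field types, and resolve are hypothetical and cover only two of the four settings; the real Raw and Validated types are not shown in this description.

```haskell
import Control.Applicative ((<|>))

-- Sketch only: Overrides and resolve are illustrative, not the PR's types.

-- | Optional settings as they might appear at one configuration level.
data Overrides = Overrides
  { rateLimit    :: Maybe Double -- assumed here to be a TPS value
  , maxBatchSize :: Maybe Int
  }

-- | Resolve one setting across the three levels: target first, then
-- workload, then top level. Nothing means no level provided a value.
resolve :: (Overrides -> Maybe a) -> Overrides -> Overrides -> Overrides -> Maybe a
resolve field top workload target =
  field target <|> field workload <|> field top
```

For example, a max_batch_size set only at the top level applies to every target, while a target-level rate_limit overrides both its workload and the top level.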

Testing

  • pull-fiction-test: Validates rate-limiting accuracy and per-target fairness at up to 100k TPS across 50 simulated targets. No network, no node, no blockchain. Checks: elapsed time within 5% of target, global TPS within 5%, per-target token counts within 5% (per-target mode) or 15% (shared mode).
  • tx-centrifuge-test: Validates transaction construction and signing with real cardano-api types.
  • core-bench: Criterion benchmarks for shared-limiter and per-target-limiter modes.
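
The tolerance checks listed for pull-fiction-test amount to a relative-error comparison. A minimal sketch, with hypothetical function names and no claim about how the test suite actually structures its assertions:

```haskell
-- Sketch only: withinTolerance and allTargetsFair are illustrative names.

-- | True when actual is within the given relative tolerance of expected,
-- e.g. 0.05 for the 5% checks and 0.15 for shared-mode per-target counts.
withinTolerance :: Double -> Integer -> Integer -> Bool
withinTolerance tol expected actual =
  fromIntegral (abs (actual - expected)) <= tol * fromIntegral expected

-- | Per-target fairness: every target's token count is close to its share.
allTargetsFair :: Double -> Integer -> [Integer] -> Bool
allTargetsFair tol expectedPerTarget = all (withinTolerance tol expectedPerTarget)
```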
