Skip to content

cardano-testnet | Extract EpochStateView into its own module#6547

Open
Jimbo4350 wants to merge 3 commits intomasterfrom
jmillar/refactor/extract-epoch-state-view-module
Open

cardano-testnet | Extract EpochStateView into its own module#6547
Jimbo4350 wants to merge 3 commits intomasterfrom
jmillar/refactor/extract-epoch-state-view-module

Conversation

@Jimbo4350
Copy link
Copy Markdown
Contributor

Summary

Follow-up to #6538. Moves EpochStateView, its background fold setup, the STM-based wait primitive (awaitStateUpdateTimeout), and the retry loops (retryUntilRightM / retryUntilJustM / retryUntilM / waitForEpochs / waitForBlocks) out of Testnet.Components.Query and into a new module Testnet.Components.EpochStateView.

The new module carries a top-down haddock overview explaining how the writer thread, the monotonic version counter, and the wait primitive cooperate — so future readers can learn the mechanism from one place instead of piecing it together across Query.hs.

Testnet.Components.Query re-exports the public API, so existing callers need no import changes.

Motivation

Testnet.Components.Query had two intertwined concerns:

  1. The EpochStateView synchronisation primitive — background thread, shared TVars, version counter, STM wait/retry logic.
  2. Test-facing query helpers built on top (findAllUtxos, getGovState, checkDRepState, assertNewEpochState, etc.).

The first concern is the subtle, dangerous part. Isolating it into its own module means the synchronisation contract (sampled version must match fresh state in the same STM transaction) lives in one place, with its own module-level documentation — and nothing outside the module can reach into the raw TVars by mistake.

What stayed in Query.hs

All the state-consumer helpers (findAllUtxos, UTXO helpers, getGovState, getTreasuryValue, checkDRepsNumber, checkDRepState, assertNewEpochState, protocol-parameter getters, getTxIx, waitUntilEpoch).

Test plan

  • cabal build cardano-testnet:lib:cardano-testnet builds clean
  • Full integration test run (executable linking needs nix-shell locally; CI will cover)
  • Reviewer eyeballs the module header haddock for accuracy

carbolymer and others added 3 commits April 21, 2026 09:47
Move EpochStateView, its background-thread setup, the STM-based wait
primitive, and the retry loops out of Testnet.Components.Query and into
a new module Testnet.Components.EpochStateView.

The new module carries a top-down haddock that explains how the writer
thread, the version counter, and awaitStateUpdateTimeout cooperate so
that future readers can learn the mechanism from one place rather than
piecing it together across Query.hs.

Testnet.Components.Query re-exports the public API so existing callers
do not need to change their imports.
@Jimbo4350 Jimbo4350 marked this pull request as ready for review April 23, 2026 16:28
@Jimbo4350 Jimbo4350 requested a review from a team as a code owner April 23, 2026 16:28
@Jimbo4350 Jimbo4350 requested review from coot and removed request for a team April 23, 2026 16:28
@carbolymer carbolymer force-pushed the mgalazyn/refactor/remove-epoch-state-polling branch 3 times, most recently from 5cbd71a to a205cac Compare April 24, 2026 07:52
Base automatically changed from mgalazyn/refactor/remove-epoch-state-polling to master April 24, 2026 09:33
Comment on lines +38 to +53
1. Samples the current version /before/ running the action, so that updates
landing during the action are not missed.
2. Runs the action. On 'Right' it returns immediately.
3. On 'Left' it checks whether the chain-unit deadline has been exceeded; if so
it returns the last 'Left'.
4. Otherwise it blocks on STM until the version advances past the sampled
value, then loops.

The STM block is performed by an internal helper that combines a fast path
(return immediately if the version already differs — common when the action
took long enough that a block landed during it) with an awaited path (register
a fallback timer and block on STM until either the version advances or the
timer fires). The fallback is a stall-detection heartbeat, not a wait duration
— its only job is to guarantee no single STM transaction blocks indefinitely.
What actually terminates the loop is either a successful action or the
chain-unit deadline.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too long - no one will read that. This basically turns the implementation into the prose and risks getting out of sync with the implementation. I suggest removing this.

Comment on lines +232 to +249
-- | Status of the 'EpochStateView' background thread when epoch state is not yet available
data EpochStateStatus
= EpochStateNotInitialised
-- ^ The background thread has not yet received any epoch state from the node
| EpochStateFoldError !FoldBlocksError
-- ^ The background thread encountered an error while folding blocks

-- | A read-only handle to an epoch state that is kept fresh by a background thread.
--
-- The constructor is private. Reads go through the accessor functions exported from
-- this module ('getEpochState', 'getBlockNumber', 'getSlotNumber',
-- 'getEpochStateDetails', 'getCurrentEpochNo') so that callers cannot accidentally
-- race against the version-counter synchronisation contract described in the module
-- header.
data EpochStateView = EpochStateView
{ epochStateView :: !(TVar (Either EpochStateStatus (AnyNewEpochState, SlotNo, BlockNo)))
, epochStateVersion :: !(TVar Word64)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those two datatypes should go to the top of the module.

Comment on lines +16 to +30
'EpochStateView' holds two 'TVar's (private to this module):

* The current @(AnyNewEpochState, SlotNo, BlockNo)@, or a status value while the
view is initialising or after the background thread has errored.
* A monotonically-increasing @Word64@ version counter, bumped on every write to
the state 'TVar'. This counter is the synchronisation channel between the
writer thread and any waiter: a caller records the current version, performs
its check, and then blocks on STM until the version differs.

The writer is set up by 'getEpochStateView'. It launches a long-lived fold (via
'asyncRegister_') that runs the chain-sync client against the node. For every
block the node streams, it writes the derived @NewEpochState@ into the state
'TVar' and bumps the version. If the fold terminates with an error, it writes
the error status and bumps the version so any waiting threads can observe the
failure.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation details like TVar or asyncRegister_ are irrelevant here. They just add to the volume without providing useful info. Can you rewrite this focusing on behaviour instead?

void . retryUntilRightM epochStateView (WaitForBlocks numberOfBlocks) . pure $ Left ()
getBlockNumber epochStateView

-- | Deadline for 'retryUntilRightM' and its wrappers, expressed in chain units
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for 'retryUntilRightM' and its wrappers,

the purpose is irrelevant here. What this type represents is what matters

Comment on lines +137 to +139
-- rather than wall-clock time. Each iteration of the loop only advances when the
-- chain advances, so the deadline measures how much chain progress the caller is
-- willing to wait for.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each iteration of the loop only advances when the
-- chain advances, so the deadline measures how much chain progress the caller is
-- willing to wait for.

redundant

Comment on lines +251 to +252
-- | Block until the epoch state version advances past the provided previously sampled
-- version, or until the fallback timeout expires. Returns immediately if the current
Copy link
Copy Markdown
Contributor

@carbolymer carbolymer Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version concept is leaking into the doc. Mention epoch state update instead.

Copy link
Copy Markdown
Contributor

@carbolymer carbolymer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work. Comments need more work. At the moment they're too lengthy, leak implementation details and mix them with the behaviour. Can you simplify them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants