cardano-testnet | Extract EpochStateView into its own module#6547
cardano-testnet | Extract EpochStateView into its own module#6547
Conversation
Move EpochStateView, its background-thread setup, the STM-based wait primitive, and the retry loops out of Testnet.Components.Query and into a new module Testnet.Components.EpochStateView. The new module carries a top-down haddock that explains how the writer thread, the version counter, and awaitStateUpdateTimeout cooperate so that future readers can learn the mechanism from one place rather than piecing it together across Query.hs. Testnet.Components.Query re-exports the public API so existing callers do not need to change their imports.
5cbd71a to
a205cac
Compare
| 1. Samples the current version /before/ running the action, so that updates | ||
| landing during the action are not missed. | ||
| 2. Runs the action. On 'Right' it returns immediately. | ||
| 3. On 'Left' it checks whether the chain-unit deadline has been exceeded; if so | ||
| it returns the last 'Left'. | ||
| 4. Otherwise it blocks on STM until the version advances past the sampled | ||
| value, then loops. | ||
|
|
||
| The STM block is performed by an internal helper that combines a fast path | ||
| (return immediately if the version already differs — common when the action | ||
| took long enough that a block landed during it) with an awaited path (register | ||
| a fallback timer and block on STM until either the version advances or the | ||
| timer fires). The fallback is a stall-detection heartbeat, not a wait duration | ||
| — its only job is to guarantee no single STM transaction blocks indefinitely. | ||
| What actually terminates the loop is either a successful action or the | ||
| chain-unit deadline. |
There was a problem hiding this comment.
This is too long - no one will read that. This basically turns the implementation into the prose and risks getting out of sync with the implementation. I suggest removing this.
| -- | Status of the 'EpochStateView' background thread when epoch state is not yet available | ||
| data EpochStateStatus | ||
| = EpochStateNotInitialised | ||
| -- ^ The background thread has not yet received any epoch state from the node | ||
| | EpochStateFoldError !FoldBlocksError | ||
| -- ^ The background thread encountered an error while folding blocks | ||
|
|
||
| -- | A read-only handle to an epoch state that is kept fresh by a background thread. | ||
| -- | ||
| -- The constructor is private. Reads go through the accessor functions exported from | ||
| -- this module ('getEpochState', 'getBlockNumber', 'getSlotNumber', | ||
| -- 'getEpochStateDetails', 'getCurrentEpochNo') so that callers cannot accidentally | ||
| -- race against the version-counter synchronisation contract described in the module | ||
| -- header. | ||
| data EpochStateView = EpochStateView | ||
| { epochStateView :: !(TVar (Either EpochStateStatus (AnyNewEpochState, SlotNo, BlockNo))) | ||
| , epochStateVersion :: !(TVar Word64) | ||
| } |
There was a problem hiding this comment.
Those two datatypes should go to the top of the module.
| 'EpochStateView' holds two 'TVar's (private to this module): | ||
|
|
||
| * The current @(AnyNewEpochState, SlotNo, BlockNo)@, or a status value while the | ||
| view is initialising or after the background thread has errored. | ||
| * A monotonically-increasing @Word64@ version counter, bumped on every write to | ||
| the state 'TVar'. This counter is the synchronisation channel between the | ||
| writer thread and any waiter: a caller records the current version, performs | ||
| its check, and then blocks on STM until the version differs. | ||
|
|
||
| The writer is set up by 'getEpochStateView'. It launches a long-lived fold (via | ||
| 'asyncRegister_') that runs the chain-sync client against the node. For every | ||
| block the node streams, it writes the derived @NewEpochState@ into the state | ||
| 'TVar' and bumps the version. If the fold terminates with an error, it writes | ||
| the error status and bumps the version so any waiting threads can observe the | ||
| failure. |
There was a problem hiding this comment.
The implementation details like TVar or asyncRegister_ are irrelevant here. They just add to the volume without providing useful info. Can you rewrite this focusing on behaviour instead?
| void . retryUntilRightM epochStateView (WaitForBlocks numberOfBlocks) . pure $ Left () | ||
| getBlockNumber epochStateView | ||
|
|
||
| -- | Deadline for 'retryUntilRightM' and its wrappers, expressed in chain units |
There was a problem hiding this comment.
for 'retryUntilRightM' and its wrappers,
the purpose is irrelevant here. What this type represents is what matters
| -- rather than wall-clock time. Each iteration of the loop only advances when the | ||
| -- chain advances, so the deadline measures how much chain progress the caller is | ||
| -- willing to wait for. |
There was a problem hiding this comment.
Each iteration of the loop only advances when the
-- chain advances, so the deadline measures how much chain progress the caller is
-- willing to wait for.
redundant
| -- | Block until the epoch state version advances past the provided previously sampled | ||
| -- version, or until the fallback timeout expires. Returns immediately if the current |
There was a problem hiding this comment.
Version concept is leaking into the doc. Mention epoch state update instead.
carbolymer
left a comment
There was a problem hiding this comment.
Good work. Comments need more work. At the moment they're too lengthy, leak implementation details and mix them with the behaviour. Can you simplify them?
Summary
Follow-up to #6538. Moves
EpochStateView, its background fold setup, the STM-based wait primitive (awaitStateUpdateTimeout), and the retry loops (retryUntilRightM/retryUntilJustM/retryUntilM/waitForEpochs/waitForBlocks) out ofTestnet.Components.Queryand into a new moduleTestnet.Components.EpochStateView.The new module carries a top-down haddock overview explaining how the writer thread, the monotonic version counter, and the wait primitive cooperate — so future readers can learn the mechanism from one place instead of piecing it together across
Query.hs.Testnet.Components.Queryre-exports the public API, so existing callers need no import changes.Motivation
Testnet.Components.Queryhad two intertwined concerns:EpochStateViewsynchronisation primitive — background thread, sharedTVars, version counter, STM wait/retry logic.findAllUtxos,getGovState,checkDRepState,assertNewEpochState, etc.).The first concern is the subtle, dangerous part. Isolating it into its own module means the synchronisation contract (sampled version must match fresh state in the same STM transaction) lives in one place, with its own module-level documentation — and nothing outside the module can reach into the raw
TVars by mistake.What stayed in
Query.hsAll the state-consumer helpers (
findAllUtxos, UTXO helpers,getGovState,getTreasuryValue,checkDRepsNumber,checkDRepState,assertNewEpochState, protocol-parameter getters,getTxIx,waitUntilEpoch).Test plan
cabal build cardano-testnet:lib:cardano-testnetbuilds clean