-
Notifications
You must be signed in to change notification settings - Fork 1.1k
WIP: Enable downloading of transaction sets in parallel with early SCP stages #5209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
21f1fa1
ee87400
717fbce
eb279bd
0a8ff6b
a06beef
3f6cd6f
06f6c2b
4173347
22548a7
72879c5
0564c22
5756571
9d96bc4
2fcc493
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -328,8 +328,19 @@ HerderImpl::processExternalized(uint64 slotIndex, StellarValue const& value, | |
| slotIndex, hexAbbrev(value.txSetHash)); | ||
| } | ||
|
|
||
| TxSetXDRFrameConstPtr externalizedSet = | ||
| mPendingEnvelopes.getTxSet(value.txSetHash); | ||
| auto result = mPendingEnvelopes.getTxSet(value.txSetHash); | ||
| TxSetXDRFrameConstPtr externalizedSet; | ||
| if (std::holds_alternative<SkipTxSet>(result)) | ||
| { | ||
| auto const& ov = value.ext.originalValue(); | ||
| externalizedSet = TxSetXDRFrame::makeEmpty(ov.previousLedgerHash, | ||
| ov.previousLedgerVersion); | ||
| } | ||
| else | ||
| { | ||
| externalizedSet = std::get<TxSetXDRFrameConstPtr>(result); | ||
| } | ||
| releaseAssert(externalizedSet != nullptr); | ||
|
|
||
| // save the SCP messages in the database | ||
| if (mApp.getConfig().MODE_STORES_HISTORY_MISC) | ||
|
|
@@ -898,6 +909,9 @@ HerderImpl::recvSCPEnvelope(SCPEnvelope const& envelope) | |
| return Herder::ENVELOPE_STATUS_SKIPPED_SELF; | ||
| } | ||
|
|
||
| // This call fetches everything. Will only return ENVELOPE_STATUS_READY once | ||
| // everything is fetched though! Will need a new status to allow it to | ||
| // proceed to nomination at least, I think. | ||
| auto status = mPendingEnvelopes.recvSCPEnvelope(envelope); | ||
| if (status == Herder::ENVELOPE_STATUS_READY) | ||
| { | ||
|
|
@@ -912,10 +926,26 @@ HerderImpl::recvSCPEnvelope(SCPEnvelope const& envelope) | |
| } | ||
| else | ||
| { | ||
| if (status == Herder::ENVELOPE_STATUS_FETCHING) | ||
| SCPStatementType type = envelope.statement.pledges.type(); | ||
| // Allow parallel tx set downloading if the node is in sync and this is | ||
| // a NOMINATE or PREPARE message. Technically both of these criteria | ||
| // should be properly handled downstream, but this provides some | ||
| // additional assurance. | ||
| if (mApp.getState() == Application::State::APP_SYNCED_STATE && | ||
| status == Herder::ENVELOPE_STATUS_FETCHING && | ||
| (type == SCP_ST_NOMINATE || type == SCP_ST_PREPARE)) | ||
| { | ||
| std::string txt("FETCHING"); | ||
| ZoneText(txt.c_str(), txt.size()); | ||
|
|
||
| // If we have the quorum set, then proceed without the tx set. | ||
| auto qSetHash = Slot::getCompanionQuorumSetHashFromStatement( | ||
| envelope.statement); | ||
| auto maybeQSet = mApp.getHerder().getQSet(qSetHash); | ||
| if (maybeQSet) | ||
| { | ||
| processSCPQueue(true); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this correct? wouldn't it be possible to have other message types in this queue, which would cause them to get processed as well?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that's fine? Any CONFIRM or EXTERNALIZE messages in the queue would have fully downloaded tx sets and should be safe to process. The "partially ready" queue only contains nomination and PREPARE messages, which should also be safe to process. I don't see how this is different than the Do you see a scenario where it would be wrong to process other message types here?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My worry is processing messages in unexpected order, such that it trips downstream invariance (e.g. value must be downloaded, but we accidentally fed the message that's not fully ready, crashing the node).
Do we understand why? Messages can be received in any order, so let's say we feed a ballot message with 1 value followed by a nomination message referencing 3 values that aren't all downloaded. I don't quite understand how this wouldn't trip SCP.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, is this due to
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
If the value must be downloaded (such as for CONFIRM and EXTERNALIZE), then the envelope isn't considered ready or partially ready and isn't in either queue, so
I don't understand why this would be a problem? Nomination can run concurrently with balloting. In this case, we'd treat the undownloaded values as valid in nomination. If balloting times out and hasn't set
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Following up about this (and the other comments on HerderImpl) to capture some discussion from our meeting. The plan is to try pushing much of this logic (and related logic in PendingEnvelopes) down to SCP. My first thought is that it would live in Slot. PendingEnvelopes would treat all messages with downloaded qsets as "fully ready" (there would be no "partially ready" queue anymore) and pass them on to Slot. Slot would then interface with item fetcher as needed, and manage which values get passed to nomination/ballot protocol, and which get held back (CONFIRM and EXTERNALIZE messages without downloaded qsets). Slot would also handle delivering / replaying messages as tx sets arrive, potentially via a callback registered with ItemFetcher. The first step here is to draw out the various data flows between Herder, PendingEnvelopes, ItemFetcher, and SCP. From there we can decide how to change these interfaces. |
||
| } | ||
| } | ||
| else if (status == Herder::ENVELOPE_STATUS_PROCESSED) | ||
| { | ||
|
|
@@ -1138,6 +1168,7 @@ void | |
| HerderImpl::processSCPQueueUpToIndex(uint64 slotIndex) | ||
| { | ||
| ZoneScoped; | ||
| CLOG_TRACE(Proto, "Processing SCP queue up to index {}", slotIndex); | ||
| while (true) | ||
| { | ||
| SCPEnvelopeWrapperPtr envW = mPendingEnvelopes.pop(slotIndex); | ||
|
|
@@ -1386,7 +1417,7 @@ HerderImpl::peerDoesntHave(MessageType type, uint256 const& itemID, | |
| mPendingEnvelopes.peerDoesntHave(type, itemID, peer); | ||
| } | ||
|
|
||
| TxSetXDRFrameConstPtr | ||
| TxSetResult | ||
| HerderImpl::getTxSet(Hash const& hash) | ||
| { | ||
| return mPendingEnvelopes.getTxSet(hash); | ||
|
|
@@ -2162,11 +2193,15 @@ HerderImpl::persistSCPState(uint64 slot) | |
| // saves transaction sets referred by the statement | ||
| for (auto const& h : getValidatedTxSetHashes(e)) | ||
| { | ||
| auto txSet = mPendingEnvelopes.getTxSet(h); | ||
| if (txSet && !mApp.getPersistentState().hasTxSet(h)) | ||
| auto result = mPendingEnvelopes.getTxSet(h); | ||
| if (auto* txSetPtr = std::get_if<TxSetXDRFrameConstPtr>(&result)) | ||
| { | ||
| txSets.insert(std::make_pair(h, txSet)); | ||
| if (*txSetPtr && !mApp.getPersistentState().hasTxSet(h)) | ||
| { | ||
| txSets.insert(std::make_pair(h, *txSetPtr)); | ||
| } | ||
| } | ||
| // SkipTxSet: nothing to persist | ||
| } | ||
| Hash qsHash = Slot::getCompanionQuorumSetHashFromStatement(e.statement); | ||
| SCPQuorumSetPtr qSet = mPendingEnvelopes.getQSet(qsHash); | ||
|
|
@@ -2705,11 +2740,32 @@ bool | |
| HerderImpl::verifyStellarValueSignature(StellarValue const& sv) | ||
| { | ||
| ZoneScoped; | ||
| auto [b, _] = PubKeyUtils::verifySig( | ||
| sv.ext.lcValueSignature().nodeID, sv.ext.lcValueSignature().signature, | ||
| xdr::xdr_to_opaque(mApp.getNetworkID(), ENVELOPE_TYPE_SCPVALUE, | ||
| sv.txSetHash, sv.closeTime)); | ||
| return b; | ||
| switch (sv.ext.v()) | ||
| { | ||
| case STELLAR_VALUE_BASIC: | ||
| // This function should never be called with an unsigned value | ||
| releaseAssert(false); | ||
| case STELLAR_VALUE_SIGNED: | ||
| return PubKeyUtils::verifySig(sv.ext.lcValueSignature().nodeID, | ||
| sv.ext.lcValueSignature().signature, | ||
| xdr::xdr_to_opaque(mApp.getNetworkID(), | ||
| ENVELOPE_TYPE_SCPVALUE, | ||
| sv.txSetHash, | ||
| sv.closeTime)) | ||
| .valid; | ||
| case STELLAR_VALUE_SKIP: | ||
| { | ||
| auto const& ov = sv.ext.originalValue(); | ||
| return PubKeyUtils::verifySig( | ||
| ov.lcValueSignature.nodeID, ov.lcValueSignature.signature, | ||
| xdr::xdr_to_opaque(mApp.getNetworkID(), | ||
| ENVELOPE_TYPE_SCPVALUE, ov.txSetHash, | ||
| sv.closeTime)) | ||
| .valid; | ||
| } | ||
| default: | ||
| releaseAssert(false); | ||
| } | ||
| } | ||
|
|
||
| StellarValue | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, herder having to deal with the guts of SCP and differentiate between nomination, prepare and commit seems like an improper abstraction/separation of responsibilities situation. It also seems quite brittle - what if we change SCP in the future such that some CONFIRM messages would be allowed and some won't? or some PREPARE messages should be stalled?
It looks to me like the main problem is the weird mechanism in herder (which existed prior to your changes), which decides when to feed new messages to SCP. Does it really need to live in herder? Have you considered moving the mechanism inside of SCP, so it can automatically decide when to advance the protocol vs when to stall (then re-feed the messages back to the protocol).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I see what you mean. I'll try moving this into SCP after wrapping up the test harness work.