Handle quota and cleanup correctly#103

Merged
dryajov merged 42 commits into main from feat/bump-block-size
Apr 3, 2026

Conversation

Contributor

@dryajov dryajov commented Mar 17, 2026

This PR fixes many issues related to quota handling and dataset cleanup, in addition to bumping the block size to 512kb.

@dryajov dryajov force-pushed the feat/bump-block-size branch from ca30d2e to 17f6893 Compare March 18, 2026 04:52
@dryajov dryajov marked this pull request as ready for review March 18, 2026 20:58
@dryajov dryajov force-pushed the feat/bump-block-size branch 8 times, most recently from b290697 to ebd1b41 Compare March 24, 2026 04:57
@dryajov dryajov changed the title Bump block size to 512kb Handle quota and cleanup correct Mar 31, 2026
@dryajov dryajov changed the title Handle quota and cleanup correct Handle quota and cleanup correctly Mar 31, 2026
@dryajov dryajov force-pushed the feat/bump-block-size branch from 86b49fa to 8316dd0 Compare April 1, 2026 22:22
Contributor

@markspanbroek left a comment

Are all your changes going to be this big from now on? 😅

Comment on lines +58 to +61
# Add buffer past contract end so overlay data survives through the
# last proof window. Buffer covers period (last proof window) +
# proofTimeout (challenge window). Without this, maintenance can drop
# the overlay while the proving loop is still running its last proof.
Contributor


I don't think that this is necessary. When the request has ended, submitting the proof will not work anyway. So it doesn't matter that generating the proof fails.

Contributor Author


Yeah, as per our discussion, this is mostly harmless on the marketplace side, but it will trigger re-downloads of data that is potentially already gone, so it will either download something that is no longer needed, or get stuck in retries.
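The lifetime extension described in the quoted comment could be sketched roughly as follows (a Python model; the function name and parameters are illustrative, not the actual archivist API):

```python
def overlay_expiry(contract_end: int, period: int, proof_timeout: int) -> int:
    """Compute when maintenance may safely drop overlay data.

    The buffer past contract end covers one period (the last proof
    window) plus proof_timeout (the challenge window), so the proving
    loop can finish its last proof before the overlay disappears.
    """
    return contract_end + period + proof_timeout

# With contract_end=1000, period=60, proof_timeout=30, maintenance
# would wait until t=1090 before cleaning up the overlay.
```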

Comment thread: archivist/marketplace/sales/states/initialproving.nim (Outdated)
let periodAtFinish = periodicity.periodOf(StorageTimestamp.init(clock.now()))
if periodAtFinish != provingPeriod:
-  warn "Failed to generate initial proof in time",
+  warn "Initial proof generated after period rollover",
Contributor


I would like to keep the word "failed" in there. If the proof is not generated in time, it will submit it in the next period, and it will therefore be rejected by the smart contract, because the challenge has changed.

Contributor


Same for proving.nim
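The rollover situation discussed above can be modeled with integer division (a Python sketch; `period_of` is a simplified stand-in for `periodicity.periodOf`, and the numbers are illustrative):

```python
def period_of(timestamp: int, period_length: int) -> int:
    """Map a timestamp to its proving-period index
    (a simplified stand-in for periodicity.periodOf)."""
    return timestamp // period_length

PERIOD = 60
proving_period = period_of(100, PERIOD)  # challenged during period 1

# Proof finished within the same period: still valid.
assert period_of(110, PERIOD) == proving_period

# Proof finished after the period rolled over: the challenge has
# changed, so the smart contract will reject the proof -- which is
# why the word "failed" belongs in the log message.
assert period_of(125, PERIOD) != proving_period
```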


trace "Deleting leaf and block metadata"

await self.deletingLock.drain(treeCid)
Contributor


It looks as though you're trying to ensure that no one is writing to the overlay by draining the deletingLock barrier. As far as I can tell, this is susceptible to a race condition.

In putLeafBlockMetaImpl(), you're first checking whether the overlay status equals Deleting before entering the barrier.

But here you first drain the barrier, then call several async operations, and only towards the end do you set the overlay status to Deleting. In the meantime, putLeafBlockMetaImpl() could have entered the barrier again.

Contributor Author


What the "barrier/inverse-semaphore" is doing here is coordinating concurrent puts and deletes; the deleting flag is just an early-exit marker for subsequent callers. But you're right that it might be safer to set it early.
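The pattern under discussion could look something like this (a minimal Python sketch, not the archivist implementation; names are illustrative). Setting the deleting flag inside the same lock that guards the writer count, before waiting, closes the window where a writer could re-enter after the drain begins:

```python
import threading

class InverseSemaphore:
    """Counts active writers; drain() blocks until the count hits zero.

    The deleting flag is set *before* waiting, under the same lock as
    the count check, so a late writer that checks the flag cannot
    slip in after the drain has started.
    """
    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()
        self.deleting = False

    def enter(self) -> bool:
        with self._cond:
            if self.deleting:
                return False  # early exit for subsequent callers
            self._count += 1
            return True

    def leave(self):
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def drain(self):
        with self._cond:
            self.deleting = True  # flag first, then wait for writers
            while self._count:
                self._cond.wait()
```

Because the flag flip and the count check happen under one lock, `enter()` observed after `drain()` has started always returns False, regardless of how long the subsequent cleanup operations take.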

Comment on lines +19 to +24
import pkg/archivist/marketplace/purchasing
import pkg/archivist/marketplace/purchasing/purchase
import pkg/archivist/marketplace/sales
import pkg/archivist/marketplace/availability/store
import pkg/archivist/marketplace/storageinterface
import pkg/archivist/marketplace/node {.all.}
Contributor


This depends on a lot of internal details of the marketplace, which makes these tests rather fragile and complex. The marketplace interface should be relatively stable, and it seems that you only need a call to marketplace.purchasing.purchase() to succeed, so I think it would make sense to introduce a simple mock for MarketplaceNode.

Contributor Author

@dryajov Apr 2, 2026


Not really, this is the only place where we can test every component in unison; introducing a mock defeats the purpose of this test, which has in practice proven extremely useful at identifying issues early.

proc start*(builder: NodeBuilder): Future[Node] {.async.} =
var arguments: seq[string]
arguments.add("--disc-port=" & $(await builder.discoveryPortResolved))
arguments.add("--nat=none") # don't need nat for integration tests
Contributor


Best not to make this change here; I handled this in #110:

arguments.add("--nat=" & builder.natResolved)

Contributor Author

@dryajov Apr 2, 2026


This shaves off around 10 minutes from the integration tests; otherwise it tries to initialize NAT on every startup.

@dryajov dryajov force-pushed the feat/bump-block-size branch from bbecc05 to 1c000f6 Compare April 2, 2026 19:25
@dryajov dryajov merged commit 40b3bd4 into main Apr 3, 2026
35 checks passed
@dryajov dryajov deleted the feat/bump-block-size branch April 3, 2026 14:10