… URI

sanitizeMongoURI stripped every connection-pool option and nothing called SetMaxPoolSize, so the Go driver default of 100 connections per host was effectively hardcoded. On a large clone the target write pool saturated and inserts timed out; the timeouts were treated as transient and retried until the collection copy aborted and the clone froze.

Allow maxPoolSize through the sanitizer so it can be tuned per source/target URI (driver default 100 when unset, 0 = unlimited), log the effective pool size per client at startup, and warn at clone start when the effective pool size is below the clone worker count.

- mdb/connect.go: allow maxpoolsize in sanitizeMongoURI; log effective maxPoolSize per client; add EffectiveMaxPoolSize
- main.go: warn at clone start via poolBelowWorkers / warnIfMaxPoolSizeBelowWorkers
- pcsm/clone: export EffectiveNumReadWorkers / EffectiveNumInsertWorkers
- mdb: sanitizer unit test + maxPoolSize integration test (testcontainers)
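The "driver default 100 when unset, 0 = unlimited" rule in this commit message can be sketched as a small rendering helper. This is only an illustration, not the code in mdb/connect.go: describeMaxPoolSize and driverDefaultMaxPoolSize are hypothetical names, and the pointer argument is an assumption meant to mimic an optional setting where nil means "not set in the URI".

```go
package main

import "fmt"

// driverDefaultMaxPoolSize mirrors the per-host default cited in this PR.
// The name is illustrative; it is not the project's constant.
const driverDefaultMaxPoolSize uint64 = 100

// describeMaxPoolSize is a hypothetical helper that renders the effective
// pool size for a log line. nil means the option was not set, and 0 means
// the pool is unlimited.
func describeMaxPoolSize(configured *uint64) string {
	switch {
	case configured == nil:
		return fmt.Sprintf("%d (driver default)", driverDefaultMaxPoolSize)
	case *configured == 0:
		return "unlimited"
	default:
		return fmt.Sprintf("%d", *configured)
	}
}

func main() {
	explicit := uint64(250)
	unlimited := uint64(0)

	fmt.Println("unset:", describeMaxPoolSize(nil))
	fmt.Println("maxPoolSize=0:", describeMaxPoolSize(&unlimited))
	fmt.Println("maxPoolSize=250:", describeMaxPoolSize(&explicit))
}
```

Keeping "unset" distinct from 0 matters here because the two mean opposite things: one falls back to a cap of 100, the other removes the cap.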
…fault const, context-aware pool warnings
inelpandzic
reviewed
Jun 22, 2026
Drop warnIfMaxPoolSizeBelowWorkers and its poolBelowWorkers helper from the /start path, plus the now-unused mdb.EffectiveMaxPoolSize. The per-client effective maxPoolSize is already logged at connection time, which is enough for troubleshooting; the warning only covered the clone phase while the pool can also saturate during replication. Addresses review feedback on PR #247.
inelpandzic
approved these changes
Jun 22, 2026
PCSM-312
Problem

sanitizeMongoURI lets only a small allow-list of connection-string options through and strips the rest, including every pool option (maxPoolSize, minPoolSize, maxConnecting, and so on). Nothing calls SetMaxPoolSize either, so the Go driver default of 100 connections per host was effectively hardcoded with no way to tune it.

A customer hit this on a 5 TB clone. The target write pool saturated at 100 connections and inserts started timing out while checking out a connection. That timeout comes back as a context-deadline error, which the retry logic treats as transient, so it retried against the same full pool, gave up, aborted the collection copy, and the clone froze at state: running with no progress. Anyone whose insert-worker count (clone-num-insert-workers, default NumCPU*2) approaches 100 can run into the same wall, and it shows up sooner on bigger pods.

Solution

Allow maxPoolSize through the sanitizer so it can be set per source and target connection string. Unset behaves as before (driver default 100); maxPoolSize=0 means an unlimited pool. The effective pool size is logged per client at startup.

Scope is maxPoolSize only, matching the ticket. The other pool options stay stripped. Covered by a sanitizer unit test and an mdb integration test (testcontainers) that connects with an explicit maxPoolSize and confirms it takes effect.

- No SetMaxPoolSize call is added. The driver already parses maxPoolSize from the URI into opts.MaxPoolSize; the fix is purely about letting it survive sanitizeMongoURI. The new DriverDefaultMaxPoolSize = 100 constant exists only to render the log line when the option is unset.
- maxPoolSize, MaxPoolSize, and MAXPOOLSIZE all pass through.
- Connect strips directConnection and an RS member would advertise an unreachable host through the mapped port.

Other changes

- Makefile: make test-integration now also runs ./mdb/... so the new testcontainers integration test executes in CI.
- pcsm/clone/copy.go: extracts EffectiveNumReadWorkers / EffectiveNumInsertWorkers from applyDefaults. Behavior is unchanged; these were introduced for the removed pool-vs-workers warning and kept as exported helpers.
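For reviewers who want a picture of the allow-list behavior described under Solution, here is a minimal sketch of a case-insensitive option filter. It is not the real sanitizeMongoURI: the function name sanitizeQuery, the contents of allowedOptions, and the choice to split on "?" instead of parsing the whole URI are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// allowedOptions is an illustrative allow-list keyed by lower-cased option
// name. It is NOT the project's real list; only maxpoolsize is taken from
// this PR.
var allowedOptions = map[string]bool{
	"authsource":  true,
	"replicaset":  true,
	"maxpoolsize": true,
}

// sanitizeQuery is a hypothetical stand-in for the sanitizer. It keeps only
// allow-listed options, matching names case-insensitively, and leaves the
// part of the URI before "?" untouched.
func sanitizeQuery(uri string) (string, error) {
	base, rawQuery, found := strings.Cut(uri, "?")
	if !found {
		return uri, nil
	}
	params, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", err
	}
	kept := url.Values{}
	for name, values := range params {
		if allowedOptions[strings.ToLower(name)] {
			kept[name] = values // preserve the caller's original spelling
		}
	}
	if len(kept) == 0 {
		return base, nil
	}
	// Encode emits the surviving options sorted by key.
	return base + "?" + kept.Encode(), nil
}

func main() {
	in := "mongodb://h1:27017,h2:27017/?replicaSet=rs0&MaxPoolSize=500&minPoolSize=10&maxConnecting=4"
	out, err := sanitizeQuery(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The sketch lower-cases the option name before the lookup, which is one way to get the maxPoolSize / MaxPoolSize / MAXPOOLSIZE pass-through noted above while minPoolSize and maxConnecting are still dropped.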