Skip to content

PCSM-312: Make maxPoolSize configurable via connection-string URI#247

Merged
chupe merged 3 commits into
mainfrom
PCSM-312
Jun 22, 2026
Merged

PCSM-312: Make maxPoolSize configurable via connection-string URI#247
chupe merged 3 commits into
mainfrom
PCSM-312

Conversation

@chupe

@chupe chupe commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

PCSM-312

Problem

sanitizeMongoURI lets only a small allow-list of connection-string options through and strips the rest, including every pool option (maxPoolSize, minPoolSize, maxConnecting, and so on). Nothing calls SetMaxPoolSize either, so the Go driver default of 100 connections per host was effectively hardcoded with no way to tune it.

A customer hit this on a 5TB clone. The target write pool saturated at 100 connections and inserts started timing out while checking out a connection. That timeout comes back as a context-deadline error, which the retry logic treats as transient, so it retried against the same full pool, gave up, aborted the collection copy, and the clone froze at state: running with no progress. Anyone whose insert-worker count (clone-num-insert-workers, default NumCPU*2) approaches 100 can run into the same wall, and it shows up sooner on bigger pods.

Solution

Allow maxPoolSize through the sanitizer so it can be set per source and target connection string. Unset behaves as before (driver default 100); maxPoolSize=0 means an unlimited pool. The effective pool size is logged per client at startup.

Scope is maxPoolSize only, matching the ticket. The other pool options stay stripped. Covered by a sanitizer unit test and an mdb integration test (testcontainers) that connects with an explicit maxPoolSize and confirms it takes effect.

  • No explicit SetMaxPoolSize call is added. The driver already parses maxPoolSize from the URI into opts.MaxPoolSize; the fix is purely about letting it survive sanitizeMongoURI. The new DriverDefaultMaxPoolSize = 100 constant exists only to render the log line when the option is unset.
  • Sanitizer matching is case-insensitive, so maxPoolSize, MaxPoolSize, and MAXPOOLSIZE all pass through.
  • The integration test runs against a standalone mongod (not a single-node RS) on purpose, because Connect strips directConnection and an RS member would advertise an unreachable host through the mapped port.

Other changes

  • Makefile: make test-integration now also runs ./mdb/... so the new testcontainers integration test executes in CI.
  • pcsm/clone/copy.go: extracts EffectiveNumReadWorkers / EffectiveNumInsertWorkers from applyDefaults. Behavior is unchanged; these were introduced for the removed pool-vs-workers warning and kept as exported helpers.

… URI

sanitizeMongoURI stripped every connection-pool option and nothing called
SetMaxPoolSize, so the Go driver default of 100 connections per host was
effectively hardcoded. On a large clone the target write pool saturated and
inserts timed out; the timeouts were treated as transient and retried until
the collection copy aborted and the clone froze.

Allow maxPoolSize through the sanitizer so it can be tuned per source/target
URI (driver default 100 when unset, 0 = unlimited), log the effective pool
size per client at startup, and warn at clone start when the effective pool
size is below the clone worker count.

- mdb/connect.go: allow maxpoolsize in sanitizeMongoURI; log effective
  maxPoolSize per client; add EffectiveMaxPoolSize
- main.go: warn at clone start via poolBelowWorkers / warnIfMaxPoolSizeBelowWorkers
- pcsm/clone: export EffectiveNumReadWorkers / EffectiveNumInsertWorkers
- mdb: sanitizer unit test + maxPoolSize integration test (testcontainers)
@chupe chupe requested a review from inelpandzic as a code owner June 19, 2026 13:02
github-actions[bot]

This comment was marked as outdated.

Comment thread main.go Outdated
Drop warnIfMaxPoolSizeBelowWorkers and its poolBelowWorkers helper from the /start path, plus the now-unused mdb.EffectiveMaxPoolSize. The per-client effective maxPoolSize is already logged at connection time, which is enough for troubleshooting; the warning only covered the clone phase while the pool can also saturate during replication.

Addresses review feedback on PR #247.
@chupe chupe merged commit 3eb82dd into main Jun 22, 2026
71 of 74 checks passed
@chupe chupe deleted the PCSM-312 branch June 22, 2026 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants