Fix db.client.connection.count metric drifting over time #4078
Open
verdie-g wants to merge 3 commits into
Fix three bugs in the UpDownCounter-based connection count metric that cause IDLE and USED counters to drift in long-running applications (observed as +5M idle / -5M used in production).

Bug 1 - Phantom IDLE transitions: ConnectionPool.make_connection() and BlockingConnectionPool.make_connection() recorded IDLE +1 for new connections, then get_connection() unconditionally recorded IDLE -1. While balanced in isolation, this creates unnecessary IDLE transitions that drift if any step is disrupted. Fix: remove IDLE +1 from make_connection() and add an is_created check in get_connection() so new connections only record USED +1 (matching the async pool pattern).

Bug 2 - Wrong pool name: ConnectionPool.release() and BlockingConnectionPool.release() recorded USED -1 against the literal string 'unknown_pool' when owns_connection() returned False (e.g. after a fork). The real pool's USED counter was never decremented. Fix: use get_pool_name(self) instead.

Bug 3 - Full queue silent drop: BlockingConnectionPool.release() swallowed both the USED -1 and IDLE +1 recordings when put_nowait() raised Full, leaking USED +1 permanently and never disconnecting the dropped connection. Fix: in the except Full handler, disconnect the connection and record USED -1.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
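The new-versus-reused accounting described for Bug 1 can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyPool`, its `counts` mapping, and the string connections are hypothetical stand-ins, not redis-py code, and a plain `Counter` plays the role of the UpDownCounter.

```python
from collections import Counter


class ToyPool:
    """Hypothetical stand-in for a connection pool; not redis-py code."""

    def __init__(self):
        self.counts = Counter()  # models one UpDownCounter value per state
        self._idle = []          # connections available for reuse
        self._next_id = 0

    def make_connection(self):
        # After the fix: creating a connection records nothing.
        self._next_id += 1
        return f"conn-{self._next_id}"

    def get_connection(self):
        is_created = not self._idle
        conn = self.make_connection() if is_created else self._idle.pop()
        if not is_created:
            # Only a reused connection was ever counted as idle.
            self.counts["idle"] -= 1
        self.counts["used"] += 1
        return conn

    def release(self, conn):
        self._idle.append(conn)
        self.counts["used"] -= 1
        self.counts["idle"] += 1


pool = ToyPool()
first = pool.get_connection()   # new connection: USED +1 only
pool.release(first)             # USED -1, IDLE +1
second = pool.get_connection()  # reused connection: IDLE -1, USED +1
snapshot = dict(pool.counts)
```

In this model a new connection never passes through the idle state, so a disrupted recording step cannot leave a stray IDLE +1 behind.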
c0c59cc to f67ec4b
Collaborator
Hi @verdie-g, thank you for your contribution! Please ping me when the PR is ready for review :)
Author
Hi @petyaslavova, it should be ready to review :)
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 3738fea.
Address PR review: when BlockingConnectionPool.release() hits a full queue it recorded USED -1 and disconnected the connection, but left it in self._connections. A later reset() or __del__ computed in_use_count from len(self._connections) and decremented USED again, drifting the db.client.connection.count metric negative.

Remove the connection from self._connections in the except Full handler so it is no longer counted as in-use. Add a regression test asserting a subsequent reset() does not double-decrement USED.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
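The double-decrement described above can be illustrated with a small model. This is a sketch under assumptions, not the actual redis-py implementation: `ToyBlockingPool`, `checkout()`, and the way `reset()` derives the in-use count (tracked connections minus queued ones) are hypothetical simplifications chosen to mirror the commit message.

```python
import queue
from collections import Counter


class ToyBlockingPool:
    """Hypothetical model of the full-queue release path; not redis-py code."""

    def __init__(self, max_connections):
        self.counts = Counter()  # models one UpDownCounter value per state
        self.pool = queue.LifoQueue(max_connections)
        self._connections = []   # every connection the pool still tracks

    def checkout(self, conn):
        # Simplified acquire: track the connection and record USED +1.
        self._connections.append(conn)
        self.counts["used"] += 1

    def release(self, conn):
        try:
            self.pool.put_nowait(conn)
        except queue.Full:
            # Dropped connection: stop tracking it, record USED -1 once.
            self._connections.remove(conn)
            self.counts["used"] -= 1
            return
        self.counts["used"] -= 1
        self.counts["idle"] += 1

    def reset(self):
        # Assumed accounting: in-use = tracked connections minus idle ones.
        idle = self.pool.qsize()
        in_use = len(self._connections) - idle
        self.counts["used"] -= in_use
        self.counts["idle"] -= idle
        self.pool = queue.LifoQueue(self.pool.maxsize)
        self._connections.clear()


pool = ToyBlockingPool(max_connections=1)
pool.checkout("a")
pool.checkout("b")
pool.release("a")  # fits in the queue: USED -1, IDLE +1
pool.release("b")  # queue is full: dropped, USED -1 only
pool.reset()       # nothing left in use, so USED is not decremented again
final = dict(pool.counts)
```

If the `self._connections.remove(conn)` line is deleted from the `except queue.Full` branch, `reset()` in this model sees one extra tracked connection and pushes the used count to -1, which is the drift the review comment pointed out.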

Problem
The db.client.connection.count UpDownCounter metric drifts in long-running applications, producing nonsensical values like +5M idle / -5M used connections.

Root Cause
Three bugs in the sync connection pool metric recording:
Bug 1 - Phantom IDLE transitions
ConnectionPool.make_connection() and BlockingConnectionPool.make_connection() recorded IDLE +1 for new connections, then get_connection() unconditionally recorded IDLE -1. While balanced in isolation, this creates unnecessary IDLE transitions that drift if any step is disrupted (e.g. OTel collector not yet initialized). The async pool already avoids this with an is_created check.

Bug 2 - Wrong pool name on unowned connections
release() recorded USED -1 against the literal string "unknown_pool" when owns_connection() returned False (e.g. after a fork). The real pool's USED counter was never decremented.

Bug 3 - Full queue silent drop
BlockingConnectionPool.release() swallowed both the USED -1 and IDLE +1 recordings when put_nowait() raised Full. The USED counter leaked +1 permanently and the dropped connection was never disconnected.

Fix
1. Remove IDLE +1 from make_connection() and add an is_created check in get_connection(). New connections record only USED +1; reused connections record IDLE -1, USED +1. Matches the async pool.
2. Use get_pool_name(self) instead of "unknown_pool".
3. In except Full, disconnect the connection and record USED -1.

Tests
9 new tests in tests/test_observability/test_connection_count_bugs.py covering all three bugs across both ConnectionPool and BlockingConnectionPool.

Note
Medium Risk
Changes only observability counter recording in hot pool paths (get_connection/release), but incorrect metrics previously misled production monitoring; behavior of Redis I/O is unchanged.

Overview
Fixes drift in the db.client.connection.count observability UpDownCounter for sync ConnectionPool and BlockingConnectionPool.

Metric accounting: Stops recording IDLE +1 when make_connection() creates a socket. On get_connection(), new connections now emit only USED +1; reused pool connections still record IDLE -1 and USED +1 (aligned with the async pools).

Release edge cases: When the pool does not own a connection (e.g. after fork), USED -1 is attributed with get_pool_name(self) instead of the hard-coded "unknown_pool". For BlockingConnectionPool, if put_nowait() raises Full, the connection is disconnected, removed from _connections, and USED -1 is recorded so counters do not leak and reset() does not double-decrement.

Tests: Adds mocked pool lifecycle tests in tests/test_connection.py for new vs reused connections, full lifecycle netting to zero, unowned release pool name, full-queue release, and reset after a dropped connection.

Reviewed by Cursor Bugbot for commit dccf545. Bugbot is set up for automated code reviews on this repo.
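The unowned-release case (USED -1 attributed to the real pool name rather than "unknown_pool") can be sketched as a tiny check. Everything here is a hypothetical illustration: `ToyPool`, `ToyConn`, the `token` used to simulate a fork, and the module-level `record()` helper are assumptions, not redis-py APIs.

```python
from collections import Counter

counts = Counter()  # keyed by (pool_name, state); models the metric attributes


def record(pool_name, state, delta):
    """Hypothetical recorder standing in for the metric call."""
    counts[(pool_name, state)] += delta


class ToyConn:
    def __init__(self, owner_token):
        self.owner_token = owner_token


class ToyPool:
    """Hypothetical pool; ownership is modelled with a simple token."""

    def __init__(self, name):
        self.name = name
        self.token = 1

    def get_connection(self):
        record(self.name, "used", +1)
        return ToyConn(self.token)

    def owns_connection(self, conn):
        return conn.owner_token == self.token

    def release(self, conn):
        if not self.owns_connection(conn):
            # Fixed behaviour: decrement the real pool's USED counter.
            record(self.name, "used", -1)
            return
        record(self.name, "used", -1)
        record(self.name, "idle", +1)


pool = ToyPool("pool-a")
conn = pool.get_connection()
pool.token += 1     # simulate the pool being re-initialised, e.g. after a fork
pool.release(conn)  # unowned connection: USED -1 still lands on "pool-a"
```

With the pre-fix behaviour, the decrement would be recorded under "unknown_pool" and the count for "pool-a" would stay at +1 forever.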