Skip to content

Fix db.client.connection.count metric drifting over time#4078

Open
verdie-g wants to merge 3 commits into
redis:masterfrom
verdie-g:fix/connection-count-metric-bugs
Open

Fix db.client.connection.count metric drifting over time#4078
verdie-g wants to merge 3 commits into
redis:masterfrom
verdie-g:fix/connection-count-metric-bugs

Conversation

@verdie-g

@verdie-g verdie-g commented May 22, 2026

Copy link
Copy Markdown

Problem

The db.client.connection.count UpDownCounter metric drifts in long-running applications, producing nonsensical values like +5M idle / -5M used connections.

Root Cause

Three bugs in the sync connection pool metric recording:

Bug 1 — Phantom IDLE transitions

ConnectionPool.make_connection() and BlockingConnectionPool.make_connection() recorded IDLE +1 for new connections, then get_connection() unconditionally recorded IDLE -1. While balanced in isolation, this creates unnecessary IDLE transitions that drift if any step is disrupted (e.g. OTel collector not yet initialized). The async pool already avoids this with an is_created check.

Bug 2 — Wrong pool name on unowned connections

release() recorded USED -1 against the literal string "unknown_pool" when owns_connection() returned False (e.g. after a fork). The real pool's USED counter was never decremented.

Bug 3 — Full queue silent drop

BlockingConnectionPool.release() swallowed both the USED -1 and IDLE +1 recordings when put_nowait() raised Full. The USED counter leaked +1 permanently and the dropped connection was never disconnected.

Fix

Bug Fix
Phantom IDLE Remove IDLE +1 from make_connection(), add is_created check in get_connection() — new connections record only USED +1, reused connections record IDLE -1, USED +1. Matches the async pool.
Wrong pool name Use get_pool_name(self) instead of "unknown_pool"
Full queue drop In except Full, disconnect the connection and record USED -1

Tests

9 new tests in tests/test_observability/test_connection_count_bugs.py covering all three bugs across both ConnectionPool and BlockingConnectionPool.


Note

Medium Risk
Changes only observability counter recording in hot pool paths (get_connection/release), but incorrect metrics previously misled production monitoring; behavior of Redis I/O is unchanged.

Overview
Fixes drift in the db.client.connection.count observability UpDownCounter for sync ConnectionPool and BlockingConnectionPool.

Metric accounting: Stops recording IDLE +1 when make_connection() creates a socket. On get_connection(), new connections now emit only USED +1; reused pool connections still record IDLE -1 and USED +1 (aligned with the async pools).

Release edge cases: When the pool does not own a connection (e.g. after fork), USED -1 is attributed with get_pool_name(self) instead of the hard-coded "unknown_pool". For BlockingConnectionPool, if put_nowait() raises Full, the connection is disconnected, removed from _connections, and USED -1 is recorded so counters do not leak and reset() does not double-decrement.

Tests: Adds mocked pool lifecycle tests in tests/test_connection.py for new vs reused connections, full lifecycle netting to zero, unowned release pool name, full-queue release, and reset after a dropped connection.

Reviewed by Cursor Bugbot for commit dccf545. Bugbot is set up for automated code reviews on this repo. Configure here.

@jit-ci

jit-ci Bot commented May 22, 2026

Copy link
Copy Markdown

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

Fix three bugs in the UpDownCounter-based connection count metric that
cause IDLE and USED counters to drift in long-running applications
(observed as +5M idle / -5M used in production).

Bug 1 - Phantom IDLE transitions: ConnectionPool.make_connection() and
BlockingConnectionPool.make_connection() recorded IDLE +1 for new
connections, then get_connection() unconditionally recorded IDLE -1.
While balanced in isolation, this creates unnecessary IDLE transitions
that drift if any step is disrupted. Fix: remove IDLE +1 from
make_connection() and add an is_created check in get_connection() so
new connections only record USED +1 (matching the async pool pattern).

Bug 2 - Wrong pool name: ConnectionPool.release() and
BlockingConnectionPool.release() recorded USED -1 against the literal
string 'unknown_pool' when owns_connection() returned False (e.g. after
a fork). The real pool's USED counter was never decremented. Fix: use
get_pool_name(self) instead.

Bug 3 - Full queue silent drop: BlockingConnectionPool.release()
swallowed both the USED -1 and IDLE +1 recordings when put_nowait()
raised Full, leaking USED +1 permanently and never disconnecting the
dropped connection. Fix: in the except Full handler, disconnect the
connection and record USED -1.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
@verdie-g verdie-g force-pushed the fix/connection-count-metric-bugs branch from c0c59cc to f67ec4b Compare May 22, 2026 14:27
@petyaslavova

Copy link
Copy Markdown
Collaborator

Hi @verdie-g, thank you for your contribution! Please ping me when the PR is ready for review :)

@verdie-g verdie-g marked this pull request as ready for review May 26, 2026 07:35
@verdie-g

Copy link
Copy Markdown
Author

Hi @petyaslavova, it should be ready to review :)

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit 3738fea. Configure here.

Comment thread redis/connection.py
Address PR review: when BlockingConnectionPool.release() hits a full
queue it recorded USED -1 and disconnected the connection, but left it
in self._connections. A later reset() or __del__ computed in_use_count
from len(self._connections) and decremented USED again, drifting the
db.client.connection.count metric negative.

Remove the connection from self._connections in the except Full handler
so it is no longer counted as in-use. Add a regression test asserting a
subsequent reset() does not double-decrement USED.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants