Cluster code micro-optimization#4094
Conversation
A bunch of low-hanging, low-impact micro-optimizations. Mostly common subexpression extraction to local variables when beneficial. Barely noticeable benchmark impact but technically an improvement without readability impact IMO.
Concentrate around key extraction and node determination. Optimizes the most common path for key extraction which is that of single-key commands.
Common subexpression factorization
Getting the policy callback out of all the dicts is very indirect and wasteful for simple commands. Cache simple commands so the next time it only takes a single lookup. About 3.7% improvement on overall benchmark.
Avoid resolving the same command policies over and over. All resolvers are "static" in the short term, during a single pipeline execution, so cache their results for the duration to reduce the amount of work done.
Simple commands have simple responses, so a fast path for those improves response parsing speed in the average
Another big speedup in command policy resolution
|
Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset. In case there are security findings, they will be communicated to you as a comment inside the PR. Hope you’ll enjoy using Jit. Questions? Comments? Want to learn more? Get in touch with us. |
Were meant for pre-PR testing only
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 99df020. Configure here.
| if isinstance(value, (bytes, bytearray, memoryview)): | ||
| return value | ||
| elif isinstance(value, str): | ||
| return value.encode(self.encoding, self.encoding_errors) |
There was a problem hiding this comment.
Encoder str check now precedes bool check incorrectly
Low Severity
The reordering moves the isinstance(value, str) check before the isinstance(value, bool) check. While bool is not a subclass of str in standard Python so current behavior is preserved, the original ordering had a clear comment explaining the bool check exists specifically as a guard before int. Moving str above bool obscures this intent and makes the code fragile — any future custom type inheriting both str and bool-like behavior could break silently. The comment "special case bool since it is a subclass of int" is now misleadingly placed after str, not before int.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 99df020. Configure here.
There was a problem hiding this comment.
Not really, the concern for bool was around int, not str. If it were, bytes would have a similar problem. That behavior is preserved. String and byte values are by far the most commonly used so they should be checked first.
|
Hi @klaussfreire, thank you for your contribution! I'll take a look at it shortly. |


Description of change
A bunch of low-hanging micro-optimizations.
Mostly common subexpression extraction to local variables when beneficial. Barely noticeable benchmark impact but technically an improvement without readability impact IMO.
One of the most impactful changes concentrates around key extraction and node determination.
It optimizes the most common path for key extraction which is that of single-key commands,
precomputing key position during command parsing to have a simple fast path in get_keys.
This last one does produce a large impact in cluster benchmarks, of around 20%.
Some optimizations also apply to regular (non-cluster) clients, but the main concern of this optimization effort was around cluster ops, so they were not benchmarked.
Benchmark results
This uses a benchmark similar to those in basic_operations, but with a cluster client.
Two pipeline patterns were used:
Both patterns include value payloads of around 50 bytes each.
The first one stresses multiple command types with various return types, while the second one stresses a streamlined single-command pipeline with limited response data.
The benchmark was run in AWS against a 3-node valkey EC cluster (EC versions of Redis OSS does not support setex), using hiredis 3.3.1. The setup builds a mostly CPU-bound case with a very fast but still realistic network between client and cluster.
Large pipelines show as dnf because they get deadlocked, something that will be covered on another PR.
Numbers are given in requests per second (RPS) and relative uplift.
Pull Request check-list
Please make sure to review and check all of these items:
NOTE: these things are not required to open a PR and can be done
afterwards / while the PR is open.
Note
Medium Risk
Changes cluster slot/key routing and pipeline target selection; behavior should be equivalent but mistakes would mis-route commands across shards.
Overview
This PR speeds up Redis Cluster command routing and pipelines, with smaller wins elsewhere in the client stack.
Key extraction precomputes
_single_key_poswhenCOMMANDmetadata is loaded and adds a fast path inget_keys(plus_is_keyed_command) so typical single-key commands skip heavier parsing. Sync and asyncCommandsParsershare the same pattern, with minor refactors for multi-key and negativelast_key_poshandling.Cluster node selection normalizes policy callbacks to a single
(self, command, *args, **kwargs)shape, caches resolved policy callbacks per command (cleared after 5k entries), and reuses keyed/keylessCommandPoliciesin cluster pipeline execution with a per-batch policy cache. Slot routing reuses a localis_readflag.Other hot-path tweaks:
Redis.parse_responseskips option handling whenoptionsis empty;Encoder.encodeuses clearer early returns;bool_oktreats bothb"OK"and"OK"; hiredis parsers cachereader/socklocals; policy resolvers avoid redundant dict lookups; the command packer benchmark joins list payloads beforesendall.Reviewed by Cursor Bugbot for commit 7f51775. Bugbot is set up for automated code reviews on this repo. Configure here.