⚡️ Speed up function _mean by 413%
#277
Open
📄 413% (4.13x) speedup for `_mean` in `unstructured/metrics/utils.py`
⏱️ Runtime: 3.35 milliseconds → 654 microseconds (best of 181 runs)
📝 Explanation and details
The optimized code achieves a 412% speedup by replacing the `statistics.mean()` call with native arithmetic operations, while carefully preserving the original behavior regarding type handling and NaN propagation.
Key Optimization
The primary performance bottleneck was `statistics.mean()`, which consumed 97.4% of the original runtime (12.26ms out of 12.59ms total). The optimized version eliminates this overhead by:
- using the `.mean(skipna=False)` method, which is already optimized in pandas/NumPy
- replacing `statistics.mean()` with simple `sum(scores) / len(scores)` arithmetic
Why This Works
The `statistics.mean()` function from Python's standard library performs extensive type checking, validation, and precision handling to work correctly with various numeric types. For the common case of lists with basic numeric types, this overhead is unnecessary. Direct arithmetic operations (`sum()` and division) are much faster.
Performance by Test Case Type
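The overhead described above can be observed locally with a rough timing sketch. The input data, the call count, and the helper names `mean_statistics` and `mean_arithmetic` are assumptions made for illustration; they are not taken from this PR's benchmarks, and absolute timings will vary by machine and Python version.

```python
import statistics
import timeit

# Hypothetical sample input: 1,000 float scores (not the PR's benchmark data).
scores = [i / 1000 for i in range(1000)]


def mean_statistics(values):
    """Baseline: the standard-library implementation."""
    return statistics.mean(values)


def mean_arithmetic(values):
    """Direct arithmetic: one pass with sum(), then a single division."""
    return sum(values) / len(values)


# Time each approach over the same number of calls.
calls = 200
t_statistics = timeit.timeit(lambda: mean_statistics(scores), number=calls)
t_arithmetic = timeit.timeit(lambda: mean_arithmetic(scores), number=calls)

print(f"statistics.mean: {t_statistics / calls * 1e6:.1f} µs per call")
print(f"sum / len:       {t_arithmetic / calls * 1e6:.1f} µs per call")
```

Both helpers should agree to within floating-point rounding on this input; only the time per call is expected to differ.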
- `.mean()` is already optimized, but the type conversion overhead (`.item()` call) adds a small cost.
Behavioral Preservation
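The compatibility points in this section can be checked for the list path with a small sketch. `fast_mean` is a hypothetical helper written for illustration, not the code in this PR, and its integer handling is only an assumption about how an `int` result might be preserved. The pandas path is omitted so the example needs only the standard library.

```python
import math
import statistics


def fast_mean(scores):
    """Hypothetical list-only helper; not the PR's actual `_mean`."""
    total = sum(scores)
    count = len(scores)
    # statistics.mean() returns an int when int inputs average to a whole
    # number, so mirror that instead of always returning a float.
    if isinstance(total, int) and total % count == 0:
        return total // count
    return total / count


# A whole-number mean of ints stays an int in both implementations.
assert statistics.mean([1, 2, 3]) == 2
assert isinstance(statistics.mean([1, 2, 3]), int)
assert fast_mean([1, 2, 3]) == 2
assert isinstance(fast_mean([1, 2, 3]), int)

# Plain division alone would have produced a float here.
assert isinstance(sum([1, 2, 3]) / len([1, 2, 3]), float)

# NaN propagates through both implementations.
assert math.isnan(statistics.mean([1.0, float("nan")]))
assert math.isnan(fast_mean([1.0, float("nan")]))
```

The sketch does not cover empty inputs, where `statistics.mean()` and plain division raise different exceptions.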
The optimization carefully maintains compatibility:
- `skipna=False` ensures NaN values propagate (matching `statistics.mean()` behavior)
- `.item()` converts NumPy scalars to native Python types
- … `int` instead of `float` (matching original behavior)
Impact Assessment
This optimization is most beneficial for workloads that frequently call `_mean()` with moderately-sized lists (10-1000 elements) in performance-critical paths. The slight regression for pandas Series inputs is acceptable given the massive gains for list inputs, which appear to be the primary use case based on test coverage.
✅ Correctness verification report:
⚙️ Existing Unit Tests
metrics/test_utils.py::test_stats
🌀 Generated Regression Tests
To edit these changes, run `git checkout codeflash/optimize-_mean-mks5352k` and push.