@codeflash-ai codeflash-ai bot commented Jan 24, 2026

📄 413% (5.12x) speedup for _mean in unstructured/metrics/utils.py

⏱️ Runtime : 3.35 milliseconds → 654 microseconds (best of 181 runs)

📝 Explanation and details

The optimized code achieves a 412% speedup by replacing the statistics.mean() call with native arithmetic operations, while carefully preserving the original behavior regarding type handling and NaN propagation.

Key Optimization

The primary performance bottleneck was statistics.mean(), which consumed 97.4% of the original runtime (12.26ms out of 12.59ms total). The optimized version eliminates this overhead by:

  1. For pandas Series: Uses the native .mean(skipna=False) method, which is already optimized in pandas/NumPy
  2. For lists: Replaces statistics.mean() with simple sum(scores) / len(scores) arithmetic (a sketch follows below)

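For reference, here is a minimal sketch of what this two-branch approach could look like. It is reconstructed from the description in this report, not the verbatim code in unstructured/metrics/utils.py:

import pandas as pd

def _mean(scores, rounding=3):
    # Hedged sketch reconstructed from the PR description; the real
    # implementation lives in unstructured/metrics/utils.py.
    if len(scores) == 0:
        return None
    if isinstance(scores, pd.Series):
        # skipna=False keeps NaN propagation consistent with statistics.mean();
        # .item() converts the NumPy scalar to a native Python number.
        mean = scores.mean(skipna=False).item()
    else:
        total = sum(scores)
        mean = total / len(scores)
        # statistics.mean() returns an int for integer-only input with a
        # whole-number mean, so mirror that behavior here.
        if isinstance(total, int) and mean.is_integer():
            mean = int(mean)
    return round(mean, rounding) if rounding else mean
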
Why This Works

The statistics.mean() function from Python's standard library performs extensive type checking, validation, and precision handling to work correctly with various numeric types. For the common case of lists with basic numeric types, this overhead is unnecessary. Direct arithmetic operations (sum() and division) are much faster.

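As a rough illustration (not part of the PR; absolute numbers vary by machine and Python version), a quick timeit comparison of the two approaches on a 1000-element list:

import statistics
import timeit

scores = [float(i) for i in range(1000)]

t_stats = timeit.timeit(lambda: statistics.mean(scores), number=1_000)
t_arith = timeit.timeit(lambda: sum(scores) / len(scores), number=1_000)
print(f"statistics.mean: {t_stats:.3f}s   sum()/len(): {t_arith:.3f}s")
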
Performance by Test Case Type

  • List inputs: Show dramatic improvements (270-3984% faster for lists ranging from 2 to 999 elements). The larger the list, the more pronounced the speedup.
  • pandas Series inputs: Show a slight slowdown (12-31% slower); pandas .mean() was already fast, and the added type conversion (the .item() call) introduces a small cost.
  • Empty inputs: Minimal performance difference since both versions exit early.

Behavioral Preservation

The optimization carefully maintains compatibility:

  • NaN handling: skipna=False ensures NaN values propagate (matching statistics.mean() behavior)
  • Type conversion: .item() converts NumPy scalars to native Python types (both this and the NaN behavior are shown in the snippet below)
  • Integer preservation: For integer-only lists with whole number results, returns int instead of float (matching original behavior)

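A short interpreter-style snippet illustrating the NaN and type-conversion points; this is standard pandas behavior, not code taken from the PR:

import pandas as pd

s = pd.Series([1.0, float("nan")])
print(s.mean())              # 1.0  (pandas skips NaN by default)
print(s.mean(skipna=False))  # nan  (propagates, matching statistics.mean())

m = pd.Series([1.0, 2.0]).mean(skipna=False)
print(type(m))         # <class 'numpy.float64'>
print(type(m.item()))  # <class 'float'> after .item()
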
Impact Assessment

This optimization is most beneficial for workloads that frequently call _mean() with moderately-sized lists (10-1000 elements) in performance-critical paths. The slight regression for pandas Series inputs is acceptable given the massive gains for list inputs, which appear to be the primary use case based on test coverage.

Correctness verification report:

Test                            Status
⚙️ Existing Unit Tests           5 Passed
🌀 Generated Regression Tests    53 Passed
⏪ Replay Tests                  🔘 None Found
🔎 Concolic Coverage Tests       🔘 None Found
📊 Tests Coverage                100.0%
⚙️ Existing Unit Tests

Test File::Test Function             Original   Optimized   Speedup
metrics/test_utils.py::test_stats    80.7μs     18.3μs      340% ✅
🌀 Generated Regression Tests
import statistics  # used to compute expected values in tests

import pandas as pd  # used to construct pd.Series inputs

# imports
import pytest  # used for our unit tests

from unstructured.metrics.utils import _mean


def test_basic_list_default_rounding():
    # Basic functionality: a simple list of integers
    data = [1, 2, 3]  # mean is exactly 2.0
    codeflash_output = _mean(data)
    result = codeflash_output  # 25.7μs -> 5.04μs (410% faster)


def test_basic_series_default_rounding():
    # Basic functionality: accept pandas.Series as input
    data = pd.Series([0.1, 0.2, 0.3])  # mean is 0.2
    codeflash_output = _mean(data)
    result = codeflash_output  # 50.0μs -> 57.4μs (12.8% slower)


def test_empty_list_returns_none():
    # Edge case: empty list should return None
    empty = []
    codeflash_output = _mean(empty)  # 932ns -> 984ns (5.28% slower)


def test_empty_series_returns_none():
    # Edge case: empty pandas Series should also return None
    empty_series = pd.Series([], dtype=float)
    codeflash_output = _mean(empty_series)  # 2.75μs -> 2.38μs (15.5% faster)


def test_rounding_none_returns_unrounded_mean():
    # Edge: rounding=None is falsy so the function should return the unrounded mean
    data = [0.125, 0.125, 0.125]  # mean is 0.125 exactly
    codeflash_output = _mean(data, rounding=None)
    result = codeflash_output  # 26.8μs -> 4.70μs (470% faster)


def test_rounding_zero_returns_unrounded_mean():
    # Edge: rounding=0 is falsy in the implementation => return unrounded mean
    data = [1.2345, 1.2345]  # mean is 1.2345
    codeflash_output = _mean(data, rounding=0)
    result = codeflash_output  # 28.2μs -> 4.66μs (504% faster)


def test_negative_rounding_rounds_to_tens():
    # Edge: negative rounding should be passed through to built-in round
    data = [12, 18]  # mean is 15
    codeflash_output = _mean(data, rounding=-1)
    result = codeflash_output  # 26.8μs -> 6.27μs (327% faster)


def test_mean_with_nan_in_series_produces_nan():
    # Edge: if NaN is present, statistics.mean will produce NaN; function should propagate it
    data = pd.Series([1.0, float("nan")])
    codeflash_output = _mean(data)
    result = codeflash_output  # 26.5μs -> 54.0μs (50.9% slower)


def test_invalid_scores_type_raises_type_error():
    # Edge: passing None or totally invalid types should raise a TypeError at len(scores)
    with pytest.raises(TypeError):
        _mean(None)  # 2.01μs -> 2.01μs (0.298% slower)


def test_invalid_rounding_type_raises_type_error():
    # Edge: passing a non-integer rounding (like a float) should raise a TypeError from round()
    data = [1.0, 2.0, 3.0]
    with pytest.raises(TypeError):
        _mean(data, rounding=2.5)  # 29.1μs -> 6.84μs (325% faster)


def test_large_scale_list_999_elements():
    # Large scale: use 999 elements (below the 1000-element limit) to assert performance/scalability
    # Use a predictable sequence so expected mean is computed deterministically
    values = [float(i) for i in range(999)]  # 0.0 .. 998.0 -> length 999
    expected = round(
        statistics.mean(values), 3
    )  # expected behavior matches function's default rounding
    codeflash_output = _mean(values)
    result = codeflash_output  # 363μs -> 8.90μs (3984% faster)
    assert result == expected


def test_precision_and_rounding_boundaries():
    # Basic + Edge: values that test rounding boundary behavior
    data = [0.0004, 0.0004]  # mean is 0.0004 -> rounded to 3 decimals is 0.0
    codeflash_output = _mean(data)
    result = codeflash_output  # 32.4μs -> 7.50μs (332% faster)

    data2 = [0.0006, 0.0006]  # mean 0.0006 -> rounded to 3 decimals is 0.001
    codeflash_output = _mean(data2)
    result2 = codeflash_output  # 16.1μs -> 2.44μs (558% faster)


def test_mutation_sensitivity_for_rounding_branch():
    # This test ensures that toggling the "if not rounding" branch would be detected.
    # We test both a truthy rounding and a falsy rounding to cover both branches.
    data = [1.11111, 1.11111]
    # With rounding=3 (truthy), result should be rounded to three decimals
    codeflash_output = _mean(data, rounding=3)  # 31.2μs -> 7.95μs (293% faster)
    # With rounding=0 (falsy), result should be unrounded
    codeflash_output = _mean(data, rounding=0)  # 14.4μs -> 1.73μs (734% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import statistics

import pandas as pd

from unstructured.metrics.utils import _mean


def test_mean_with_single_element_list():
    """Test _mean with a list containing a single element."""
    codeflash_output = _mean([5.0])
    result = codeflash_output  # 29.6μs -> 7.95μs (272% faster)


def test_mean_with_two_elements_list():
    """Test _mean with a list containing two elements."""
    codeflash_output = _mean([2.0, 4.0])
    result = codeflash_output  # 29.5μs -> 7.74μs (281% faster)


def test_mean_with_multiple_elements_list():
    """Test _mean with a list containing multiple elements."""
    codeflash_output = _mean([1.0, 2.0, 3.0, 4.0, 5.0])
    result = codeflash_output  # 30.9μs -> 7.81μs (295% faster)


def test_mean_with_negative_numbers():
    """Test _mean with negative numbers in the list."""
    codeflash_output = _mean([-5.0, -3.0, 2.0])
    result = codeflash_output  # 30.6μs -> 7.79μs (293% faster)


def test_mean_with_mixed_positive_negative():
    """Test _mean with mixed positive and negative numbers."""
    codeflash_output = _mean([-10.0, 0.0, 10.0])
    result = codeflash_output  # 30.5μs -> 7.62μs (301% faster)


def test_mean_with_pandas_series():
    """Test _mean with a pandas Series as input."""
    series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
    codeflash_output = _mean(series)
    result = codeflash_output  # 42.3μs -> 58.2μs (27.3% slower)


def test_mean_default_rounding_3_decimals():
    """Test _mean with default rounding of 3 decimal places."""
    codeflash_output = _mean([1.0, 2.0, 3.0])
    result = codeflash_output  # 29.9μs -> 7.88μs (279% faster)


def test_mean_with_explicit_rounding_2_decimals():
    """Test _mean with explicit rounding set to 2 decimal places."""
    codeflash_output = _mean([1.0, 2.0, 3.0], rounding=2)
    result = codeflash_output  # 30.4μs -> 8.18μs (271% faster)


def test_mean_with_explicit_rounding_1_decimal():
    """Test _mean with explicit rounding set to 1 decimal place."""
    codeflash_output = _mean([1.15, 2.25, 3.35], rounding=1)
    result = codeflash_output  # 37.2μs -> 8.35μs (345% faster)


def test_mean_with_zero_rounding():
    """Test _mean with rounding set to 0 (no rounding after decimal point)."""
    codeflash_output = _mean([1.4, 2.6, 3.5], rounding=0)
    result = codeflash_output  # 32.4μs -> 4.73μs (584% faster)
    mean_value = statistics.mean([1.4, 2.6, 3.5])
    assert result == mean_value  # rounding=0 is falsy, so the mean is unrounded


def test_mean_with_false_rounding():
    """Test _mean with rounding set to False (returns unrounded mean)."""
    codeflash_output = _mean([1.0, 2.0, 3.0], rounding=False)
    result = codeflash_output  # 26.9μs -> 4.72μs (469% faster)


def test_mean_with_none_rounding():
    """Test _mean with rounding set to None (returns unrounded mean)."""
    codeflash_output = _mean([1.0, 2.0, 3.0], rounding=None)
    result = codeflash_output  # 27.0μs -> 4.78μs (465% faster)


def test_mean_with_empty_list():
    """Test _mean with an empty list returns None."""
    codeflash_output = _mean([])
    result = codeflash_output  # 903ns -> 1.00μs (9.70% slower)


def test_mean_with_empty_pandas_series():
    """Test _mean with an empty pandas Series returns None."""
    series = pd.Series([], dtype=float)
    codeflash_output = _mean(series)
    result = codeflash_output  # 2.77μs -> 2.54μs (9.43% faster)


def test_mean_with_all_zeros():
    """Test _mean with a list containing all zeros."""
    codeflash_output = _mean([0.0, 0.0, 0.0, 0.0])
    result = codeflash_output  # 30.9μs -> 7.83μs (295% faster)


def test_mean_with_all_identical_non_zero_values():
    """Test _mean with a list containing identical non-zero values."""
    codeflash_output = _mean([5.5, 5.5, 5.5])
    result = codeflash_output  # 29.8μs -> 7.85μs (280% faster)


def test_mean_with_very_small_positive_numbers():
    """Test _mean with very small positive numbers."""
    codeflash_output = _mean([0.0001, 0.0002, 0.0003])
    result = codeflash_output  # 44.2μs -> 7.73μs (472% faster)
    expected = round(statistics.mean([0.0001, 0.0002, 0.0003]), 3)
    assert result == expected


def test_mean_with_very_large_numbers():
    """Test _mean with very large numbers."""
    codeflash_output = _mean([1e6, 2e6, 3e6])
    result = codeflash_output  # 30.7μs -> 7.80μs (294% faster)
    expected = round(statistics.mean([1e6, 2e6, 3e6]), 3)
    assert result == expected


def test_mean_with_decimal_precision_requiring_rounding():
    """Test _mean with numbers that require rounding to specified decimal places."""
    codeflash_output = _mean([1.1111, 2.2222, 3.3333], rounding=2)
    result = codeflash_output  # 38.1μs -> 8.09μs (371% faster)
    expected = round(statistics.mean([1.1111, 2.2222, 3.3333]), 2)
    assert result == expected


def test_mean_rounding_up():
    """Test _mean when rounding results in rounding up."""
    codeflash_output = _mean([1.0, 1.0, 1.0, 2.0], rounding=1)
    result = codeflash_output  # 30.5μs -> 8.45μs (261% faster)
    # Mean is 1.25, which rounds to 1.2 (rounds down) or 1.3 (rounds up)
    # Python's round uses banker's rounding, so 1.25 rounds to 1.2
    expected = round(1.25, 1)
    assert result == expected


def test_mean_with_high_precision_rounding():
    """Test _mean with high precision rounding (5 decimal places)."""
    codeflash_output = _mean([1.123456, 2.234567, 3.345678], rounding=5)
    result = codeflash_output  # 43.1μs -> 8.31μs (419% faster)
    expected = round(statistics.mean([1.123456, 2.234567, 3.345678]), 5)
    assert result == expected


def test_mean_with_negative_rounding_value():
    """Test _mean with negative rounding value."""
    # Python's round function supports negative rounding values
    codeflash_output = _mean([123.456, 234.567, 345.678], rounding=-1)
    result = codeflash_output  # 42.3μs -> 8.26μs (412% faster)
    expected = round(statistics.mean([123.456, 234.567, 345.678]), -1)
    assert result == expected


def test_mean_with_single_element_pandas_series():
    """Test _mean with a pandas Series containing a single element."""
    series = pd.Series([42.0])
    codeflash_output = _mean(series)
    result = codeflash_output  # 39.7μs -> 57.8μs (31.3% slower)


def test_mean_with_pandas_series_with_float_dtype():
    """Test _mean with pandas Series explicitly typed as float."""
    series = pd.Series([1.0, 2.0, 3.0], dtype="float64")
    codeflash_output = _mean(series)
    result = codeflash_output  # 41.5μs -> 58.8μs (29.4% slower)


def test_mean_with_large_list_hundred_elements():
    """Test _mean with a large list containing 100 elements."""
    large_list = [float(i) for i in range(1, 101)]  # 1.0 to 100.0
    codeflash_output = _mean(large_list)
    result = codeflash_output  # 62.2μs -> 8.11μs (667% faster)
    expected = round(50.5, 3)
    assert result == expected


def test_mean_with_large_list_five_hundred_elements():
    """Test _mean with a large list containing 500 elements."""
    large_list = [float(i) for i in range(1, 501)]  # 1.0 to 500.0
    codeflash_output = _mean(large_list)
    result = codeflash_output  # 200μs -> 9.48μs (2009% faster)
    expected = round(250.5, 3)
    assert result == expected


def test_mean_with_large_pandas_series():
    """Test _mean with a large pandas Series."""
    large_series = pd.Series([float(i) for i in range(1, 1001)])  # 1.0 to 1000.0
    codeflash_output = _mean(large_series)
    result = codeflash_output  # 455μs -> 59.4μs (667% faster)
    expected = round(500.5, 3)
    assert result == expected


def test_mean_with_large_list_negative_numbers():
    """Test _mean with large list of negative numbers."""
    large_list = [float(-i) for i in range(1, 201)]  # -1 to -200
    codeflash_output = _mean(large_list)
    result = codeflash_output  # 98.2μs -> 8.62μs (1039% faster)
    expected = round(statistics.mean(large_list), 3)
    assert result == expected


def test_mean_with_large_mixed_range():
    """Test _mean with large list of mixed positive and negative numbers."""
    # Create list with 300 elements ranging from -150 to 149
    large_list = [float(i) for i in range(-150, 150)]
    codeflash_output = _mean(large_list)
    result = codeflash_output  # 132μs -> 8.80μs (1401% faster)
    expected = round(statistics.mean(large_list), 3)
    assert result == expected


def test_mean_with_large_list_high_precision_numbers():
    """Test _mean with large list of high precision decimal numbers."""
    large_list = [i * 0.123456 for i in range(1, 251)]
    codeflash_output = _mean(large_list)
    result = codeflash_output  # 243μs -> 8.64μs (2720% faster)
    expected = round(statistics.mean(large_list), 3)
    assert result == expected


def test_mean_performance_consistency_various_sizes():
    """Test that _mean produces consistent results across different data sizes."""
    # Test with progressively larger datasets
    sizes = [10, 50, 100, 500]
    results = []

    for size in sizes:
        test_list = [float(i) for i in range(1, size + 1)]
        codeflash_output = _mean(test_list)
        result = codeflash_output  # 291μs -> 16.7μs (1648% faster)
        results.append(result)
        assert result == (size + 1) / 2  # mean of 1..size


def test_mean_with_large_list_fractional_values():
    """Test _mean with large list of fractional values."""
    large_list = [i * 1.5 for i in range(1, 251)]
    codeflash_output = _mean(large_list)
    result = codeflash_output  # 119μs -> 8.67μs (1273% faster)
    expected = round(statistics.mean(large_list), 3)
    assert result == expected


def test_mean_type_consistency_across_scales():
    """Test that _mean always returns consistent types regardless of input scale."""
    test_cases = [
        [1.0, 2.0],
        [float(i) for i in range(1, 51)],
        [float(i) for i in range(1, 501)],
    ]

    for test_list in test_cases:
        codeflash_output = _mean(test_list)
        result = codeflash_output  # 243μs -> 14.4μs (1588% faster)
        assert isinstance(result, float)  # float inputs yield a float mean


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-_mean-mks5352k and push.

@codeflash-ai codeflash-ai bot requested a review from aseembits93 January 24, 2026 10:01
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Jan 24, 2026