Bug description
statistics.quantiles(data, n=N, method='exclusive') with N ≥ 3 returns cut points that are not monotonically non-decreasing when data contains two equal but float-inexact values. The first and last cut points compute one ULP below the data value, while interior cut points compute exactly the data value, so the returned list is unsorted.
import statistics
# Two copies of float(pi)
data = [3.141592653589793, 3.141592653589793]
result = statistics.quantiles(data, n=9, method='exclusive')
print(result)
# [3.1415926535897927, 3.141592653589793, 3.141592653589793,
# 3.141592653589793, 3.141592653589793, 3.141592653589793,
# 3.141592653589793, 3.1415926535897927]
print(result == sorted(result))
# False <-- cut points 0 and 7 are < cut points 1..6
Reproducer is one-liner:
>>> import statistics
>>> r = statistics.quantiles([3.141592653589793]*2, n=9, method='exclusive')
>>> r[0] <= r[1]
False
Why this is a bug
The quantiles documentation says it "Divide(s) data into n continuous intervals with equal probability. Returns a list of n - 1 cut points separating the intervals." Cut points that separate intervals must be monotonically non-decreasing — otherwise the "intervals" they delimit overlap or invert. Several downstream uses (e.g. bisect.bisect_left(cut_points, x) to bucket a value) rely silently on this invariant and will return wrong bucket indices when it's violated.
The 'exclusive' method computes each cut point with the formula:
# Lib/statistics.py
m = ld - 1
result = []
for i in range(1, n):
j = i * m // n
delta = i * m - j * n
interpolated = (data[j-1] * (n - delta) + data[j] * delta) / n
result.append(interpolated)
When data[j-1] == data[j] == x exactly, interpolated = (x*(n-delta) + x*delta)/n should be x mathematically, but for non-power-of-two delta/n ratios the float operations x*(n-delta) + x*delta do not always round to n*x, yielding a result one ULP off. The error happens asymmetrically across i, so adjacent cut points differ by 1 ULP and the list ends up unsorted.
'inclusive' is not affected because its formula is structured to be exact when data are identical.
Suggested fix
Sort the result before returning (cheap, O(n log n) on n − 1 elements which is already trivial):
--- a/Lib/statistics.py
+++ b/Lib/statistics.py
@@ -X,Y +X,Y @@ def quantiles(data, *, n=4, method='exclusive'):
for i in range(1, n):
j = i * m // n # rescale i to m/n
delta = i*m - j*n # exact remainder
interpolated = (data[j-1] * (n - delta) + data[j] * delta) / n
result.append(interpolated)
- return result
+ # The interpolation above can produce float-rounding artifacts that
+ # leave neighbouring identical-data cut points one ULP out of order
+ # (e.g. data=[pi, pi], n=9). Restore the documented monotonic-
+ # non-decreasing invariant before returning.
+ result.sort()
+ return result
This preserves the existing values exactly when they are already monotonic and only reorders the few corner cases where float rounding broke the invariant. A regression test:
import statistics
def test_quantiles_monotonic_with_duplicates():
for x in [3.141592653589793, 1/3, 0.1, 1e300, 1e-300]:
for n in range(2, 20):
for method in ('exclusive', 'inclusive'):
r = statistics.quantiles([x]*2, n=n, method=method)
assert r == sorted(r), (x, n, method, r)
CPython versions tested on
3.9, 3.12, 3.14
Operating systems tested on
macOS
Linked PRs
Bug description
statistics.quantiles(data, n=N, method='exclusive')withN≥ 3 returns cut points that are not monotonically non-decreasing whendatacontains two equal but float-inexact values. The first and last cut points compute one ULP below the data value, while interior cut points compute exactly the data value, so the returned list is unsorted.Reproducer is one-liner:
Why this is a bug
The
quantilesdocumentation says it "Divide(s) data into n continuous intervals with equal probability. Returns a list ofn - 1cut points separating the intervals." Cut points that separate intervals must be monotonically non-decreasing — otherwise the "intervals" they delimit overlap or invert. Several downstream uses (e.g.bisect.bisect_left(cut_points, x)to bucket a value) rely silently on this invariant and will return wrong bucket indices when it's violated.The
'exclusive'method computes each cut point with the formula:When
data[j-1] == data[j] == xexactly,interpolated = (x*(n-delta) + x*delta)/nshould bexmathematically, but for non-power-of-twodelta/nratios the float operationsx*(n-delta) + x*deltado not always round ton*x, yielding a result one ULP off. The error happens asymmetrically acrossi, so adjacent cut points differ by 1 ULP and the list ends up unsorted.'inclusive'is not affected because its formula is structured to be exact when data are identical.Suggested fix
Sort the result before returning (cheap, O(n log n) on n − 1 elements which is already trivial):
This preserves the existing values exactly when they are already monotonic and only reorders the few corner cases where float rounding broke the invariant. A regression test:
CPython versions tested on
3.9, 3.12, 3.14
Operating systems tested on
macOS
Linked PRs
quantiles(method='exclusive')returning unsorted cut points for duplicate floats #150326