
Numeric aggregates (MIN/MAX/SUM/AVG) return null and time_bucket() returns empty after nodedb-sql migration #15

@emanzx

Description


After the nodedb-sql migration (commits e259e65..8588cf8 on main), filter and GROUP BY queries against ILP-ingested timeseries collections now work, fixing the schema regression that existed in b0d0f68. The new query path, however, introduces two functional regressions:

  1. MIN/MAX/SUM/AVG over numeric columns return null for both Float64 and Int64 columns.
  2. time_bucket('1 hour', timestamp) returns an empty result set (no rows, not an error).

COUNT(*) and COUNT(col) work. Filtering on numeric columns works (WHERE elapsed_ms > 0). Projecting numeric columns works (SELECT elapsed_ms ...). Only aggregation over them fails.

Reproduction

Build: 8588cf8 (upstream/main as of testing).

rm -rf ~/.local/share/nodedb
NODEDB_PORT_ILP=8086 ./target/release/nodedb &
sleep 4

# Create a timeseries collection
curl -s -X POST http://127.0.0.1:6480/query \
  -H 'Content-Type: application/json' \
  -d '{"sql":"CREATE TIMESERIES dns_bench"}'

Send 100k ILP rows with a constant elapsed_ms=0.5 (so any aggregate is trivially predictable):

import socket, random
from datetime import datetime, timezone

rng = random.Random(42)
ts = int(datetime.now(timezone.utc).timestamp() * 1e9)  # epoch nanoseconds
for _ in range(20):                     # 20 batches x 5000 lines = 100k rows
    lines = []
    for _ in range(5000):
        cip = "10.{}.{}.{}".format(rng.randint(0, 255), rng.randint(0, 255), rng.randint(1, 254))
        lines.append("dns_bench,client_ip={},qtype=A,rcode=NOERROR,cached=true,qname=example.com elapsed_ms=0.5 {}".format(cip, ts))
        ts += 1_000_000                 # rows spaced 1 ms apart
    s = socket.socket()
    s.connect(("127.0.0.1", 8086))
    s.sendall(("\n".join(lines) + "\n").encode())
    s.shutdown(socket.SHUT_WR)
    try:
        s.recv(1024)                    # drain any ILP response
    except OSError:
        pass
    s.close()
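As a sanity check on the generated lines (ILP is space-separated: measurement plus comma-joined tags, then fields, then an epoch-nanosecond timestamp), parsing one line back confirms the constant field value every aggregate should reflect. The literal line below is illustrative, not captured from a run:

```python
# Parse one ILP line of the shape the repro script emits:
# "<measurement>,<tag=val,...> <field=val,...> <timestamp_ns>"
line = "dns_bench,client_ip=10.1.2.3,qtype=A,rcode=NOERROR,cached=true,qname=example.com elapsed_ms=0.5 1700000000000000000"

head, fields, ts = line.split(" ")
measurement, *tags = head.split(",")

assert measurement == "dns_bench"
assert dict(t.split("=") for t in tags)["qtype"] == "A"
assert float(fields.split("=")[1]) == 0.5   # the constant elapsed_ms value
print(int(ts))  # -> 1700000000000000000 (nanosecond timestamp)
```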

Then query:

Q() { curl -s -X POST http://127.0.0.1:6480/query -H 'Content-Type: application/json' -d "{\"sql\":\"$1\"}"; echo; }

Q "SELECT COUNT(*) FROM dns_bench"
Q "SELECT COUNT(elapsed_ms) FROM dns_bench"
Q "SELECT elapsed_ms FROM dns_bench LIMIT 1"
Q "SELECT MIN(elapsed_ms), MAX(elapsed_ms), SUM(elapsed_ms), AVG(elapsed_ms) FROM dns_bench"
Q "SELECT MIN(timestamp), MAX(timestamp) FROM dns_bench"
Q "SELECT COUNT(*) FROM dns_bench WHERE elapsed_ms > 0"
Q "SELECT time_bucket('1 hour', timestamp) AS b, COUNT(*) FROM dns_bench GROUP BY b ORDER BY b"
Q "SELECT time_bucket('1 hour', CAST(timestamp AS BIGINT)) AS b, COUNT(*) FROM dns_bench GROUP BY b ORDER BY b"

Actual results

SELECT COUNT(*) FROM dns_bench
  -> {"rows":[[{"count_COUNT(all)":100000}]],"status":"ok"}             OK

SELECT COUNT(elapsed_ms) FROM dns_bench
  -> {"rows":[[{"count_COUNT(elapsed_ms)":100000}]],"status":"ok"}      OK (column is readable)

SELECT elapsed_ms FROM dns_bench LIMIT 1
  -> {"rows":[[{... "elapsed_ms":0.5 ...}]],"status":"ok"}              OK

SELECT MIN(elapsed_ms), MAX(elapsed_ms), SUM(elapsed_ms), AVG(elapsed_ms) FROM dns_bench
  -> {"rows":[[{"min_MIN(elapsed_ms)":null,
                "max_MAX(elapsed_ms)":null,
                "sum_SUM(elapsed_ms)":null,
                "avg_AVG(elapsed_ms)":null}]],"status":"ok"}            BUG #1 (Float64)

SELECT MIN(timestamp), MAX(timestamp) FROM dns_bench
  -> {"rows":[[{"min_MIN(timestamp)":null,
                "max_MAX(timestamp)":null}]],"status":"ok"}              BUG #1 (Int64)

SELECT COUNT(*) FROM dns_bench WHERE elapsed_ms > 0
  -> {"rows":[[{"count_COUNT(all)":100000}]],"status":"ok"}             OK (filter works)

SELECT time_bucket('1 hour', timestamp) ... GROUP BY b
  -> {"rows":[[]],"status":"ok"}                                        BUG #2

SELECT time_bucket('1 hour', CAST(timestamp AS BIGINT)) ... GROUP BY b
  -> {"rows":[[]],"status":"ok"}                                        BUG #2 (cast doesn't help)

Expected

  • MIN(elapsed_ms) = MAX(elapsed_ms) = AVG(elapsed_ms) = 0.5
  • SUM(elapsed_ms) = 50000.0
  • MIN(timestamp) / MAX(timestamp) = the ingest range
  • time_bucket(...) returns at least one bucket grouping the 100k rows
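The expected values follow directly from the repro parameters (20 batches of 5000 lines, constant elapsed_ms = 0.5):

```python
# Expected aggregates derived from the repro parameters.
batches, per_batch, value = 20, 5000, 0.5

rows = batches * per_batch
print(rows)          # 100000  -> COUNT(*)
print(rows * value)  # 50000.0 -> SUM(elapsed_ms)
print(value)         # 0.5     -> MIN = MAX = AVG (the column is constant)
```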

Notes

  • All four aggregates fail symmetrically and the column type doesn't matter (Float64 vs Int64), so this looks like a single missing branch in the aggregator dispatch in the new nodedb-sql planner/executor — not separate per-function bugs.
  • time_bucket() failing for both Timestamp(µs) and Int64 arguments suggests the function is either unregistered in the new SQL crate or returns an empty group key. In the previous build (b0d0f68) the BIGINT cast workaround returned a valid bucket, so this is a regression of the workaround as well.
  • Cosmetic side note: aggregate result columns come back as count_COUNT(all) / avg_AVG(elapsed_ms). Probably the projection alias is being concatenated with the function call source text. Easy to overlook but trips up clients that key on column names.
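To illustrate the single-missing-branch hypothesis, here is a sketch of the failure shape only; this is hypothetical code, not nodedb-sql's actual dispatcher:

```python
# Hypothetical aggregator dispatch with a missing arm: COUNT has its own
# path, while MIN/MAX/SUM/AVG all fall through to a default that yields
# null. One missing branch makes all four fail symmetrically, regardless
# of column type.
def dispatch(func: str, values: list):
    if func == "COUNT":
        return len(values)   # wired up -> works
    return None              # MIN/MAX/SUM/AVG never reach an accumulator

vals = [0.5] * 4
print(dispatch("COUNT", vals))  # -> 4
print(dispatch("MIN", vals))    # -> None (matches the observed null)
```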

What works (good news from this build)

  • ILP ingest 76k+/s sustained, 145MB RSS for 10M rows, 100% WAL visibility — no stalls.
  • Schema discovery for ILP-added tag/field columns works in WHERE/GROUP BY (the regression filed against b0d0f68 is gone).
  • COUNT(*), COUNT(col), SELECT col, WHERE col op val, GROUP BY tag all work end-to-end.

Build: 8588cf8 on main. Platform: Linux x86_64, release build.
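Until time_bucket() is restored, one client-side workaround is to select raw timestamps and bucket them by flooring to the hour. A minimal sketch, assuming epoch-nanosecond integer timestamps (the values below are illustrative, not from a real run):

```python
from collections import Counter

HOUR_NS = 3_600 * 1_000_000_000   # one hour in nanoseconds

def bucket_counts(timestamps_ns):
    """Floor each epoch-ns timestamp to its hour bucket; count rows per bucket."""
    return Counter(ts - ts % HOUR_NS for ts in timestamps_ns)

# Illustrative data: three rows in one hour, one row in the next hour.
ts0 = 1_700_000_000 * 1_000_000_000
counts = bucket_counts([ts0, ts0 + 1, ts0 + 2, ts0 + HOUR_NS])
print(sorted(counts.values()))  # -> [1, 3]
```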
