Skip to content

time_bucket() silently dropped from projections after nodedb-sql migration #17

@emanzx

Description

@emanzx

Split out from #15 (numeric aggregate fix landed in PR #16, this part remains).

Build: af9a3ae on main.

time_bucket() is not being executed by the new nodedb-sql planner — function calls are silently elided rather than producing an error. All variants return either an empty result (with GROUP BY) or fall through to SELECT * (without GROUP BY).

Reproduction

rm -rf ~/.local/share/nodedb
NODEDB_PORT_ILP=8086 ./target/release/nodedb &
sleep 4
curl -s -X POST http://127.0.0.1:6480/query -H 'Content-Type: application/json' \
  -d '{"sql":"CREATE TIMESERIES dns_bench"}'

Send 100k rows spanning ~100 seconds (so a 1-hour bucket should trivially yield exactly one row of count = 100000):

import socket, random
from datetime import datetime, timezone
rng = random.Random(42)
ts = int(datetime.now(timezone.utc).timestamp() * 1e9)
for _ in range(20):
    lines = []
    for _ in range(5000):
        cip = "10.{}.{}.{}".format(rng.randint(0,255), rng.randint(0,255), rng.randint(1,254))
        lines.append("dns_bench,client_ip={},qtype=A,rcode=NOERROR,cached=true,qname=example.com elapsed_ms=0.5 {}".format(cip, ts))
        ts += 1_000_000
    s = socket.socket(); s.connect(("127.0.0.1",8086))
    s.sendall(("\n".join(lines)+"\n").encode())
    s.shutdown(socket.SHUT_WR)
    try: s.recv(1024)
    except: pass
    s.close()
Q() { curl -s -X POST http://127.0.0.1:6480/query -H 'Content-Type: application/json' -d "{\"sql\":\"$1\"}"; echo; }

Q "SELECT time_bucket('1 hour', timestamp) AS b, COUNT(*) FROM dns_bench GROUP BY b"
Q "SELECT time_bucket(timestamp, '1 hour') AS b, COUNT(*) FROM dns_bench GROUP BY b"
Q "SELECT time_bucket('1 hour', CAST(timestamp AS BIGINT)) AS b, COUNT(*) FROM dns_bench GROUP BY b"
Q "SELECT time_bucket(3600, timestamp) AS b, COUNT(*) FROM dns_bench GROUP BY b"
Q "SELECT time_bucket('5 minutes', timestamp) AS b, qtype, COUNT(*) FROM dns_bench GROUP BY b, qtype"
Q "SELECT time_bucket('1 hour', timestamp) FROM dns_bench LIMIT 3"

Actual

SELECT time_bucket('1 hour', timestamp) AS b, COUNT(*) ... GROUP BY b
  -> {"rows":[[]], "status":"ok"}    EMPTY

SELECT time_bucket(timestamp, '1 hour') AS b, COUNT(*) ... GROUP BY b
  -> {"rows":[[]], "status":"ok"}    EMPTY (arg-order swap)

SELECT time_bucket('1 hour', CAST(timestamp AS BIGINT)) AS b, COUNT(*) ... GROUP BY b
  -> {"rows":[[]], "status":"ok"}    EMPTY (BIGINT cast)

SELECT time_bucket(3600, timestamp) AS b, COUNT(*) ... GROUP BY b
  -> {"rows":[[]], "status":"ok"}    EMPTY (int seconds literal)

SELECT time_bucket('5 minutes', timestamp), qtype, COUNT(*) ... GROUP BY b, qtype
  -> {"rows":[[]], "status":"ok"}    EMPTY (multi-key group)

SELECT time_bucket('1 hour', timestamp) FROM dns_bench LIMIT 3
  -> {"rows":[[
       {"cached":"true","client_ip":"10.57.12.190","elapsed_ms":0.5,"qname":"example.com",
        "qtype":"A","rcode":"NOERROR","timestamp":1775602707091},
       {"cached":"true","client_ip":"10.140.125.58","elapsed_ms":0.5,"qname":"example.com",
        "qtype":"A","rcode":"NOERROR","timestamp":1775602707092},
       {"cached":"true","client_ip":"10.71.52.174","elapsed_ms":0.5,"qname":"example.com",
        "qtype":"A","rcode":"NOERROR","timestamp":1775602707093}
     ]], "status":"ok"}

The last query is the smoking gun: the projection time_bucket('1 hour', timestamp) is silently dropped, the planner falls through to SELECT *, and the original row schema leaks back. No time_bucket(...) column appears in the output, and there is no error returned. This strongly suggests the function is unregistered in the nodedb-sql function table and the call is being elided rather than rejected.

Expected

SELECT time_bucket('1 hour', timestamp), COUNT(*) FROM dns_bench GROUP BY 1
  -> exactly one row, count_all = 100000
SELECT time_bucket('5 minutes', timestamp), COUNT(*) FROM dns_bench GROUP BY 1
  -> ~2 rows (data spans ~100s, may straddle a 5-min boundary)
SELECT time_bucket('1 hour', timestamp) FROM dns_bench LIMIT 3
  -> 3 rows of a single column named like `time_bucket('1 hour', timestamp)` or aliased

Notes

  • Same behavior in b0d0f68 for the literal-arg form, but in that build the CAST(timestamp AS BIGINT) workaround did return a valid bucket. After the nodedb-sql migration even the cast workaround is broken — nothing routes through. So this is a regression of an already-fragile path; the previous DataFusion-backed time_bucket(Utf8, Int64) registration is gone and nothing replaces it.
  • Related: any benchmark or dashboard query that buckets by time will hit this. It is the only query family in our 10-query DNS-telemetry benchmark suite that still fails after PR fix: resolve null aggregates and broken time_bucket after SQL migration #16; everything else (COUNT, SELECT *, WHERE, GROUP BY tag, MIN/MAX/SUM/AVG) works against ILP-ingested timeseries collections.
  • Suggestion: when the function is registered in nodedb-sql, add a regression test asserting both the GROUP BY shape and the bare-projection shape (SELECT time_bucket(...) FROM t LIMIT 1) so a future silent-drop regression can't recur.

Build: af9a3ae on main. Linux x86_64, release build.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions