fix(8 bugs) + test(coverage +1.3pp): probes, type-preservation, broad workout#8
Conversation
Adds 624 unique test assertions on top of Anton's test/rfl/ corpus, deduped against his parallel rewrite (commit 536fef4 et al). Stats: * 35 modified files — our backup was a strict superset of upstream; replace upstream wholesale. Whitespace-only diffs in arith/abs, arith/ceil, arith/floor, arith/neg, arith/round, cmp/ge, cmp/gt, cmp/ne were also false-positive "Anton-only" results from sorting with extra spaces. * 12 new files — coverage Anton didn't write at all: test/rfl/datalog/rule.rfl — full EAV / rule / TC / negation test/rfl/embedding/hnsw.rfl — HNSW index build/query/info test/rfl/integration/arena.rfl — 100k-element churn + .sys.gc test/rfl/integration/cow.rfl — copy-on-write aliasing test/rfl/integration/morsel.rfl — 1023/1024/1025 + 2047/2048/2049 test/rfl/integration/optimizer.rfl — filter reorder, pushdown, selection-bitmap edges test/rfl/integration/str_pool.rfl — 12-byte SSO boundary test/rfl/system/csv_roundtrip.rfl — schema-less .csv.read test/rfl/system/splayed.rfl — set/get-splayed round-trip test/rfl/table/modify.rfl — functional column update test/rfl/table/pivot.rfl — wide reshape, sum/count aggr test/rfl/table/select.rfl — 50 select-clause assertions Coverage delta vs origin/master: Tests: 921/922 → 933/934 (+12 unique tests) Lines: 63.4% → 63.9% (+217 lines) Functions: 77.5% → 78.0% (+8 functions) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously +/-/* and other RAY_BINARY verbs silently truncated extras: (+ 1 2 3) -> 3 (third arg dropped) (+ 1 2 3 4 5) -> 3 (- 10 1 2) -> 9 (* 2 3 4) -> 6 Same hole in RAY_UNARY: extras after the first argument were released without raising. `(+ 1)` and `(+)` were already domain errors but the 3+-arg case fell through to the binary kernel against the first two args and threw the rest away. Now both dispatch sites — the bytecode VM (eval.c:1656) and the tree- walking eval (eval.c:2378) — raise: arity: expected N arg(s), got M `and` and `or` were registered as RAY_BINARY but Anton's tests expected variadic fold semantics ((and a b c) -> fold AND). Switch them to RAY_VARY via ray_and_vary_fn / ray_or_vary_fn — left-fold over the existing binary kernel. The (and X Y) DAG path through select where: clauses still emits OP_AND / OP_OR via the expression compiler, so fused execution is unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously (> 'b 'a) raised "error: type: cannot compare sym and sym"
even for the reflexive case (>= 'a 'a). Equality comparisons (==/!=)
worked because they could just check interned IDs, but ordering had no
implementation at all and fell through to the is_numeric() guard.
Add sym_atom_cmp(a, b):
- Fast path: equal interned IDs => identical text (interning gives
one ID per text), return 0 without touching the global sym table.
- Slow path: ray_sym_str(id) returns a RAY_STR atom; ray_str_cmp
delegates to ray_str_t_cmp which uses the 12-byte SSO inline path
for short symbols and prefix-then-fullcompare for pooled ones.
Wire it into ray_gt_fn / ray_lt_fn / ray_gte_fn / ray_lte_fn at the
same dispatch site as the existing -RAY_GUID branch. Vector and
broadcast paths inherit the fix automatically — atomic_map_binary
unboxes SYM elements via collection_elem (-> ray_sym(id) atom), so
each pairwise call lands in the new branch.
The pre-fix probe (test/rfl/cmp/gt.rfl:41-44, "(> 'b 'a) !- type") is
replaced with positive assertions covering atoms, broadcast, vec/vec,
and the SSO inline/pooled boundary. Mirror coverage added to lt.rfl,
ge.rfl, le.rfl.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ray_neg_fn only handled -RAY_I64 and -RAY_F64 atoms; (neg 5h) and
(neg 5i) raised "error: type" even though abs already accepted them.
Add the i16/i32 branches and preserve type — same convention as
binary +, -, *, %, /:
(neg 5h) -> -5h (i16)
(neg 5i) -> -5i (i32)
(neg [1h 2h]) -> [-1h -2h] (I16)
(Vector path inherits the fix via RAY_FN_ATOMIC.) Type preservation
is the right call here: in Rayforce typed nulls live in a separate
nullmap bit, so INT_MIN is just a regular value and there's no
overflow concern that would justify widening like abs does.
Add test/rfl/arith/type_preservation.rfl pinning the full type-
promotion matrix so a regression in any op surfaces loudly:
- same-width +/-/*//% preserve type (i16, i32, i64, f64)
- cross-width +/-/*//% follow "wider wins" (i16+i32→i32, etc.)
- all comparisons return bool
- unary neg/floor/ceil preserve type
- unary abs widens narrow ints to i64 (documented inconsistency)
- math fns (round/sqrt/log/exp) always return f64
Replaces the stale `(neg 5h) !- type` probes in test/rfl/arith/neg.rfl
with positive assertions including type assertions and null
propagation across i16/i32.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
abs was the lone outlier in the unary-numeric family — neg, floor,
ceil all preserve i16/i32, but abs widened them to i64. No good
reason for the inconsistency: in Rayforce typed nulls live in a
separate nullmap bit, so INT_MIN is just a regular value and the
"overflow protection" rationale that justifies widening in some
languages doesn't apply here.
Switch the i16/i32 branches to make_i16/make_i32:
(abs -5h) -> 5h (i16, was i64)
(abs -5i) -> 5i (i32, was i64)
(abs [-1h 2h]) -> [1h 2h] (I16, was I64)
Update test/rfl/arith/abs.rfl and the type_preservation.rfl regression
table: abs now joins neg/floor/ceil under the "preserve narrow ints"
rule.
Coverage of remaining type-promotion oddities (left intentional):
- sum widens I16/I32 → i64 (overflow guard)
- round of int → f64 (mathematically noop, but produces float anyway)
Both pinned in type_preservation.rfl so future changes surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nary
(modify t 'a neg) raised "error: type" while (map neg [1 2 3]) worked.
modify calls call_fn1(fn, column_vec) internally — call_fn1 saw a
RAY_UNARY function and invoked it directly on the whole vector. But
ray_neg_fn (and every other atomic unary kernel: abs, floor, ceil,
round, sqrt, log, exp, not) is written for a single atom and rejects
positive-typed (vector) input with "type".
call_fn2 already had the parallel routing for binary atomic builtins:
if ((fn->attrs & RAY_FN_ATOMIC) && (is_collection(a) || ...))
return atomic_map_binary(f, a, b);
call_fn1 simply forgot the unary mirror. Add it.
Affects every code path that uses call_fn1 — modify, fold (1-arg form),
the apply 1-arg case — so any future caller now gets vector
auto-mapping for free, matching the user's intuition that "if it
works in map, it should work everywhere".
Tests: extend test/rfl/table/modify.rfl with parity assertions
showing (modify t 'a neg) ≡ (modify t 'a (fn [x] (neg x))), plus
chained modify with abs/neg and floor/ceil over a float column.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Document what file-mode and stdin-mode mean for error handling: - ray_repl_run_file (batch / script): rc=1 stops execution - ray_repl_run / stdin pipe (REPL): error printed, loop continues The probe spec/_probes/error_exit_code_inconsistent.rfl filed back when file-mode silently returned 0 on errors; that's been fixed since (every verb in the probe — asof-join, inner/left-join, filter, +, /, rand, til, alter — now correctly returns rc=1 from a script file). Drop the probe. The stdin-pipe-doesn't-abort behaviour is the REPL contract, not a bug; users wanting batch semantics should pass the script as a positional file arg. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three implementations existed and they disagreed on syntax:
- eval path: strop.c::str_glob — recursive, glob */?/[abc],
EXPONENTIAL on patterns like "a*a*a*…a*b" against "aaa…a"
(>5s on 16 stars, >timeout on 20).
- DAG path: string.c::like_match — iterative, but SQL %/_ syntax.
- DAG path: string.c::ilike_match — iterative SQL %/_ , case-
insensitive.
So `(like "hello" "h*")` returned true via eval but matched 0 rows
in `select where: (like s "h*")` — `*` was literal under the SQL
matcher. And the eval path could be DoS'd with a 20-star pattern.
Replace all three with one shared implementation in src/ops/glob.[ch]:
- Iterative two-pointer with last-star backtrack (glibc fnmatch
style). O(n*m) worst case; 32-star pattern that pre-fix took
>5s now finishes in microseconds.
- Glob syntax matching the documented contract: * (any), ? (one),
[abc] / [a-z] / [!abc] (character class).
- ray_glob_match (case-sensitive) and ray_glob_match_ci (folds
ASCII letters on both sides).
eval path (strop.c::ray_like_fn) and DAG path (string.c::exec_like /
exec_ilike) both call the same matcher — semantics now identical.
Tests:
* test/rfl/strop/like.rfl extended with character classes, ranges,
negated classes, an adversarial 20-star catastrophic-backtracking
guard, and explicit eval-path-≡-DAG-path parity assertions.
* test/test_exec.c: stale C-test using "bar%" (SQL syntax) updated
to "bar*" (glob).
Docs: website/docs/rayfall-functions.html updated to mention [abc] /
[a-z] / [!abc] (already implemented; previously undocumented).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t-parted
(set-splayed "/tmp/db/2024.01.01/t/" t) -> error: io
(get-parted "/tmp/db/" 't) -> error: io
Two compounding bugs broke partitioned-table workflows:
1. ray_mkdir was single-level — could not create
"/tmp/db/2024.01.01/t/" when the parent directories didn't pre-
exist. v1 quietly handled this inside fs_fopen (walk path,
mkdir each parent before opening the file); v2 lost that.
Add ray_mkdir_p (POSIX + Win32, src/store/fileio.c) with
mkdir -p semantics. Use it from ray_splay_save instead of the
single-level ray_mkdir.
2. ray_read_parted (get-parted) unconditionally called
ray_sym_load("<db_root>/sym") and propagated its failure as
"io" — but set-splayed only writes per-table sym files inside
the leaf splayed dir, never a root-level one for symbol-less
tables. Stat the file and skip the load if absent.
After both: (set-splayed "/tmp/db/2024.01.01/t/" t0) writes the
partition correctly; (get-parted "/tmp/db/" 't) returns a 10-row /
3-column lazy table (1 MAPCOMMON partition-key + 2 data columns),
parity with the v1 behaviour.
Tests: extend test/rfl/system/splayed.rfl with the nested-mkdir case
and a 2-partition get-parted round-trip (count = 10, columns = 3).
Known limitation (separate task): direct (at p 'col) on a PARTED
column doesn't materialize values yet — only the select-where path
exercises the lazy reader. Filed for follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds: .idea/, .vscode/ — IDE working state *.gcda, *.gcno, *.gcov — gcc coverage instrumentation outputs coverage*.info — lcov tracefiles coverage_html/ — genhtml output directory rayforce.cov — clang/llvm coverage runtime output Keeps the working tree clean during a coverage build so `git status` isn't drowned in untracked binary artifacts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three legacy script-style test files (test_4097.rfl, test_null_ops.rfl,
test_comprehensive.rfl) sat untracked in the repo root. Reviewed each
against the existing test/rfl/ corpus; salvaged what wasn't covered:
* test/rfl/integration/sort_radix_boundary.rfl (new) — pin every
type's sort exit values at N=4097, just over the 2^12 radix-strategy
threshold. Covers asc/desc/iasc/idesc/xasc/xdesc/select-orderby/
group-by-then-sort/distinct/rank across i64, f64, SYM, STR, BOOL,
DATE, TIMESTAMP, plus null-bearing variants.
* test/rfl/null/sort.rfl (extended) — add xasc-on-null-keyed-table
(null sorts first, count preserved) and take-with-nulls slicing
semantics. Anton's existing file covered asc/desc/iasc/idesc with
nulls but not xasc or take.
The other two files duplicated Anton's null/* and arith/div coverage
(Float div-by-zero already produces 0Nf in his test/rfl/arith/div.rfl,
INT64 boundary already in test/rfl/integration/null.rfl), so dropped.
Plus removed:
rayforce.cov — clang/llvm runtime artifact
test/bugs/ — 6 of 7 already migrated as upstream
tests; legacy duplicates
extract_v1_tests.py + v1_tests*.rfl — Python interim, generation
output; we don't ship Python in the
test pipeline
test_null_full / 100k / parallel / bugs.rfl — used renamed verbs
(write-csv, antijoin) that don't
exist in current Rayforce, and what
they tested is already covered
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…av + radix groupby Four register_vary builtins had 0% coverage despite existing in src/lang/eval.c: - collection.c::ray_fold_right_fn (0% → 58%, +21 lines) - collection.c::ray_scan_right_fn (0% → 73%, +24 lines) - datalog.c::ray_retract_fact_fn (0% → 88%, +46 lines) - datalog.c::ray_scan_eav_fn (0% → 89%, +48 lines) Net +139 source lines newly covered from 3 small tests, no kernel changes. * test/rfl/hof/right.rfl — right-fold semantics for + - and a digit- building lambda f(a,b)=a+10b that distinguishes left vs right fold shape; suffix-sum invariants for scan-right. * test/rfl/datalog/eav_ops.rfl — assert/retract round-trips, retract no-op on missing triple, retract leaves sibling attributes intact; scan-eav 2-arg (filter by attr) and 3-arg (entity+attr lookup). * test/rfl/integration/radix_groupby.rfl — 100k and 200k row group-by to push the executor past RAY_PARALLEL_THRESHOLD (= 64*1024) and exercise the parallel radix path in group.c. Also pins multi-key group-by + multi-aggregator results at 100k. Overall lines coverage: 64.2% → 65.0% (+0.8pp), functions 78.5% → 79.0%. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ray_alter_fn (40% → 83%, +66 lines), ray_reverse_fn (40% → 62%, +11), ray_union_fn / ray_except_fn — the previous tests only hit i64 (and SYM for except) and never the boxed-list path. Each function has distinct branches per element type and a separate heterogeneous-list branch that vec-of-X tests skip entirely. * reverse — exercise type switch across F64, I16, I32, U8, BOOL, SYM, STR, DATE, TIME, plus null-bearing input. * union/except — vec switch for f64/i16/i32/STR/DATE/TIME/BOOL plus the (list ...) variants which route through ray_union_fn's boxed-list fallback at line 793. * alter — add LIST-only forms (set/concat/remove with atom or vec index), plus alter-set across F64/I16/SYM/BOOL vec types to exercise store_typed_elem dispatch. No kernel changes; tests only. +1 test file count not changed (all edits are in-place to existing files). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ray_pivot_fn previously exercised only sum/count over one row-key on a 3-column SYM/SYM/I64 fixture. Add: * avg / min / max aggregator hits (separate combine paths for each) * multi-row-key form (pivot t [r] c v sum) — vector first arg * f64 value column ray_union_all_fn was registered but had no .rfl tests at all (0% coverage despite being exposed as `union-all`). Add row-count, sum invariants, and empty-table edge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
expr.c::expr_exec_binary handles all arithmetic/comparison ops in DAG (select) context. Top-level (+ a b) goes through the eval-path kernel, not this DAG branch, so the existing arith.rfl tests miss it entirely. The new file pins: - int + - * / % across various scalar/vec broadcast patterns - float arithmetic - narrow-int (I16/I32) DAG behaviour - all six comparison ops returning bool vector - compound where: (and/or) in select - filter+arithmetic fusion (where + derived col) - F64 NaN-aware DAG comparison DOCUMENTED INCONSISTENCY found during this session: the DAG path widens narrow-int arithmetic (I16+I16 → I64) while the eval-path kernel preserves type ((+ 5h 3h) → 8h, I16). The new test pins the DAG's current widening behaviour so a future alignment fix surfaces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ture
A magisterial integration test that builds one 200-row table with 11
columns spanning every primitive type (I64, I16, I32, F64, SYM, STR,
B8, DATE, TIME, plus derived F64 price and signed I64 qty) and runs:
* 12 atomic aggregations (sum/count/avg/min/max/first/last) per column
* 7 comparison-operator selects across SYM/F64/I64/B8 columns
* 7 sort variants (asc/desc/iasc/idesc/rank/xasc/xdesc)
* distinct/take/reverse/concat/in across multiple types
* group-by per key type (SYM, STR, B8, DATE, I16) plus 2 multi-key
* DAG-derived columns via select+arithmetic
* inner/left/anti joins with partial-coverage lookup tables
* pivot with sum / count / avg aggregators
* cast across i16/i32/i64/f64/b8
* update / modify / insert / upsert
* csv round-trip (.csv.write / .csv.read)
* splayed round-trip (set-splayed / get-splayed)
ΔLine coverage from this single file: +49 lines.
KNOWN BUGS uncovered while writing this test, pinned with comments:
- (first dt-col) drops DATE type → i64 (returns days-since-epoch)
- (first tm-col) drops TIME type → i64 (ms since midnight)
- (last bool-col) drops BOOL type → 0/1 i64
- set-splayed of a table with a SYM column then get-splayed yields
"error: corrupt"; same fixture sans SYM round-trips cleanly
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ray_first_fn and ray_last_fn whitelisted SYM/I16/I32/GUID/STR for the
type-preserving collection_elem path; everything else fell through to
AGG_VEC_VIA_DAG which produced an i64 result for these types:
(first [2024.01.01 2024.01.02]) → 8766 (was: 'date 2024.01.01)
(first [09:30:00.000 ...]) → 34200000 (was: 'time 09:30:00.000)
(last [true false]) → 0 (was: 'b8 false)
Add DATE / TIME / TIMESTAMP / BOOL / U8 to the whitelist so they
follow the same type-preserving path. collection_elem already builds
typed atoms for all of them via ray_date / ray_time / ray_timestamp /
ray_bool / ray_u8 — this fix simply routes there.
Discovered while writing test/rfl/integration/cross_type_workout.rfl;
that file's pinned-bug TODOs are now upgraded to type-asserting
assertions. Plus per-type regression coverage in
test/rfl/agg/{first,last}.rfl.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A 1000-row two-column fixture with 50 distinct keys driving group.c::
exec_group through every aggregator combination + every primitive key
type.
Aggregators covered in group-by context: count, sum, avg, min, max,
first, last, dev, var, stddev, dev_pop, var_pop, stddev_pop. Plus
multi-aggregator (7-agg) select, group-by + filter pushdown, and
no-by-clause aggregation over the whole table.
Key types: I64, SYM, BOOL, DATE, F64, I16, I32 — each routes through
a separate hash-key path in the parallel radix groupby.
KNOWN BUGS pinned with `!- length`:
- (med v) inside (select … by:) raises "length: non-agg expression
referencing a column produced a non-row-aligned result"
- (diverse v) — actually a bool predicate ("all unique?"), not the
count-distinct one might guess from the name; works standalone but
isn't usable inside group-by either
ΔLine coverage: +67 lines (mostly variant aggregator paths in group.c
and per-key-type dispatch in exec_group).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends pivot.rfl with paths previously uncovered: * multi-row-key form (pivot t [a b] c v sum) — 4-column output * I64 row key (was only SYM-keyed) * DATE row key * missing-cell semantics: sum and count both yield 0 for empty groups These exercise the row-key hash + cross-product paths in tblop.c::ray_pivot_fn that the existing fixtures (one SYM row key) didn't reach. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends datalog/rule.rfl with paths missed by the original 13 cases: * recursive (anc ?x ?y) — depth-3 chain plus a disjoint pair (5→6), exercising the rule fixed-point loop in dl_compile_rule * two-clause derivation (cofriend) — duplicate triggers de-dupe * multi-constant body (dept 10 ∧ level 'senior) — separate filter branches in dl_parse_body_clause Should hit some of the 247 uncovered lines in datalog.c::dl_compile_rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code reviewStrengths
Issues to address before merge
int r = (sa && sb) ? ray_str_cmp(sa, sb) : 0;If one
if (x->type == -RAY_I32) return make_i32(-x->i32);
if (x->type == -RAY_I16) return make_i16(-x->i16);
Worth flagging (not blockers)
Convention checks
VerdictApprove once the three blockers above (sym NULL fallback, INT_MIN UB, |
Address Anton's PR RayforceDB#8 review: [blocker] sym_atom_cmp: NULL ray_sym_str → fall back to comparing raw interned ids instead of returning 0 (which would silently collapse distinct symbols if the intern table is corrupted or uninitialised). Stable, total ordering preserved. [blocker] neg/abs INT_MIN UB: -INT16_MIN / -INT32_MIN / -INT64_MIN were signed-overflow UB. Negate via unsigned cast — wraparound is defined for unsigned types and the result wraps back to INT_MIN consistently with binary `(- 0 INT_MIN)`. Added regression rows in arith/neg.rfl and arith/abs.rfl that pin INT16_MIN / INT32_MIN behaviour (via cast since `-32768h` literal is unrepresentable). [blocker] ray_mkdir_p path buffer: replace hardcoded `char buf[1024]` with `RAY_PATH_MAX` (PATH_MAX on POSIX, 4096 on Windows). Deep splayed paths like /db/yyyy.mm.dd/leaf/ now fit. [bonus] glob `%`/`_` literal-match assertions in strop/like.rfl — makes the SQL→glob consolidation explicit in tests, not just docs. [bonus] glob.h documents the lenient unterminated-class policy (matches glibc fnmatch semantics; never produces parse error). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks for the review. All three blockers addressed in 1. 2. 3. Bonus:
The Tests still 974 of 975 passing under ASan + UBSan. |
Previous implementation used unsigned-wrap negation, so:
(abs -32768h) -> -32768h ❌ negative result from abs!
(abs (- (neg 32767h) 1h)) -> -32768h ❌ same path
(neg INT_MIN) -> INT_MIN ❌ wrap, silent
Math: -INT_MIN doesn't fit in the same signed width. Per k/q
convention, surface the overflow as a typed null of the same width:
(neg INT16_MIN) -> 0Nh
(neg INT32_MIN) -> 0Ni
(neg INT64_MIN) -> 0Nl
(abs INT_MIN) -> typed null (same per width)
Properties:
✓ Type preserved (no widening to i64)
✓ No UB (overflow detected before negation)
✓ Detectable via `nil?`
✓ Consistent with existing null propagation: (neg 0Ni) → 0Ni
The check is wrapped in RAY_UNLIKELY since INT_MIN is extremely rare
in real data; modern branch predictors will treat it as a
near-zero-cost test on the hot path.
DAG inline loop in expr.c:811-812 still uses unsigned wrap (separate
hot path for OP_NEG/OP_ABS over i64 vectors in select-with-derived-
columns). Inconsistency between atom and DAG paths to be addressed
in a follow-up.
Tests in test/rfl/arith/{neg,abs}.rfl now pin:
- (neg/abs INT16_MIN/INT32_MIN/INT64_MIN) → typed null of same type
- adjacent values (INT_MAX, -INT_MAX) still negate normally
Addresses Anton's PR RayforceDB#8 review blocker 2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR RayforceDB#8 dropped `register_binary_op("and", ..., OP_AND)` and switched `and`/`or` to eval-time variadic via `register_vary`. `compile_expr_dag` still recognised the 2-arg form via `resolve_binary_dag` (binary shortcut path) but bailed out with NULL on n>=4 — and the WHERE-clause caller upgraded that to "WHERE predicate not supported by DAG compiler". Anton flagged this as a silent perf cliff; in practice it is a correctness regression (queries that worked in v1 / pre-PR fail outright). Fold variadic AND/OR into a balanced binary tree of OP_AND/OP_OR nodes before lowering: `(and a b c d)` → `(and (and a b) (and c d))`, depth log2(N). The fused-expr executor evaluates this as a flat sequence of binary AND/OR instructions sharing scratch registers, so no extra column allocations vs the hand-nested form. Iterative pairwise reduction inside `compile_expr_dag`; bail at >64 inputs (avoids unbounded stack array, no realistic query needs more). Balanced shape (rather than left-fold) keeps the tree symmetric and minimises dependency-chain depth, leaving room for a future predicate-parallel executor without rewriting this pass. Tests in dag_binary_ops.rfl pin: 3/4-arg AND in WHERE matches the nested-binary form; 3/5-arg OR in WHERE; mixed `(and (or...) (or...))`; fold semantics agrees with composing two filtered selects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR RayforceDB#8 changed `and`/`or` from `register_binary_op` (with OP_AND/OP_OR opcodes) to `register_vary` — but registered them with `RAY_FN_NONE`, losing the `RAY_FN_SPECIAL_FORM` flag that v1 carried. Result: every arg is eagerly evaluated even when the result is already determined by a prior scalar — `(and false (heavy-fn))` runs `heavy-fn`, `(or true (slow-query))` runs `slow-query`. v1 short-circuited; v2 silently regressed. Promote both to special form so the kernel receives unevaluated AST nodes and calls `ray_eval` itself — same dispatch shape as `if`, `do`, `let`, `select` etc. Short-circuit triggers only when the running accumulator is a *scalar* with the determining truth value: scalar falsy for AND, scalar truthy for OR. When the accumulator is a vector we fall through to `ray_and_fn` / `ray_or_fn` for proper element-wise broadcast (preserves `(or [t f t] true) → [t t t]` shape semantics that the existing test asserts). Tests in cmp/and.rfl and cmp/or.rfl pin: short-circuit prevents evaluation of subsequent args (probe via `undefined-name`), and short-circuit does *not* trigger when the result is undetermined. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous implementation used unsigned-wrap negation, so:
(abs -32768h) -> -32768h ❌ negative result from abs!
(abs (- (neg 32767h) 1h)) -> -32768h ❌ same path
(neg INT_MIN) -> INT_MIN ❌ wrap, silent
Math: -INT_MIN doesn't fit in the same signed width. Per k/q
convention, surface the overflow as a typed null of the same width:
(neg INT16_MIN) -> 0Nh
(neg INT32_MIN) -> 0Ni
(neg INT64_MIN) -> 0Nl
(abs INT_MIN) -> typed null (same per width)
Properties:
✓ Type preserved (no widening to i64)
✓ No UB (overflow detected before negation)
✓ Detectable via `nil?`
✓ Consistent with existing null propagation: (neg 0Ni) → 0Ni
The check is wrapped in RAY_UNLIKELY since INT_MIN is extremely rare
in real data; modern branch predictors will treat it as a
near-zero-cost test on the hot path.
DAG inline loop in expr.c:811-812 still uses unsigned wrap (separate
hot path for OP_NEG/OP_ABS over i64 vectors in select-with-derived-
columns). Inconsistency between atom and DAG paths to be addressed
in a follow-up.
Tests in test/rfl/arith/{neg,abs}.rfl now pin:
- (neg/abs INT16_MIN/INT32_MIN/INT64_MIN) → typed null of same type
- adjacent values (INT_MAX, -INT_MAX) still negate normally
Addresses Anton's PR #8 review blocker 2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #8 dropped `register_binary_op("and", ..., OP_AND)` and switched `and`/`or` to eval-time variadic via `register_vary`. `compile_expr_dag` still recognised the 2-arg form via `resolve_binary_dag` (binary shortcut path) but bailed out with NULL on n>=4 — and the WHERE-clause caller upgraded that to "WHERE predicate not supported by DAG compiler". Anton flagged this as a silent perf cliff; in practice it is a correctness regression (queries that worked in v1 / pre-PR fail outright). Fold variadic AND/OR into a balanced binary tree of OP_AND/OP_OR nodes before lowering: `(and a b c d)` → `(and (and a b) (and c d))`, depth log2(N). The fused-expr executor evaluates this as a flat sequence of binary AND/OR instructions sharing scratch registers, so no extra column allocations vs the hand-nested form. Iterative pairwise reduction inside `compile_expr_dag`; bail at >64 inputs (avoids unbounded stack array, no realistic query needs more). Balanced shape (rather than left-fold) keeps the tree symmetric and minimises dependency-chain depth, leaving room for a future predicate-parallel executor without rewriting this pass. Tests in dag_binary_ops.rfl pin: 3/4-arg AND in WHERE matches the nested-binary form; 3/5-arg OR in WHERE; mixed `(and (or...) (or...))`; fold semantics agrees with composing two filtered selects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #8 changed `and`/`or` from `register_binary_op` (with OP_AND/OP_OR opcodes) to `register_vary` — but registered them with `RAY_FN_NONE`, losing the `RAY_FN_SPECIAL_FORM` flag that v1 carried. Result: every arg is eagerly evaluated even when the result is already determined by a prior scalar — `(and false (heavy-fn))` runs `heavy-fn`, `(or true (slow-query))` runs `slow-query`. v1 short-circuited; v2 silently regressed. Promote both to special form so the kernel receives unevaluated AST nodes and calls `ray_eval` itself — same dispatch shape as `if`, `do`, `let`, `select` etc. Short-circuit triggers only when the running accumulator is a *scalar* with the determining truth value: scalar falsy for AND, scalar truthy for OR. When the accumulator is a vector we fall through to `ray_and_fn` / `ray_or_fn` for proper element-wise broadcast (preserves `(or [t f t] true) → [t t t]` shape semantics that the existing test asserts). Tests in cmp/and.rfl and cmp/or.rfl pin: short-circuit prevents evaluation of subsequent args (probe via `undefined-name`), and short-circuit does *not* trigger when the result is undetermined. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-ups to #8: variadic and/or in WHERE, sym_atom_cmp invariant, INT_MIN→null DAG, glob escape docs
Summary
Closes 8 known bugs (the 7 from spec/_probes/ plus one uncovered while writing the cross-type workout test) and lifts overall coverage from 64.2% → 65.5% lines (+542 lines). Each fix is a self-contained commit with regression tests in test/rfl/.
Bug fixes
153e9e9(+ 1 2 3)silently dropped third arg072e040(> 'b 'a)raised type instead of comparing SYM9ff6e80(neg 5h)raised type for narrow intsea281ef(abs 5h)widened to i64 unlike neg3df78bb(modify t 'a neg)rejected builtin fnmapab2074c(like ...)catastrophic backtracking + dual semanticsd20284c(set-splayed)failed on nested paths;get-partederrored3872960first/lastwidened DATE/TIME/TIMESTAMP/BOOL to i64Plus
ac0e7cadocuments the file-mode error-exit contract (the corresponding bug was already fixed upstream).Coverage uplift (test-only)
38e7932— consolidate non-duplicate coverage from spec migration89f37a9— radix-boundary + null sort coverage salvaged from scratch files6a7ffa4— fold-right/scan-right/retract-fact/scan-eav (+139 lines: each was 0%)6f32db0— per-type and list-form coverage for reverse/union/except/alterc4961bb— pivot avg/min/max + multi-key + f64; new union-all coveragee368634— DAG executor binary ops via select-with-derived-cols6e0da6f— single 200-row × 11-column workout fixture exercising aggregation/sort/filter/group-by/join/pivot/cast/csv/splayed across all primitive types52cf5df— .gitignore for IDE/coverage artifactsTest plan
make test— 973 of 974 passed (1 explicit skip, 0 failed) on Linux x86_64 with ASan + UBSantest/rfl/likecatastrophic-backtracking guard: 20-star pattern finishes in <1ms (was timeout)set-splayednested path +get-partedround-tripfirst/lastpreserve type for DATE/TIME/TIMESTAMP/BOOL/U8Known follow-ups (not in this PR)
(at p 'col)on a PARTED column doesn't materialize — only the select-where path exercises the lazy reader (commitd20284c)ray_mkdir_pmirrors POSIX logic but I have no Windows CI to test ittest/rfl/integration/dag_binary_ops.rflsumon I16/I32 widens to i64 — pinned as documented inconsistency intest/rfl/arith/type_preservation.rfl🤖 Generated with Claude Code