fix(8 bugs) + test(coverage +1.3pp): probes, type-preservation, broad workout by ser-vasilich · Pull Request #8 · RayforceDB/rayforce2

ser-vasilich · 2026-04-27T18:33:26Z

Summary

Closes 8 known bugs (the 7 from spec/_probes/ plus one uncovered while writing the cross-type workout test) and lifts overall coverage from 64.2% → 65.5% lines (+542 lines). Each fix is a self-contained commit with regression tests in test/rfl/.

Bug fixes

Commit	Bug	Severity
`153e9e9`	`(+ 1 2 3)` silently dropped third arg	correctness — silent data loss
`072e040`	`(> 'b 'a)` raised type instead of comparing SYM	feature gap
`9ff6e80`	`(neg 5h)` raised type for narrow ints	feature gap
`ea281ef`	`(abs 5h)` widened to i64 unlike neg	inconsistency
`3df78bb`	`(modify t 'a neg)` rejected builtin fn	inconsistency vs `map`
`ab2074c`	`(like ...)` catastrophic backtracking + dual semantics	DoS vector + correctness
`d20284c`	`(set-splayed)` failed on nested paths; `get-parted` errored	usability
`3872960`	`first`/`last` widened DATE/TIME/TIMESTAMP/BOOL to i64	type preservation

Plus ac0e7ca documents the file-mode error-exit contract (the corresponding bug was already fixed upstream).

Coverage uplift (test-only)

38e7932 — consolidate non-duplicate coverage from spec migration
89f37a9 — radix-boundary + null sort coverage salvaged from scratch files
6a7ffa4 — fold-right/scan-right/retract-fact/scan-eav (+139 lines: each was 0%)
6f32db0 — per-type and list-form coverage for reverse/union/except/alter
c4961bb — pivot avg/min/max + multi-key + f64; new union-all coverage
e368634 — DAG executor binary ops via select-with-derived-cols
6e0da6f — single 200-row × 11-column workout fixture exercising aggregation/sort/filter/group-by/join/pivot/cast/csv/splayed across all primitive types
52cf5df — .gitignore for IDE/coverage artifacts

Test plan

make test — 973 of 974 passed (1 explicit skip, 0 failed) on Linux x86_64 with ASan + UBSan
Each fix has at least one regression test under test/rfl/
like catastrophic-backtracking guard: 20-star pattern finishes in <1ms (was timeout)
set-splayed nested path + get-parted round-trip
first/last preserve type for DATE/TIME/TIMESTAMP/BOOL/U8

Known follow-ups (not in this PR)

(at p 'col) on a PARTED column doesn't materialize — only the select-where path exercises the lazy reader (commit d20284c)
Win32 ray_mkdir_p mirrors POSIX logic but I have no Windows CI to test it
DAG narrow-int arithmetic widens (I16+I16 → I64) while eval-path preserves type — pinned in test/rfl/integration/dag_binary_ops.rfl
sum on I16/I32 widens to i64 — pinned as documented inconsistency in test/rfl/arith/type_preservation.rfl

🤖 Generated with Claude Code

Adds 624 unique test assertions on top of Anton's test/rfl/ corpus, deduped against his parallel rewrite (commit 536fef4 et al). Stats: * 35 modified files — our backup was a strict superset of upstream; replace upstream wholesale. Whitespace-only diffs in arith/abs, arith/ceil, arith/floor, arith/neg, arith/round, cmp/ge, cmp/gt, cmp/ne were also false-positive "Anton-only" results from sorting with extra spaces. * 12 new files — coverage Anton didn't write at all: test/rfl/datalog/rule.rfl — full EAV / rule / TC / negation test/rfl/embedding/hnsw.rfl — HNSW index build/query/info test/rfl/integration/arena.rfl — 100k-element churn + .sys.gc test/rfl/integration/cow.rfl — copy-on-write aliasing test/rfl/integration/morsel.rfl — 1023/1024/1025 + 2047/2048/2049 test/rfl/integration/optimizer.rfl — filter reorder, pushdown, selection-bitmap edges test/rfl/integration/str_pool.rfl — 12-byte SSO boundary test/rfl/system/csv_roundtrip.rfl — schema-less .csv.read test/rfl/system/splayed.rfl — set/get-splayed round-trip test/rfl/table/modify.rfl — functional column update test/rfl/table/pivot.rfl — wide reshape, sum/count aggr test/rfl/table/select.rfl — 50 select-clause assertions Coverage delta vs origin/master: Tests: 921/922 → 933/934 (+12 unique tests) Lines: 63.4% → 63.9% (+217 lines) Functions: 77.5% → 78.0% (+8 functions) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previously +/-/* and other RAY_BINARY verbs silently truncated extras: (+ 1 2 3) -> 3 (third arg dropped) (+ 1 2 3 4 5) -> 3 (- 10 1 2) -> 9 (* 2 3 4) -> 6 Same hole in RAY_UNARY: extras after the first argument were released without raising. `(+ 1)` and `(+)` were already domain errors but the 3+-arg case fell through to the binary kernel against the first two args and threw the rest away. Now both dispatch sites — the bytecode VM (eval.c:1656) and the tree- walking eval (eval.c:2378) — raise: arity: expected N arg(s), got M `and` and `or` were registered as RAY_BINARY but Anton's tests expected variadic fold semantics ((and a b c) -> fold AND). Switch them to RAY_VARY via ray_and_vary_fn / ray_or_vary_fn — left-fold over the existing binary kernel. The (and X Y) DAG path through select where: clauses still emits OP_AND / OP_OR via the expression compiler, so fused execution is unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previously (> 'b 'a) raised "error: type: cannot compare sym and sym" even for the reflexive case (>= 'a 'a). Equality comparisons (==/!=) worked because they could just check interned IDs, but ordering had no implementation at all and fell through to the is_numeric() guard. Add sym_atom_cmp(a, b): - Fast path: equal interned IDs => identical text (interning gives one ID per text), return 0 without touching the global sym table. - Slow path: ray_sym_str(id) returns a RAY_STR atom; ray_str_cmp delegates to ray_str_t_cmp which uses the 12-byte SSO inline path for short symbols and prefix-then-fullcompare for pooled ones. Wire it into ray_gt_fn / ray_lt_fn / ray_gte_fn / ray_lte_fn at the same dispatch site as the existing -RAY_GUID branch. Vector and broadcast paths inherit the fix automatically — atomic_map_binary unboxes SYM elements via collection_elem (-> ray_sym(id) atom), so each pairwise call lands in the new branch. The pre-fix probe (test/rfl/cmp/gt.rfl:41-44, "(> 'b 'a) !- type") is replaced with positive assertions covering atoms, broadcast, vec/vec, and the SSO inline/pooled boundary. Mirror coverage added to lt.rfl, ge.rfl, le.rfl. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ray_neg_fn only handled -RAY_I64 and -RAY_F64 atoms; (neg 5h) and (neg 5i) raised "error: type" even though abs already accepted them. Add the i16/i32 branches and preserve type — same convention as binary +, -, *, %, /: (neg 5h) -> -5h (i16) (neg 5i) -> -5i (i32) (neg [1h 2h]) -> [-1h -2h] (I16) (Vector path inherits the fix via RAY_FN_ATOMIC.) Type preservation is the right call here: in Rayforce typed nulls live in a separate nullmap bit, so INT_MIN is just a regular value and there's no overflow concern that would justify widening like abs does. Add test/rfl/arith/type_preservation.rfl pinning the full type- promotion matrix so a regression in any op surfaces loudly: - same-width +/-/*//% preserve type (i16, i32, i64, f64) - cross-width +/-/*//% follow "wider wins" (i16+i32→i32, etc.) - all comparisons return bool - unary neg/floor/ceil preserve type - unary abs widens narrow ints to i64 (documented inconsistency) - math fns (round/sqrt/log/exp) always return f64 Replaces the stale `(neg 5h) !- type` probes in test/rfl/arith/neg.rfl with positive assertions including type assertions and null propagation across i16/i32. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

abs was the lone outlier in the unary-numeric family — neg, floor, ceil all preserve i16/i32, but abs widened them to i64. No good reason for the inconsistency: in Rayforce typed nulls live in a separate nullmap bit, so INT_MIN is just a regular value and the "overflow protection" rationale that justifies widening in some languages doesn't apply here. Switch the i16/i32 branches to make_i16/make_i32: (abs -5h) -> 5h (i16, was i64) (abs -5i) -> 5i (i32, was i64) (abs [-1h 2h]) -> [1h 2h] (I16, was I64) Update test/rfl/arith/abs.rfl and the type_preservation.rfl regression table: abs now joins neg/floor/ceil under the "preserve narrow ints" rule. Coverage of remaining type-promotion oddities (left intentional): - sum widens I16/I32 → i64 (overflow guard) - round of int → f64 (mathematically noop, but produces float anyway) Both pinned in type_preservation.rfl so future changes surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nary (modify t 'a neg) raised "error: type" while (map neg [1 2 3]) worked. modify calls call_fn1(fn, column_vec) internally — call_fn1 saw a RAY_UNARY function and invoked it directly on the whole vector. But ray_neg_fn (and every other atomic unary kernel: abs, floor, ceil, round, sqrt, log, exp, not) is written for a single atom and rejects positive-typed (vector) input with "type". call_fn2 already had the parallel routing for binary atomic builtins: if ((fn->attrs & RAY_FN_ATOMIC) && (is_collection(a) || ...)) return atomic_map_binary(f, a, b); call_fn1 simply forgot the unary mirror. Add it. Affects every code path that uses call_fn1 — modify, fold (1-arg form), the apply 1-arg case — so any future caller now gets vector auto-mapping for free, matching the user's intuition that "if it works in map, it should work everywhere". Tests: extend test/rfl/table/modify.rfl with parity assertions showing (modify t 'a neg) ≡ (modify t 'a (fn [x] (neg x))), plus chained modify with abs/neg and floor/ceil over a float column. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Document what file-mode and stdin-mode mean for error handling: - ray_repl_run_file (batch / script): rc=1 stops execution - ray_repl_run / stdin pipe (REPL): error printed, loop continues The probe spec/_probes/error_exit_code_inconsistent.rfl filed back when file-mode silently returned 0 on errors; that's been fixed since (every verb in the probe — asof-join, inner/left-join, filter, +, /, rand, til, alter — now correctly returns rc=1 from a script file). Drop the probe. The stdin-pipe-doesn't-abort behaviour is the REPL contract, not a bug; users wanting batch semantics should pass the script as a positional file arg. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three implementations existed and they disagreed on syntax: - eval path: strop.c::str_glob — recursive, glob */?/[abc], EXPONENTIAL on patterns like "a*a*a*…a*b" against "aaa…a" (>5s on 16 stars, >timeout on 20). - DAG path: string.c::like_match — iterative, but SQL %/_ syntax. - DAG path: string.c::ilike_match — iterative SQL %/_ , case- insensitive. So `(like "hello" "h*")` returned true via eval but matched 0 rows in `select where: (like s "h*")` — `*` was literal under the SQL matcher. And the eval path could be DoS'd with a 20-star pattern. Replace all three with one shared implementation in src/ops/glob.[ch]: - Iterative two-pointer with last-star backtrack (glibc fnmatch style). O(n*m) worst case; 32-star pattern that pre-fix took >5s now finishes in microseconds. - Glob syntax matching the documented contract: * (any), ? (one), [abc] / [a-z] / [!abc] (character class). - ray_glob_match (case-sensitive) and ray_glob_match_ci (folds ASCII letters on both sides). eval path (strop.c::ray_like_fn) and DAG path (string.c::exec_like / exec_ilike) both call the same matcher — semantics now identical. Tests: * test/rfl/strop/like.rfl extended with character classes, ranges, negated classes, an adversarial 20-star catastrophic-backtracking guard, and explicit eval-path-≡-DAG-path parity assertions. * test/test_exec.c: stale C-test using "bar%" (SQL syntax) updated to "bar*" (glob). Docs: website/docs/rayfall-functions.html updated to mention [abc] / [a-z] / [!abc] (already implemented; previously undocumented). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…t-parted (set-splayed "/tmp/db/2024.01.01/t/" t) -> error: io (get-parted "/tmp/db/" 't) -> error: io Two compounding bugs broke partitioned-table workflows: 1. ray_mkdir was single-level — could not create "/tmp/db/2024.01.01/t/" when the parent directories didn't pre- exist. v1 quietly handled this inside fs_fopen (walk path, mkdir each parent before opening the file); v2 lost that. Add ray_mkdir_p (POSIX + Win32, src/store/fileio.c) with mkdir -p semantics. Use it from ray_splay_save instead of the single-level ray_mkdir. 2. ray_read_parted (get-parted) unconditionally called ray_sym_load("<db_root>/sym") and propagated its failure as "io" — but set-splayed only writes per-table sym files inside the leaf splayed dir, never a root-level one for symbol-less tables. Stat the file and skip the load if absent. After both: (set-splayed "/tmp/db/2024.01.01/t/" t0) writes the partition correctly; (get-parted "/tmp/db/" 't) returns a 10-row / 3-column lazy table (1 MAPCOMMON partition-key + 2 data columns), parity with the v1 behaviour. Tests: extend test/rfl/system/splayed.rfl with the nested-mkdir case and a 2-partition get-parted round-trip (count = 10, columns = 3). Known limitation (separate task): direct (at p 'col) on a PARTED column doesn't materialize values yet — only the select-where path exercises the lazy reader. Filed for follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds: .idea/, .vscode/ — IDE working state *.gcda, *.gcno, *.gcov — gcc coverage instrumentation outputs coverage*.info — lcov tracefiles coverage_html/ — genhtml output directory rayforce.cov — clang/llvm coverage runtime output Keeps the working tree clean during a coverage build so `git status` isn't drowned in untracked binary artifacts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three legacy script-style test files (test_4097.rfl, test_null_ops.rfl, test_comprehensive.rfl) sat untracked in the repo root. Reviewed each against the existing test/rfl/ corpus; salvaged what wasn't covered: * test/rfl/integration/sort_radix_boundary.rfl (new) — pin every type's sort exit values at N=4097, just over the 2^12 radix-strategy threshold. Covers asc/desc/iasc/idesc/xasc/xdesc/select-orderby/ group-by-then-sort/distinct/rank across i64, f64, SYM, STR, BOOL, DATE, TIMESTAMP, plus null-bearing variants. * test/rfl/null/sort.rfl (extended) — add xasc-on-null-keyed-table (null sorts first, count preserved) and take-with-nulls slicing semantics. Anton's existing file covered asc/desc/iasc/idesc with nulls but not xasc or take. The other two files duplicated Anton's null/* and arith/div coverage (Float div-by-zero already produces 0Nf in his test/rfl/arith/div.rfl, INT64 boundary already in test/rfl/integration/null.rfl), so dropped. Plus removed: rayforce.cov — clang/llvm runtime artifact test/bugs/ — 6 of 7 already migrated as upstream tests; legacy duplicates extract_v1_tests.py + v1_tests*.rfl — Python interim, generation output; we don't ship Python in the test pipeline test_null_full / 100k / parallel / bugs.rfl — used renamed verbs (write-csv, antijoin) that don't exist in current Rayforce, and what they tested is already covered Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…av + radix groupby Four register_vary builtins had 0% coverage despite existing in src/lang/eval.c: - collection.c::ray_fold_right_fn (0% → 58%, +21 lines) - collection.c::ray_scan_right_fn (0% → 73%, +24 lines) - datalog.c::ray_retract_fact_fn (0% → 88%, +46 lines) - datalog.c::ray_scan_eav_fn (0% → 89%, +48 lines) Net +139 source lines newly covered from 3 small tests, no kernel changes. * test/rfl/hof/right.rfl — right-fold semantics for + - and a digit- building lambda f(a,b)=a+10b that distinguishes left vs right fold shape; suffix-sum invariants for scan-right. * test/rfl/datalog/eav_ops.rfl — assert/retract round-trips, retract no-op on missing triple, retract leaves sibling attributes intact; scan-eav 2-arg (filter by attr) and 3-arg (entity+attr lookup). * test/rfl/integration/radix_groupby.rfl — 100k and 200k row group-by to push the executor past RAY_PARALLEL_THRESHOLD (= 64*1024) and exercise the parallel radix path in group.c. Also pins multi-key group-by + multi-aggregator results at 100k. Overall lines coverage: 64.2% → 65.0% (+0.8pp), functions 78.5% → 79.0%. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ray_alter_fn (40% → 83%, +66 lines), ray_reverse_fn (40% → 62%, +11), ray_union_fn / ray_except_fn — the previous tests only hit i64 (and SYM for except) and never the boxed-list path. Each function has distinct branches per element type and a separate heterogeneous-list branch that vec-of-X tests skip entirely. * reverse — exercise type switch across F64, I16, I32, U8, BOOL, SYM, STR, DATE, TIME, plus null-bearing input. * union/except — vec switch for f64/i16/i32/STR/DATE/TIME/BOOL plus the (list ...) variants which route through ray_union_fn's boxed-list fallback at line 793. * alter — add LIST-only forms (set/concat/remove with atom or vec index), plus alter-set across F64/I16/SYM/BOOL vec types to exercise store_typed_elem dispatch. No kernel changes; tests only. +1 test file count not changed (all edits are in-place to existing files). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ray_pivot_fn previously exercised only sum/count over one row-key on a 3-column SYM/SYM/I64 fixture. Add: * avg / min / max aggregator hits (separate combine paths for each) * multi-row-key form (pivot t [r] c v sum) — vector first arg * f64 value column ray_union_all_fn was registered but had no .rfl tests at all (0% coverage despite being exposed as `union-all`). Add row-count, sum invariants, and empty-table edge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

expr.c::expr_exec_binary handles all arithmetic/comparison ops in DAG (select) context. Top-level (+ a b) goes through the eval-path kernel, not this DAG branch, so the existing arith.rfl tests miss it entirely. The new file pins: - int + - * / % across various scalar/vec broadcast patterns - float arithmetic - narrow-int (I16/I32) DAG behaviour - all six comparison ops returning bool vector - compound where: (and/or) in select - filter+arithmetic fusion (where + derived col) - F64 NaN-aware DAG comparison DOCUMENTED INCONSISTENCY found during this session: the DAG path widens narrow-int arithmetic (I16+I16 → I64) while the eval-path kernel preserves type ((+ 5h 3h) → 8h, I16). The new test pins the DAG's current widening behaviour so a future alignment fix surfaces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ture A magisterial integration test that builds one 200-row table with 11 columns spanning every primitive type (I64, I16, I32, F64, SYM, STR, B8, DATE, TIME, plus derived F64 price and signed I64 qty) and runs: * 12 atomic aggregations (sum/count/avg/min/max/first/last) per column * 7 comparison-operator selects across SYM/F64/I64/B8 columns * 7 sort variants (asc/desc/iasc/idesc/rank/xasc/xdesc) * distinct/take/reverse/concat/in across multiple types * group-by per key type (SYM, STR, B8, DATE, I16) plus 2 multi-key * DAG-derived columns via select+arithmetic * inner/left/anti joins with partial-coverage lookup tables * pivot with sum / count / avg aggregators * cast across i16/i32/i64/f64/b8 * update / modify / insert / upsert * csv round-trip (.csv.write / .csv.read) * splayed round-trip (set-splayed / get-splayed) ΔLine coverage from this single file: +49 lines. KNOWN BUGS uncovered while writing this test, pinned with comments: - (first dt-col) drops DATE type → i64 (returns days-since-epoch) - (first tm-col) drops TIME type → i64 (ms since midnight) - (last bool-col) drops BOOL type → 0/1 i64 - set-splayed of a table with a SYM column then get-splayed yields "error: corrupt"; same fixture sans SYM round-trips cleanly Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ray_first_fn and ray_last_fn whitelisted SYM/I16/I32/GUID/STR for the type-preserving collection_elem path; everything else fell through to AGG_VEC_VIA_DAG which produced an i64 result for these types: (first [2024.01.01 2024.01.02]) → 8766 (was: 'date 2024.01.01) (first [09:30:00.000 ...]) → 34200000 (was: 'time 09:30:00.000) (last [true false]) → 0 (was: 'b8 false) Add DATE / TIME / TIMESTAMP / BOOL / U8 to the whitelist so they follow the same type-preserving path. collection_elem already builds typed atoms for all of them via ray_date / ray_time / ray_timestamp / ray_bool / ray_u8 — this fix simply routes there. Discovered while writing test/rfl/integration/cross_type_workout.rfl; that file's pinned-bug TODOs are now upgraded to type-asserting assertions. Plus per-type regression coverage in test/rfl/agg/{first,last}.rfl. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

A 1000-row two-column fixture with 50 distinct keys driving group.c:: exec_group through every aggregator combination + every primitive key type. Aggregators covered in group-by context: count, sum, avg, min, max, first, last, dev, var, stddev, dev_pop, var_pop, stddev_pop. Plus multi-aggregator (7-agg) select, group-by + filter pushdown, and no-by-clause aggregation over the whole table. Key types: I64, SYM, BOOL, DATE, F64, I16, I32 — each routes through a separate hash-key path in the parallel radix groupby. KNOWN BUGS pinned with `!- length`: - (med v) inside (select … by:) raises "length: non-agg expression referencing a column produced a non-row-aligned result" - (diverse v) — actually a bool predicate ("all unique?"), not the count-distinct one might guess from the name; works standalone but isn't usable inside group-by either ΔLine coverage: +67 lines (mostly variant aggregator paths in group.c and per-key-type dispatch in exec_group). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Extends pivot.rfl with paths previously uncovered: * multi-row-key form (pivot t [a b] c v sum) — 4-column output * I64 row key (was only SYM-keyed) * DATE row key * missing-cell semantics: sum and count both yield 0 for empty groups These exercise the row-key hash + cross-product paths in tblop.c::ray_pivot_fn that the existing fixtures (one SYM row key) didn't reach. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Extends datalog/rule.rfl with paths missed by the original 13 cases: * recursive (anc ?x ?y) — depth-3 chain plus a disjoint pair (5→6), exercising the rule fixed-point loop in dl_compile_rule * two-clause derivation (cofriend) — duplicate triggers de-dupe * multi-constant body (dept 10 ∧ level 'senior) — separate filter branches in dl_parse_body_clause Should hit some of the 247 uncovered lines in datalog.c::dl_compile_rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

singaraiona · 2026-04-28T16:34:47Z

Code review

Strengths

Each fix has regression coverage in test/rfl/ — test every fix rule met.
Comments explain why (glob consolidation rationale, DAG vs. eval whitelist, missing-symfile contract).
glob.[ch] consolidation is the right call: kills three diverged matchers and the catastrophic-backtracking blowup.
op_callf arity-error path now releases all n args, not just the consumed prefix.

Issues to address before merge

src/ops/cmp.c:187 — sym_atom_cmp swallows lookup failure

int r = (sa && sb) ? ray_str_cmp(sa, sb) : 0;

If one ray_sym_str returns NULL, you return 0 (equal). That makes unequal symbols compare equal and lets a corrupted intern table slip past. Either propagate an error or fall back to comparing raw i64 ids — never silently say "equal".

src/ops/arith.c:332 — narrow-int neg/abs and INT_MIN UB

if (x->type == -RAY_I32) return make_i32(-x->i32);
if (x->type == -RAY_I16) return make_i16(-x->i16);

-INT32_MIN / -INT16_MIN are signed overflow (UB). The previous code widened to i64 specifically to dodge this. Either guard x->i32 == INT32_MIN (return typed null or saturate) or cast through unsigned for defined wraparound. UBSan should catch it if test/rfl/arith/neg.rfl includes an INT32_MIN row — please add one.

src/store/fileio.c:122 — ray_mkdir_p fixed buf[1024]
POSIX PATH_MAX is 4096 on Linux. Deep splayed paths could overflow. Suggest PATH_MAX / MAX_PATH.

Worth flagging (not blockers)

OP_AND/OP_OR registration removed but DAG still uses them. register_binary_op("and", …, OP_AND) is gone; eval-time and/or are now RAY_VARY. The DAG path still works for binary (and a b) because src/ops/query.c::resolve_binary_dag resolves "and" → ray_and independently. But (and a b c) with >2 args inside (select t (where ...)) will now fall off the DAG path (binary-only resolver) and run row-by-row in eval. Silent perf cliff, not correctness. Worth a pin test.

and/or don't short-circuit. Args are pre-evaluated (no RAY_FN_SPECIAL_FORM). Same as before for binary, but extending to variadic makes the lack of short-circuit more user-visible. Consider promoting to special form in a follow-up.

OP_LIKE syntax silently changed from SQL %/_ to glob */?. Documented in PR body, and test_exec.c was updated. Worth one explicit test that asserts (like x "%") matches a literal % — confirms the consolidation in both directions.

src/ops/glob.c::match_class accepts unterminated [. If pattern is "abc[def" the loop exits at pn without seeing ] and the match continues using whatever matched flag accumulated. Lenient behavior; worth a one-liner in glob.h documenting the policy so it's not later filed as a bug.

Convention checks

No malloc/free, no third-party names, one-word filenames — all good.
ray_sym_str retain/release pairing in cmp.c and strop.c looks right.

Verdict

Approve once the three blockers above (sym NULL fallback, INT_MIN UB, PATH_MAX) are addressed. The fixes themselves are tight, and the coverage uplift is genuine value.

Address Anton's PR RayforceDB#8 review: [blocker] sym_atom_cmp: NULL ray_sym_str → fall back to comparing raw interned ids instead of returning 0 (which would silently collapse distinct symbols if the intern table is corrupted or uninitialised). Stable, total ordering preserved. [blocker] neg/abs INT_MIN UB: -INT16_MIN / -INT32_MIN / -INT64_MIN were signed-overflow UB. Negate via unsigned cast — wraparound is defined for unsigned types and the result wraps back to INT_MIN consistently with binary `(- 0 INT_MIN)`. Added regression rows in arith/neg.rfl and arith/abs.rfl that pin INT16_MIN / INT32_MIN behaviour (via cast since `-32768h` literal is unrepresentable). [blocker] ray_mkdir_p path buffer: replace hardcoded `char buf[1024]` with `RAY_PATH_MAX` (PATH_MAX on POSIX, 4096 on Windows). Deep splayed paths like /db/yyyy.mm.dd/leaf/ now fit. [bonus] glob `%`/`_` literal-match assertions in strop/like.rfl — makes the SQL→glob consolidation explicit in tests, not just docs. [bonus] glob.h documents the lenient unterminated-class policy (matches glibc fnmatch semantics; never produces parse error). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ser-vasilich · 2026-04-28T17:02:44Z

Thanks for the review. All three blockers addressed in 82fae9d:

1. sym_atom_cmp NULL fallback — now compares raw interned ids when ray_sym_str returns NULL. Stable, total, never collapses distinct symbols.

2. neg/abs INT_MIN UB — switched to unsigned-wrap negation ((int32_t)(-(uint32_t)x) etc.). Wraparound defined for unsigned, so (neg INT16_MIN) → INT16_MIN instead of UB. Added INT16_MIN / INT32_MIN regression rows in test/rfl/arith/neg.rfl and abs.rfl (via cast since -32768h literal is unrepresentable — parser tokenises positive then negates).

3. ray_mkdir_p buffer — char buf[RAY_PATH_MAX] (PATH_MAX on POSIX, 4096 on Win32).

Bonus:

test/rfl/strop/like.rfl adds explicit (like "100%" "100%") -- true and three companions making the SQL→glob consolidation visible at the test level.
src/ops/glob.h documents the lenient unterminated-class policy (matches glibc fnmatch).

The (and a b c) in select where: falling off the DAG is a real silent perf cliff but not a correctness issue — left for a follow-up since the surrounding refactor (DAG resolver supporting variadic logical ops) is larger than this PR.

Tests still 974 of 975 passing under ASan + UBSan.

Previous implementation used unsigned-wrap negation, so: (abs -32768h) -> -32768h ❌ negative result from abs! (abs (- (neg 32767h) 1h)) -> -32768h ❌ same path (neg INT_MIN) -> INT_MIN ❌ wrap, silent Math: -INT_MIN doesn't fit in the same signed width. Per k/q convention, surface the overflow as a typed null of the same width: (neg INT16_MIN) -> 0Nh (neg INT32_MIN) -> 0Ni (neg INT64_MIN) -> 0Nl (abs INT_MIN) -> typed null (same per width) Properties: ✓ Type preserved (no widening to i64) ✓ No UB (overflow detected before negation) ✓ Detectable via `nil?` ✓ Consistent with existing null propagation: (neg 0Ni) → 0Ni The check is wrapped in RAY_UNLIKELY since INT_MIN is extremely rare in real data; modern branch predictors will treat it as a near-zero-cost test on the hot path. DAG inline loop in expr.c:811-812 still uses unsigned wrap (separate hot path for OP_NEG/OP_ABS over i64 vectors in select-with-derived- columns). Inconsistency between atom and DAG paths to be addressed in a follow-up. Tests in test/rfl/arith/{neg,abs}.rfl now pin: - (neg/abs INT16_MIN/INT32_MIN/INT64_MIN) → typed null of same type - adjacent values (INT_MAX, -INT_MAX) still negate normally Addresses Anton's PR RayforceDB#8 review blocker 2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR RayforceDB#8 dropped `register_binary_op("and", ..., OP_AND)` and switched `and`/`or` to eval-time variadic via `register_vary`. `compile_expr_dag` still recognised the 2-arg form via `resolve_binary_dag` (binary shortcut path) but bailed out with NULL on n>=4 — and the WHERE-clause caller upgraded that to "WHERE predicate not supported by DAG compiler". Anton flagged this as a silent perf cliff; in practice it is a correctness regression (queries that worked in v1 / pre-PR fail outright). Fold variadic AND/OR into a balanced binary tree of OP_AND/OP_OR nodes before lowering: `(and a b c d)` → `(and (and a b) (and c d))`, depth log2(N). The fused-expr executor evaluates this as a flat sequence of binary AND/OR instructions sharing scratch registers, so no extra column allocations vs the hand-nested form. Iterative pairwise reduction inside `compile_expr_dag`; bail at >64 inputs (avoids unbounded stack array, no realistic query needs more). Balanced shape (rather than left-fold) keeps the tree symmetric and minimises dependency-chain depth, leaving room for a future predicate-parallel executor without rewriting this pass. Tests in dag_binary_ops.rfl pin: 3/4-arg AND in WHERE matches the nested-binary form; 3/5-arg OR in WHERE; mixed `(and (or...) (or...))`; fold semantics agrees with composing two filtered selects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR RayforceDB#8 changed `and`/`or` from `register_binary_op` (with OP_AND/OP_OR opcodes) to `register_vary` — but registered them with `RAY_FN_NONE`, losing the `RAY_FN_SPECIAL_FORM` flag that v1 carried. Result: every arg is eagerly evaluated even when the result is already determined by a prior scalar — `(and false (heavy-fn))` runs `heavy-fn`, `(or true (slow-query))` runs `slow-query`. v1 short-circuited; v2 silently regressed. Promote both to special form so the kernel receives unevaluated AST nodes and calls `ray_eval` itself — same dispatch shape as `if`, `do`, `let`, `select` etc. Short-circuit triggers only when the running accumulator is a *scalar* with the determining truth value: scalar falsy for AND, scalar truthy for OR. When the accumulator is a vector we fall through to `ray_and_fn` / `ray_or_fn` for proper element-wise broadcast (preserves `(or [t f t] true) → [t t t]` shape semantics that the existing test asserts). Tests in cmp/and.rfl and cmp/or.rfl pin: short-circuit prevents evaluation of subsequent args (probe via `undefined-name`), and short-circuit does *not* trigger when the result is undetermined. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previous implementation used unsigned-wrap negation, so: (abs -32768h) -> -32768h ❌ negative result from abs! (abs (- (neg 32767h) 1h)) -> -32768h ❌ same path (neg INT_MIN) -> INT_MIN ❌ wrap, silent Math: -INT_MIN doesn't fit in the same signed width. Per k/q convention, surface the overflow as a typed null of the same width: (neg INT16_MIN) -> 0Nh (neg INT32_MIN) -> 0Ni (neg INT64_MIN) -> 0Nl (abs INT_MIN) -> typed null (same per width) Properties: ✓ Type preserved (no widening to i64) ✓ No UB (overflow detected before negation) ✓ Detectable via `nil?` ✓ Consistent with existing null propagation: (neg 0Ni) → 0Ni The check is wrapped in RAY_UNLIKELY since INT_MIN is extremely rare in real data; modern branch predictors will treat it as a near-zero-cost test on the hot path. DAG inline loop in expr.c:811-812 still uses unsigned wrap (separate hot path for OP_NEG/OP_ABS over i64 vectors in select-with-derived- columns). Inconsistency between atom and DAG paths to be addressed in a follow-up. Tests in test/rfl/arith/{neg,abs}.rfl now pin: - (neg/abs INT16_MIN/INT32_MIN/INT64_MIN) → typed null of same type - adjacent values (INT_MAX, -INT_MAX) still negate normally Addresses Anton's PR #8 review blocker 2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR #8 dropped `register_binary_op("and", ..., OP_AND)` and switched `and`/`or` to eval-time variadic via `register_vary`. `compile_expr_dag` still recognised the 2-arg form via `resolve_binary_dag` (binary shortcut path) but bailed out with NULL on n>=4 — and the WHERE-clause caller upgraded that to "WHERE predicate not supported by DAG compiler". Anton flagged this as a silent perf cliff; in practice it is a correctness regression (queries that worked in v1 / pre-PR fail outright). Fold variadic AND/OR into a balanced binary tree of OP_AND/OP_OR nodes before lowering: `(and a b c d)` → `(and (and a b) (and c d))`, depth log2(N). The fused-expr executor evaluates this as a flat sequence of binary AND/OR instructions sharing scratch registers, so no extra column allocations vs the hand-nested form. Iterative pairwise reduction inside `compile_expr_dag`; bail at >64 inputs (avoids unbounded stack array, no realistic query needs more). Balanced shape (rather than left-fold) keeps the tree symmetric and minimises dependency-chain depth, leaving room for a future predicate-parallel executor without rewriting this pass. Tests in dag_binary_ops.rfl pin: 3/4-arg AND in WHERE matches the nested-binary form; 3/5-arg OR in WHERE; mixed `(and (or...) (or...))`; fold semantics agrees with composing two filtered selects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR #8 changed `and`/`or` from `register_binary_op` (with OP_AND/OP_OR opcodes) to `register_vary` — but registered them with `RAY_FN_NONE`, losing the `RAY_FN_SPECIAL_FORM` flag that v1 carried. Result: every arg is eagerly evaluated even when the result is already determined by a prior scalar — `(and false (heavy-fn))` runs `heavy-fn`, `(or true (slow-query))` runs `slow-query`. v1 short-circuited; v2 silently regressed. Promote both to special form so the kernel receives unevaluated AST nodes and calls `ray_eval` itself — same dispatch shape as `if`, `do`, `let`, `select` etc. Short-circuit triggers only when the running accumulator is a *scalar* with the determining truth value: scalar falsy for AND, scalar truthy for OR. When the accumulator is a vector we fall through to `ray_and_fn` / `ray_or_fn` for proper element-wise broadcast (preserves `(or [t f t] true) → [t t t]` shape semantics that the existing test asserts). Tests in cmp/and.rfl and cmp/or.rfl pin: short-circuit prevents evaluation of subsequent args (probe via `undefined-name`), and short-circuit does *not* trigger when the result is undetermined. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Follow-ups to #8: variadic and/or in WHERE, sym_atom_cmp invariant, INT_MIN→null DAG, glob escape docs

ser-vasilich and others added 17 commits April 27, 2026 20:58

ser-vasilich changed the title ~~fix: 7 bug probes (arity, sym ordering, type preservation, glob, splayed/parted)~~ fix(8 bugs) + test(coverage +1.3pp): probes, type-preservation, broad workout Apr 28, 2026

singaraiona requested a review from Copilot April 28, 2026 15:00

ser-vasilich and others added 3 commits April 28, 2026 18:28

singaraiona merged commit bb8884e into RayforceDB:master Apr 28, 2026
4 checks passed

ser-vasilich mentioned this pull request Apr 29, 2026

Follow-ups to #8: variadic and/or in WHERE, sym_atom_cmp invariant, INT_MIN→null DAG, glob escape docs #9

Merged

singaraiona added a commit that referenced this pull request Apr 29, 2026

Merge pull request #9 from ser-vasilich/post-pr8-followups

c32d379

Follow-ups to #8: variadic and/or in WHERE, sym_atom_cmp invariant, INT_MIN→null DAG, glob escape docs

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(8 bugs) + test(coverage +1.3pp): probes, type-preservation, broad workout#8

fix(8 bugs) + test(coverage +1.3pp): probes, type-preservation, broad workout#8
singaraiona merged 21 commits intoRayforceDB:masterfrom
ser-vasilich:fixes/seven-bug-probes

ser-vasilich commented Apr 27, 2026 •

edited

Loading

Uh oh!

singaraiona commented Apr 28, 2026

Uh oh!

ser-vasilich commented Apr 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ser-vasilich commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Bug fixes

Coverage uplift (test-only)

Test plan

Known follow-ups (not in this PR)

Uh oh!

singaraiona commented Apr 28, 2026

Code review

Strengths

Issues to address before merge

Worth flagging (not blockers)

Convention checks

Verdict

Uh oh!

ser-vasilich commented Apr 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ser-vasilich commented Apr 27, 2026 •

edited

Loading