Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ cargo test --features test-panic --release

`ffi.load("quickdecode")` uses `dlopen`, which respects `LD_LIBRARY_PATH` — **not** LuaJIT's `package.cpath`. The Makefile sets `LD_LIBRARY_PATH=target/release` for `test`/`bench`; if you invoke `busted` or `luajit` directly, set it yourself.

`make lint` currently fails on 22 `missing_safety_doc` clippy warnings on the public `qjd_*` FFI exports. This is a known deferred item (see README "Roadmap / Deferred"); don't treat the lint failure as a regression unless the count changes.
`make lint` runs clippy only (with `-D warnings`); `cargo fmt --check` is intentionally **not** part of the lint gate because the codebase uses manual column alignment in struct definitions and compact single-line literals that default rustfmt would reflow. See the README "Roadmap / Deferred" entry on fmt for context.

## Architecture

Expand Down
3 changes: 1 addition & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -17,9 +17,8 @@ test: build ## Run cargo tests + busted Lua tests
cargo test --release
$(LUA_ENV) busted --lua=$(LUAJIT) tests/lua --lpath='./lua/?.lua'

lint: ## Run clippy (deny warnings) and rustfmt --check
lint: ## Run clippy with -D warnings
cargo clippy --release --all-targets -- -D warnings
cargo fmt --check

bench: build ## Run the LuaJIT vs cjson benchmark
$(LUA_ENV) $(LUAJIT) benches/lua_bench.lua
Expand Down
5 changes: 3 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -72,5 +72,6 @@ Items intentionally pushed out of the first implementation. Each will be picked
- **Large bench fixtures** — spec §9.3 lists `large_dump.json` (~20 MB) and `deep_nest.json` (depth stress test); not yet committed. Only `small_api.json` and `medium_resp.json` ship today.
- **`structural_mask_chunk` via shuffle-based set check** — the current AVX2 scanner does 7 `_mm256_cmpeq_epi8` + `_mm256_movemask_epi8` per chunk half (one per structural char in `{}[]:,"`). A single `_mm256_shuffle_epi8` against a 16-byte LUT plus one cmpeq can do the same set membership in 2-3 ops per half. Estimated 15-25% scanner speedup on dense-structural workloads. Not on the hot path for string-heavy payloads (those already short-circuit via the fast path).
- **Adaptive `out.reserve` in scanners** — `out.reserve(buf.len() / 6)` is calibrated for object-heavy JSON. On string-heavy multimodal payloads (one big content array, mostly base64) the actual emit rate is <1 structural per 1 KB, so we over-reserve by 100x+. Mainly a memory hygiene concern (mmap'd pages stay lazily faulted), <5% throughput effect.
- **AVX-512 scanner backend** — 64-byte → 128-byte chunks doubles scan throughput on CPUs with `avx512bw` + `vpclmulqdq` (Sapphire Rapids, Zen 4+). Estimated 2× on supporting silicon; no effect elsewhere.
- **`# Safety` docs on unsafe FFI exports** — `make lint` currently fails on 22 `missing_safety_doc` clippy warnings from the public `qjd_*` C-ABI functions. Tracked separately so the Makefile can ship with `-D warnings` already wired up.
- **AVX-512 scanner backend** — 64-byte → 128-byte chunks. On the 1 MB string-heavy bench, profile shows scan throughput is L3-bandwidth-bound, so realistic win is ~1.5–1.8×, not a clean 2×; larger wins need fixtures that fit in L1/L2. Needs `avx512bw` + `vpclmulqdq` (Sapphire Rapids, Zen 4+).
- **`cargo fmt --check` not enforced** — `make lint` runs clippy only. The codebase uses intentional manual column alignment in struct definitions and compact single-line literals that default rustfmt would reflow. Skip rather than reformat until a project-wide style decision is made.
- **`validate_brackets` fusion into scan emit loop** — surfaced by profiling: on structurally-dense workloads `validate_brackets` is 65% of parse time (second linear pass over emitted indices). Folding bracket pairing into the scan emit loop via an inline depth stack eliminates that pass. No effect on the current string-heavy bench (0.3% there); a win for config / JSONL / table-shape JSON.
2 changes: 1 addition & 1 deletion src/decode/number.rs
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@ mod tests {
}

#[test] fn f64_zero() { assert_eq!(parse_f64(b"0.0").unwrap(), 0.0); }
#[test] fn f64_pi() { assert!((parse_f64(b"3.14").unwrap() - 3.14).abs() < 1e-12); }
#[test] fn f64_inexact_decimal() { assert!((parse_f64(b"1.7").unwrap() - 1.7).abs() < 1e-12); }
#[test] fn f64_negative(){ assert_eq!(parse_f64(b"-1.5").unwrap(), -1.5); }
#[test] fn f64_exponent(){ assert_eq!(parse_f64(b"1e2").unwrap(), 100.0); }

Expand Down
Loading
Loading