Merged
4 changes: 2 additions & 2 deletions CLAUDE.md
@@ -14,7 +14,7 @@ The `Makefile` is the canonical entry point; `make help` lists targets.
make build # cargo build --release → target/release/libquickdecode.so
make test # cargo test --release + busted Lua tests
make lint # cargo clippy -D warnings + cargo fmt --check
make bench # LuaJIT vs lua-cjson on benches/fixtures
make bench # OpenResty LuaJIT benchmark vs lua-cjson and simdjson
```

Under the hood / for narrower invocations:
@@ -79,7 +79,7 @@ src/
lua/quickdecode.lua LuaJIT wrapper (ffi.cdef + Doc/Cursor metatables)
include/lua_quick_decode.h public C header
tests/ Rust integration tests + tests/lua/ busted suite
benches/ lua_bench.lua vs lua-cjson; fixtures/ has small_api.json + medium_resp.json
benches/ lua_bench.lua vs lua-cjson/simdjson; fixtures/ has small_api.json + medium_resp.json
```

The enum values in `src/error.rs` are duplicated in `include/lua_quick_decode.h` and `lua/quickdecode.lua` (the latter only encodes the `T_*` type tags and `NOT_FOUND = 2`). Keep all three in sync when adding/renumbering codes.
23 changes: 14 additions & 9 deletions Makefile
@@ -1,15 +1,20 @@
# Overridable: `make bench LUAJIT=/path/to/luajit LUA_CPATH='...'`
LUAJIT ?= $(shell command -v luajit 2>/dev/null || echo /usr/local/openresty/luajit/bin/luajit)
LUA_CPATH ?= ./vendor/lua-cjson/?.so;./?.so;/usr/local/openresty/lualib/?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/openresty/luajit/lib/lua/5.1/?.so

LUAJIT_PREFIX ?= $(shell dirname $$(dirname $$(command -v $(LUAJIT) 2>/dev/null || echo /usr/local/openresty/luajit/bin/luajit)))
# Overridable: `make bench LUAJIT=/path/to/luajit RESTY=/path/to/resty LUA_CPATH='...'`
OPENRESTY ?= /usr/local/openresty
OPENRESTY_LUAJIT := $(OPENRESTY)/luajit/bin/luajit
OPENRESTY_RESTY := $(OPENRESTY)/bin/resty
LUAJIT ?= $(shell if [ -x "$(OPENRESTY_LUAJIT)" ]; then echo "$(OPENRESTY_LUAJIT)"; else command -v luajit 2>/dev/null || echo luajit; fi)
RESTY ?= $(shell if [ -x "$(OPENRESTY_RESTY)" ]; then echo "$(OPENRESTY_RESTY)"; else command -v resty 2>/dev/null || echo resty; fi)
LUA_PATH ?= ./lua/?.lua;$(OPENRESTY)/lualib/?.lua;$(OPENRESTY)/lualib/?/init.lua;;
LUA_CPATH ?= ./vendor/lua-cjson/?.so;./target/release/lib?.so;./?.so;$(OPENRESTY)/lualib/?.so;/usr/local/lib/lua/5.1/?.so;$(OPENRESTY)/luajit/lib/lua/5.1/?.so

LUAJIT_PREFIX ?= $(shell dirname $$(dirname $$(command -v $(LUAJIT) 2>/dev/null || echo $(OPENRESTY_LUAJIT))))
LUAJIT_INC ?= $(LUAJIT_PREFIX)/include/luajit-2.1

LIB_DIR := $(CURDIR)/target/release
ifeq ($(shell uname),Darwin)
LUA_ENV := DYLD_LIBRARY_PATH=$(LIB_DIR) LUA_CPATH='$(LUA_CPATH)'
LUA_ENV := DYLD_LIBRARY_PATH=$(LIB_DIR) LUA_PATH='$(LUA_PATH)' LUA_CPATH='$(LUA_CPATH)'
else
LUA_ENV := LD_LIBRARY_PATH=$(LIB_DIR) LUA_CPATH='$(LUA_CPATH)'
LUA_ENV := LD_LIBRARY_PATH=$(LIB_DIR) LUA_PATH='$(LUA_PATH)' LUA_CPATH='$(LUA_CPATH)'
endif

.PHONY: help build test lint bench clean
@@ -29,8 +34,8 @@ test: build ## Run cargo tests + busted Lua tests
lint: ## Run clippy with -D warnings
cargo clippy --release --all-targets -- -D warnings

bench: build vendor/lua-cjson/cjson.so ## Run the LuaJIT vs cjson benchmark
$(LUA_ENV) $(LUAJIT) benches/lua_bench.lua
bench: build vendor/lua-cjson/cjson.so ## Run the OpenResty LuaJIT benchmark
$(LUA_ENV) $(RESTY) benches/lua_bench.lua

vendor/lua-cjson/cjson.so: | vendor/lua-cjson/Makefile
ifeq ($(shell uname),Darwin)
37 changes: 15 additions & 22 deletions README.md
@@ -4,7 +4,7 @@ Rust-implemented fast JSON decoder exposed to LuaJIT via FFI. Optimized for the

## Status

Initial implementation complete: scalar + AVX2/PCLMUL + ARM64 NEON/PMULL structural scanner (runtime-dispatched), root-path and cursor APIs, escape-decoded strings, integer/float/bool/typeof/len, FFI panic barrier, and a LuaJIT wrapper. Rust unit/integration tests and Lua busted tests run in CI. The benchmark harness compares against lua-cjson but tuning is pending — see `Roadmap / Deferred` below.
Initial implementation complete: scalar + AVX2/PCLMUL + ARM64 NEON/PMULL structural scanner (runtime-dispatched), root-path and cursor APIs, escape-decoded strings, integer/float/bool/typeof/len, FFI panic barrier, and a LuaJIT wrapper. Rust unit/integration tests and Lua busted tests run in CI. The benchmark harness compares against lua-cjson and lua-resty-simdjson.

## Building

@@ -83,38 +83,31 @@ busted tests/lua --lpath='./lua/?.lua' --cpath='./target/release/lib?.so'
## Benchmarks

`quickdecode` vs. `lua-cjson` and `lua-resty-simdjson` on multimodal
chat-completion payloads, "parse + access 3 fields" workload (median ops/s
under LuaJIT 2.1, Skylake; 5 rounds, deterministic payload):
chat-completion payloads, "parse + access model, temperature, and all
messages[*].content paths" workload (median ops/s under OpenResty LuaJIT 2.1,
Intel Core i5-9400; 5 rounds, deterministic payload):

| Size | cjson | simdjson | `qd.parse` | `qd.decode + t.f x3` | speedup vs. cjson |
| Size | cjson | simdjson | `qd.parse` | `qd.decode + access content` | speedup vs. cjson |
|---:|---:|---:|---:|---:|---:|
| 2 KB | 39,414 | 54,395 | 117,233 | 126,807 | 3.0× / 3.2× |
| 100 KB | 2,589 | 19,944 | 72,202 | 61,162 | 27.9× / 23.6× |
| 1 MB | 355 | 2,048 | 12,723 | 12,448 | 35.8× / 35.1× |
| 10 MB | 32 | 128 | 537 | 609 | 16.8× / 19.0× |
| 2 KB | 106,646 | 137,427 | 135,296 | 97,574 | 1.3× / 0.9× |
| 100 KB | 6,045 | 46,577 | 137,931 | 134,590 | 22.8× / 22.3× |
| 1 MB | 594 | 4,408 | 16,447 | 16,340 | 27.7× / 27.5× |
| 10 MB | 59 | 356 | 1,035 | 1,028 | 17.5× / 17.4× |

`qd.parse` wins because it skips building a Lua table for the parts you
never read; `qd.decode + t.field` adds a cjson-shaped table proxy on top
with similar throughput. Memory retention for `quickdecode` is essentially
flat in payload size (a few KB for the reusable buffers), where `cjson`
and `simdjson` retain ~1× the input size as live Lua-table state.

ARM64 (Apple M4, NEON/PMULL scanner, same workload):

| Size | cjson | `qd.parse` | `qd.decode + t.f x3` | speedup vs. cjson |
|---:|---:|---:|---:|---:|
| 2 KB | 237,124 | 705,000 | 390,000 | 3.0× / 1.6× |
| 100 KB | 14,667 | 232,000 | 208,000 | 15.8× / 14.2× |
| 1 MB | 1,494 | 33,700 | 33,000 | 22.6× / 22.1× |
| 10 MB | 150 | 3,376 | 3,454 | 22.5× / 23.0× |
flat in payload size (a few KB for the reusable buffers), while `cjson`
and `simdjson` retain more Lua heap because they materialize the table tree.

See [`docs/benchmarks.md`](docs/benchmarks.md) for the full size ladder,
memory numbers, an "encode round-trip" row (passthrough emit via
`memcpy`), the pure-decode (no-access) comparison, and the exact
methodology + reproduction command.
`memcpy`), exact environment, and the reproduction command. `make bench`
uses `lua-resty-simdjson` when `resty.simdjson` is available in the
OpenResty environment; otherwise it skips the simdjson rows.

```sh
make bench # quickdecode vs cjson
make bench # quickdecode vs cjson and lua-resty-simdjson
```
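
The "use simdjson when available, otherwise skip" behavior described above can be sketched as a guarded `require`. This is a minimal illustration, assuming the optional module exposes a `new()` constructor as `resty.simdjson` is used in this PR; `try_new` is a hypothetical helper name, not part of the benchmark.

```lua
-- Hypothetical helper: load an optional module and construct an instance.
-- Returns the instance on success, or nil plus the error message when the
-- module is missing or its constructor fails.
local function try_new(module_name)
    local ok, result = pcall(function()
        return require(module_name).new()
    end)
    if ok then
        return result
    end
    return nil, result
end

local simdjson, err = try_new("resty.simdjson")
if not simdjson then
    -- Outside an OpenResty environment this branch is the expected one.
    print("skipping simdjson rows: " .. tostring(err))
end
```

Wrapping both `require` and `new()` in one `pcall` means a load failure and a constructor failure are handled the same way, which is enough when the only decision is whether to run the extra rows.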

## RFC 8259 conformance
162 changes: 95 additions & 67 deletions benches/lua_bench.lua
@@ -3,6 +3,10 @@ package.cpath = package.cpath .. ";./target/release/lib?.so"

local qd = require("quickdecode")
local cjson = require("cjson")
local simdjson_ok, simdjson_or_err = pcall(function()
return require("resty.simdjson").new()
end)
local simdjson = simdjson_ok and simdjson_or_err or nil

local function read_file(p)
local f = assert(io.open(p, "rb"))
@@ -11,19 +15,14 @@ local function read_file(p)
return s
end

-- Shape: a multimodal chat-completion request with one ~1.5K text question
-- and N base64-encoded image parts (each 50-500 KB) until the payload reaches
-- target_bytes. Mirrors the production case the bench is meant to reflect.
-- Shape: a multimodal chat-completion request with one or more historical
-- messages. Each message contains one small text part and one base64-encoded
-- image part. The number of messages scales with payload size: a 10 MB request
-- has roughly ten 1 MB image-bearing messages.
--
-- Image sizes are drawn from a deterministic Park-Miller LCG (not math.random,
-- which delegates to libc rand() and varies across machines) so the same
-- target_bytes produces byte-identical output on any LuaJIT 2.1 host.
--
-- Size accuracy: the normal-branch upper is `min(500K, remaining)` so the
-- loop cannot overshoot during steady state. When fewer than 50 KB remain
-- the final image falls through to `math.max(1024, remaining)` — undershoot
-- is at most a few hundred bytes; worst-case overshoot is ~1 KB (only when
-- `remaining < 1024`, which the seed=42 walk does not hit for our ladder).
-- Size accuracy: payload sizing is approximate. Message separators, role
-- strings, and the 1 KB minimum image size can add small drift from
-- `target_bytes` on tiny scenarios; larger scenarios stay close to target.
-- GitHub-style payload: simulates /repos/{owner}/{repo}/issues response.
-- Each issue has ~20 fields including nested user object, labels array,
-- and realistic string lengths (URLs, timestamps, markdown body).
@@ -117,41 +116,28 @@ local function make_b64(size)
end

local function make_payload(target_bytes)
local rng_state = 42
local function rng_range(lo, hi)
-- Park-Miller minimal-standard LCG: a=48271, m=2^31-1. Multiplication
-- fits in double precision (48271 * 2^31 < 2^53).
rng_state = (rng_state * 48271) % 2147483647
return lo + (rng_state % (hi - lo + 1))
end

local text = string.rep("Q", 1500)
local message_count = math.max(1, math.ceil(target_bytes / (1024 * 1024)))
local envelope = '{"model":"gpt-4-vision","temperature":0.7,"messages":[]}'
local text = string.rep("Q", 256)
local text_part = '{"type":"text","text":"' .. text .. '"}'
local parts = { text_part }
local current = 200 + #text_part -- approx outer envelope overhead

while current < target_bytes do
local remaining = target_bytes - current
local img_size
if remaining < 50 * 1024 then
-- Final image: shrink below the 50 KB floor so the label matches
-- the actual payload size. Bench iters all see the same payload
-- regardless, so the smaller tail blob doesn't change what's
-- being measured.
img_size = math.max(1024, remaining)
else
local upper = math.min(500 * 1024, remaining)
img_size = rng_range(50 * 1024, upper)
end
local b64 = make_b64(img_size)
local img_part = '{"type":"image_url","image_url":{"url":"data:image/jpeg;base64,'
.. b64 .. '"}}'
parts[#parts + 1] = img_part
current = current + #img_part + 1 -- +1 for comma
local image_prefix = '{"type":"image_url","image_url":{"url":"data:image/jpeg;base64,'
local image_suffix = '"}}'
local message_overhead = #('{"role":"user","content":[,]}') + #text_part
+ #image_prefix + #image_suffix
local remaining = target_bytes - #envelope - (message_count * message_overhead)
local image_size = math.max(1024, math.floor(remaining / message_count))

local messages = {}
for i = 1, message_count do
local role = i % 2 == 1 and "user" or "assistant"
local b64 = make_b64(image_size)
local image_part = image_prefix .. b64 .. image_suffix
messages[i] = '{"role":"' .. role .. '","content":['
.. text_part .. "," .. image_part .. ']}'
end

return '{"model":"gpt-4-vision","temperature":0.7,"messages":'
.. '[{"role":"user","content":[' .. table.concat(parts, ",") .. ']}]}'
return '{"model":"gpt-4-vision","temperature":0.7,"messages":['
.. table.concat(messages, ",") .. ']}'
end
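
The size arithmetic in the new `make_payload` can be checked on its own. The sketch below copies only the sizing computation from the hunk above and generates no base64 data; `plan_payload` is a hypothetical helper for illustration and is not part of `benches/lua_bench.lua`.

```lua
-- Hypothetical helper: mirrors the sizing arithmetic of make_payload and
-- returns the number of messages and the per-message image size.
local function plan_payload(target_bytes)
    -- One message per MiB of target, with at least one message.
    local message_count = math.max(1, math.ceil(target_bytes / (1024 * 1024)))
    local envelope = '{"model":"gpt-4-vision","temperature":0.7,"messages":[]}'
    local text_part = '{"type":"text","text":"' .. string.rep("Q", 256) .. '"}'
    local image_prefix = '{"type":"image_url","image_url":{"url":"data:image/jpeg;base64,'
    local image_suffix = '"}}'
    -- Fixed bytes per message, estimated with the "user" role string.
    local message_overhead = #('{"role":"user","content":[,]}') + #text_part
        + #image_prefix + #image_suffix
    local remaining = target_bytes - #envelope - message_count * message_overhead
    -- The 1 KB floor is what makes tiny targets overshoot.
    local image_size = math.max(1024, math.floor(remaining / message_count))
    return message_count, image_size
end

print(plan_payload(10 * 1024 * 1024))
print(plan_payload(1024))
```

For a 10 MiB target this yields ten messages with images slightly under 1 MiB each. The overhead estimate uses the `"user"` role, so each `"assistant"` message is 5 bytes longer than planned, and message separators are not counted; that is the small drift the comment in the source mentions.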

local ROUNDS = 5
@@ -190,19 +176,48 @@ end
local function default_cjson_access(obj)
local _ = obj.model
local _ = obj.temperature
local _ = obj.messages and obj.messages[1] and obj.messages[1].role
if obj.messages then
for _, msg in ipairs(obj.messages) do
local _ = msg.content
end
end
end

local content_paths_by_message_count = {}

local function content_paths(n)
local paths = content_paths_by_message_count[n]
if paths then
return paths
end

paths = {}
for i = 0, n - 1 do
paths[i + 1] = "messages[" .. i .. "].content"
end
content_paths_by_message_count[n] = paths
return paths
end

local function default_qd_access(d)
local _ = d:get_str("model")
local _ = d:get_f64("temperature")
local _ = d:get_str("messages[0].role")
local n = d:len("messages") or 0
local paths = content_paths(n)
for i = 1, n do
local _ = d:typeof(paths[i])
end
end

local function default_table_access(t)
local _ = t.model
local _ = t.temperature
local _ = t.messages and t.messages[1] and t.messages[1].role
if t.messages then
for i = 1, qd.len(t.messages) do
local msg = t.messages[i]
local _ = msg.content
end
end
end

-- GitHub issues accessors: array of issues, access first issue's fields
@@ -243,25 +258,37 @@ local scenarios = {
local has_pooled_api = type(qd.new_decoder) == "function"
local pooled_decoder = has_pooled_api and qd.new_decoder() or nil

if not simdjson then
print("lua-resty-simdjson unavailable; skipping simdjson rows: "
.. tostring(simdjson_or_err))
end

for _, s in ipairs(scenarios) do
print(string.format("=== %s (%d bytes) ===", s.name, #s.payload))

local cjson_access = s.cjson_access or default_cjson_access
local qd_access = s.qd_access or default_qd_access
local table_access = s.table_access or default_table_access

bench("cjson.decode + access 3 fields", s.iters, function()
bench("cjson.decode + access fields", s.iters, function()
local obj = cjson.decode(s.payload)
cjson_access(obj)
end)

bench("quickdecode.parse + access 3 fields", s.iters, function()
if simdjson then
bench("simdjson.decode + access fields", s.iters, function()
local obj = simdjson:decode(s.payload)
cjson_access(obj)
end)
end

bench("quickdecode.parse + access fields", s.iters, function()
local d = qd.parse(s.payload)
qd_access(d)
end)

if has_pooled_api then
bench("quickdecode pooled :parse + access 3 fields", s.iters, function()
bench("quickdecode pooled :parse + access fields", s.iters, function()
local d = pooled_decoder:parse(s.payload)
qd_access(d)
end)
@@ -273,7 +300,7 @@ for _, s in ipairs(scenarios) do
end)
end

bench("qd.decode + t.field x3", s.iters, function()
bench("qd.decode + access content", s.iters, function()
local t = qd.decode(s.payload)
table_access(t)
end)
@@ -315,41 +342,42 @@ print(string.format("=== interleaved %s ===", table.concat(interleaved_names, ",

do
local next_p = make_cycler(interleaved)
bench("cjson.decode + access 3 fields", 400, function()
bench("cjson.decode + access fields", 400, function()
local p = next_p()
local obj = cjson.decode(p)
local _ = obj.model
local _ = obj.temperature
local _ = obj.messages and obj.messages[1] and obj.messages[1].role
default_cjson_access(obj)
end)

if simdjson then
next_p = make_cycler(interleaved)
bench("simdjson.decode + access fields", 400, function()
local p = next_p()
local obj = simdjson:decode(p)
default_cjson_access(obj)
end)
end

next_p = make_cycler(interleaved)
bench("quickdecode.parse + access 3 fields", 400, function()
bench("quickdecode.parse + access fields", 400, function()
local p = next_p()
local d = qd.parse(p)
local _ = d:get_str("model")
local _ = d:get_f64("temperature")
local _ = d:get_str("messages[0].role")
default_qd_access(d)
end)

if has_pooled_api then
next_p = make_cycler(interleaved)
bench("quickdecode pooled :parse + access 3 fields", 400, function()
bench("quickdecode pooled :parse + access fields", 400, function()
local p = next_p()
local d = pooled_decoder:parse(p)
local _ = d:get_str("model")
local _ = d:get_f64("temperature")
local _ = d:get_str("messages[0].role")
default_qd_access(d)
end)
end

next_p = make_cycler(interleaved)
bench("qd.decode + t.field x3", 400, function()
bench("qd.decode + access content", 400, function()
local p = next_p()
local t = qd.decode(p)
local _ = t.model
local _ = t.temperature
local _ = t.messages and t.messages[1] and t.messages[1].role
default_table_access(t)
end)

next_p = make_cycler(interleaved)