Revert "perf: decoder/document instance pooling (#11)" #14
Merged
The single-run-with-mean output the bench used to print swung 30-40%
between invocations on noisy machines, making it hard to tell signal
from noise when comparing perf commits.
- bench() now runs a warmup pass (JIT trace compile, pool fill), then
five timed rounds. Reports median and mean ops/s plus the round-by-
round min..max range so reviewers can see whether a delta is real.
- Add an `interleaved 100k,200k,500k,1m` scenario that rotates through
four payload sizes, matching a server that handles varying request
sizes back to back. The single-payload loops cannot exercise the
doc pool the way real traffic does.
- For each scenario, probe `qd.new_decoder` and run two extra qd
variants when present:
    quickdecode pooled :parse        reused decoder across iters
    quickdecode new_decoder()+parse  one-shot per iter (no reuse)
So a reader can directly compare the legacy qd.parse path, the
pool-API-with-reuse path, and the realistic "user creates a fresh
decoder per request" pattern in one bench run.
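The warmup-plus-timed-rounds reporting described in the first bullet can be sketched roughly as follows. This is a Rust illustration of the idea only, not the Lua code in `benches/lua_bench.lua`; the names `RoundStats`, `summarize`, and `bench` are hypothetical and chosen for this example.

```rust
use std::time::Instant;

/// Summary of the timed rounds, all values in operations per second.
/// (Hypothetical type for this sketch.)
#[derive(Debug, Clone, PartialEq)]
pub struct RoundStats {
    pub median: f64,
    pub mean: f64,
    pub min: f64,
    pub max: f64,
}

/// Reduce per-round ops/s samples to median, mean, and the min..max range.
/// Returns `None` for an empty slice.
pub fn summarize(samples: &[f64]) -> Option<RoundStats> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    let median = if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    };
    let mean = sorted.iter().sum::<f64>() / n as f64;
    Some(RoundStats { median, mean, min: sorted[0], max: sorted[n - 1] })
}

/// One untimed warmup pass of `iters` calls, then `rounds` timed rounds of
/// `iters` calls each. Each round contributes one ops/s sample.
pub fn bench<F: FnMut()>(mut work: F, iters: u32, rounds: usize) -> Option<RoundStats> {
    for _ in 0..iters {
        work(); // warmup: timing is discarded
    }
    let mut samples = Vec::with_capacity(rounds);
    for _ in 0..rounds {
        let start = Instant::now();
        for _ in 0..iters {
            work();
        }
        // Guard against a zero reading from a coarse clock.
        let secs = start.elapsed().as_secs_f64().max(f64::MIN_POSITIVE);
        samples.push(f64::from(iters) / secs);
    }
    summarize(&samples)
}
```

Reporting the median next to the mean and the min..max range lets a reviewer see at a glance whether one outlier round is pulling the mean.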
Also ship benches/perf_probe.lua: a minimal hammer over qd.parse on a
fixed payload for use under `perf record` when investigating FFI hot
paths. Not invoked by Makefile targets.
Summary
Reverts 0721d7d (PR #11, "perf: decoder/document instance pooling (#6)"), which refactored the one-shot `qjd_parse` path through a separately-allocated `qjd_decoder` + state machine + generation counter.
Why
The pool API in #11 was designed to amortize per-parse allocation cost across many parses on a reused decoder. In our actual deployment — an API gateway that allocates a fresh decoder per request and lets GC reclaim everything — the pool advantage is zero:
- No `Vec` capacity reuse, no scratch buffer reuse, no skip-cache reuse.
- The `check_doc_alive` checks on `get_*` (state + generation check) become pure overhead.
Bench results across the design space made this concrete:
`qd.parse()` (legacy)
#13 tried to fix the legacy regression by folding doc + decoder into one allocation and fast-pathing `check_doc_alive`. Three-run median-of-medians measurements (in #13 comments) showed the fold still trailed scan-only by 14–27% on 100 KB–5 MB payloads, which make up the bulk of API gateway traffic. The residual cost was not worth the optionality of a pool API that production cannot use anyway.
Bench (3-run median-of-medians)
7a895e5 `qd.parse`
All deltas fall within the run-to-run noise band we saw on this machine (most rows ±10%; the 500K row's apparent +41% sits inside the underlying range overlap and is system noise, not a real win). Net read: the legacy path is fully restored to its scan-only baseline.
(*) Bench harness preserved from the fold-attempt branch — warmup + 5-round median plus an interleaved-size scenario — so future runs are easier to read.
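The "3-run median-of-medians" figure and the range-overlap reasoning used above can be sketched as follows. This is a minimal Rust illustration under the assumption that each run yields a list of per-round ops/s samples; the helper names `median`, `median_of_medians`, `ranges_overlap`, and `delta_pct` are hypothetical and not part of the bench harness.

```rust
/// Median of a slice; `None` if it is empty.
pub fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    Some(if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    })
}

/// Take the median of each run's per-round samples, then the median of
/// those per-run medians. `None` if there are no runs or any run is empty.
pub fn median_of_medians(runs: &[Vec<f64>]) -> Option<f64> {
    let per_run: Option<Vec<f64>> = runs.iter().map(|r| median(r)).collect();
    median(&per_run?)
}

/// True when two inclusive (min, max) ranges share at least one value.
/// Overlapping ranges suggest a delta may be run-to-run noise.
pub fn ranges_overlap(a: (f64, f64), b: (f64, f64)) -> bool {
    a.0 <= b.1 && b.0 <= a.1
}

/// Relative change from `baseline` to `candidate`, in percent.
pub fn delta_pct(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}
```

A large `delta_pct` between two medians is only treated as a real change when the corresponding min..max ranges do not overlap, which is the reading applied to the 500K row.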
What's reverted
- `src/decoder.rs` (deleted) → `src/doc.rs` (restored)
- `qjd_decoder_new` / `_free` / `_reset` / `_destroy` / `_parse` exports gone
- `QJD_STALE_DOC` error code gone
- `check_doc_alive` gone
- `count-allocs` feature + `tests/alloc_count.rs` + `tests/decoder_ffi.rs` + `tests/lua/decoder_spec.lua` gone
- `docs/superpowers/specs/2026-05-15-decoder-pooling-design.md` gone
What's kept (cherry-picked from #13)
- `benches/lua_bench.lua`: warmup pass + 5-round median + interleaved 100k–1m scenario. The `qd.new_decoder` probes degrade gracefully when the API is absent: they print nothing on this branch.
- `benches/perf_probe.lua`: minimal hammer for `perf record` investigations.
Test plan
- `cargo test --release` (108 tests, was 132; the 24-test delta is exactly the pool-API surface that was reverted)
- `cargo test --release --no-default-features`
- `cargo test --features test-panic --release`
- 7a895e5, results above
- `make test` (Lua busted): `busted` not on this dev machine; CI will run
Context
Supersedes #13. See the closing comment on #13 and the long perf-investigation transcript that motivated this for the full reasoning.
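As background on the mechanism being reverted, a state + generation check of the kind `check_doc_alive` performed can be sketched as below. This is an assumption-laden illustration, not the removed code: only the names `check_doc_alive` and `QJD_STALE_DOC` come from this PR, while `Decoder`, `DocHandle`, `DocError`, `get_len`, and `scratch_capacity` are hypothetical.

```rust
/// Hypothetical error type standing in for the `QJD_STALE_DOC` error code.
#[derive(Debug, PartialEq)]
pub enum DocError {
    StaleDoc,
}

/// Hypothetical reusable decoder that keeps one scratch buffer across parses.
pub struct Decoder {
    generation: u64,
    scratch: Vec<u8>,
}

/// Handle to one parse result, tagged with the generation that produced it.
#[derive(Debug, Clone, Copy)]
pub struct DocHandle {
    generation: u64,
}

impl Decoder {
    pub fn new() -> Self {
        Decoder { generation: 0, scratch: Vec::new() }
    }

    /// Each parse bumps the generation, invalidating earlier handles.
    /// `Vec::clear` keeps the allocated capacity, which is the saving a
    /// reused decoder gets and a fresh-per-request decoder does not.
    pub fn parse(&mut self, input: &[u8]) -> DocHandle {
        self.generation += 1;
        self.scratch.clear();
        self.scratch.extend_from_slice(input);
        DocHandle { generation: self.generation }
    }

    /// Liveness check paid on every accessor call.
    fn check_doc_alive(&self, doc: DocHandle) -> Result<(), DocError> {
        if doc.generation == self.generation {
            Ok(())
        } else {
            Err(DocError::StaleDoc)
        }
    }

    /// Example accessor: validates the handle before touching the buffer.
    pub fn get_len(&self, doc: DocHandle) -> Result<usize, DocError> {
        self.check_doc_alive(doc)?;
        Ok(self.scratch.len())
    }

    pub fn scratch_capacity(&self) -> usize {
        self.scratch.capacity()
    }
}
```

With one decoder per request there is never a second `parse` on the same instance, so the capacity reuse never happens while the check in each accessor still runs.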