bench: compare message content parsing#41
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR changes the benchmark fixture to a multi-message chat shape, updates accessors to read every message.content, rewires scenario labels/runs to call those accessors, adjusts Makefile to run via OpenResty, and refreshes docs and README with new workload, results, and reproduction instructions. ChangesBenchmark measurement and documentation update
🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
benches/lua_bench.lua (1)
335-339: ⚡ Quick winReduce accessor drift by reusing the default accessor helpers in interleaved benches.
The interleaved block now duplicates content-access logic already defined in
default_cjson_access,default_qd_access, anddefault_table_access. Reusing those helpers keeps workload semantics aligned across benchmark modes and avoids future divergence.♻️ Proposed refactor
bench("cjson.decode + access fields", 400, function() local p = next_p() local obj = cjson.decode(p) - local _ = obj.model - local _ = obj.temperature - if obj.messages then - for _, msg in ipairs(obj.messages) do - local _ = msg.content - end - end + default_cjson_access(obj) end) ... bench("quickdecode.parse + access fields", 400, function() local p = next_p() local d = qd.parse(p) - local _ = d:get_str("model") - local _ = d:get_f64("temperature") - local n = d:len("messages") or 0 - for i = 0, n - 1 do - local _ = d:typeof("messages[" .. i .. "].content") - end + default_qd_access(d) end) ... bench("quickdecode pooled :parse + access fields", 400, function() local p = next_p() local d = pooled_decoder:parse(p) - local _ = d:get_str("model") - local _ = d:get_f64("temperature") - local n = d:len("messages") or 0 - for i = 0, n - 1 do - local _ = d:typeof("messages[" .. i .. "].content") - end + default_qd_access(d) end) ... bench("qd.decode + access content", 400, function() local p = next_p() local t = qd.decode(p) - local _ = t.model - local _ = t.temperature - if t.messages then - for i = 1, qd.len(t.messages) do - local msg = t.messages[i] - local _ = msg.content - end - end + default_table_access(t) end)Also applies to: 348-351, 361-364, 374-379
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benches/lua_bench.lua` around lines 335 - 339, The interleaved bench block duplicates message content-access logic that already exists in the helper functions default_cjson_access, default_qd_access, and default_table_access; replace the manual loop over obj.messages and the local _ = msg.content access by invoking the appropriate helper (e.g., call default_cjson_access(obj) or default_qd_access(obj) or default_table_access(obj) depending on the active accessor) so the interleaved path reuses the shared accessor logic, and remove the duplicated per-message accessor code to avoid drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/benchmarks.md`:
- Line 53: Update the benchmark table entry for "quickdecode.parse + access
fields" so the workload description removes the stray word "type" and reads
explicitly that it "reads every `messages[i].content`" (the cell that currently
contains "`qd.parse(s)`, read `model` / `temperature`, then read every
`messages[i].content` type" should be changed to drop "type"); edit the
`qd.parse(s)` / `messages[i].content` text in the same table row to match this
corrected phrasing.
---
Nitpick comments:
In `@benches/lua_bench.lua`:
- Around line 335-339: The interleaved bench block duplicates message
content-access logic that already exists in the helper functions
default_cjson_access, default_qd_access, and default_table_access; replace the
manual loop over obj.messages and the local _ = msg.content access by invoking
the appropriate helper (e.g., call default_cjson_access(obj) or
default_qd_access(obj) or default_table_access(obj) depending on the active
accessor) so the interleaved path reuses the shared accessor logic, and remove
the duplicated per-message accessor code to avoid drift.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b43f62f9-cf27-4dd1-98e6-5e10ac8a0487
📒 Files selected for processing (2)
benches/lua_bench.luadocs/benchmarks.md
There was a problem hiding this comment.
Pull request overview
Updates the Lua benchmark workload to touch all chat messages’ content fields (instead of only the first message’s role) so benchmark access patterns better reflect how request bodies are inspected in the proxy path.
Changes:
- Updated
benches/lua_bench.luaaccessors to iterate throughmessages[*].contentfor bothcjsonandqd.decode, and totypeofeachmessages[i].contentforqd.parse. - Refreshed
docs/benchmarks.mdto match the new benchmark workload and updated result tables/observations accordingly.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| docs/benchmarks.md | Updates benchmark narrative, workload definitions, and published results for the new “touch message content” access pattern. |
| benches/lua_bench.lua | Changes benchmark access functions to read/inspect each message’s content rather than only messages[1].role. |
Comments suppressed due to low confidence (2)
docs/benchmarks.md:91
- The speed-up table header uses
qd.decode + access, but the workload row/bench output usesqd.decode + access content. For consistency and easier cross-referencing, align the label here as well.
| Scenario | `qd.parse` / cjson | `qd.decode + access` / cjson |
|---|---:|---:|
docs/benchmarks.md:110
- The memory delta table header uses
qd.decode + access, but the workload/bench label isqd.decode + access content. Consider renaming this column header to match the actual benchmark row name.
| Scenario | cjson | `qd.parse` | `qd.decode + access` | `qd.decode + qd.encode` |
|---|---:|---:|---:|---:|
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
benches/lua_bench.lua:206
default_qd_accessbuilds a new path string ("messages[" .. i .. "].content") on every iteration, so the benchmark is partly measuring Lua string concatenation/allocation rather than quickdecode field access. Consider switching to cursor-based access (e.g.,d:open("messages")+:index(i)+:typeof("content")/get_*) or precomputing paths to keep the workload focused on JSON access costs.
local n = d:len("messages") or 0
for i = 0, n - 1 do
local _ = d:typeof("messages[" .. i .. "].content")
end
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
benches/lua_bench.lua (1)
247-249:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix qd.decode label for scenarios that override
table_access.For
github-100k, the benchmark does not accessmessages[*].content, but the printed label says it does.Proposed fix
--- a/benches/lua_bench.lua +++ b/benches/lua_bench.lua @@ - bench("qd.decode + access content", s.iters, function() + local qd_decode_label = s.table_access and "qd.decode + access fields" or "qd.decode + access content" + bench(qd_decode_label, s.iters, function() local t = qd.decode(s.payload) table_access(t) end)Also applies to: 293-296
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benches/lua_bench.lua` around lines 247 - 249, The benchmark label for qd.decode incorrectly claims it accesses messages[*].content for scenarios that provide a custom table_access override (e.g., the scenario entry with name "github-100k" which sets table_access = github_table_access and qd_access = github_qd_access). Update the label-generation logic that prints the qd.decode description so it checks whether a scenario supplies table_access and, if so, omits the "messages[*].content" part of the printed qd.decode label (i.e., conditionally build the qd.decode label based on the presence of table_access for entries like github-100k).
🧹 Nitpick comments (1)
benches/lua_bench.lua (1)
14-27: ⚡ Quick winUpdate stale size-accuracy commentary to match the current generator.
The comment block still references the removed
min(500K, remaining)branch logic, which no longer reflectsmake_payload.Proposed comment-only fix
--- a/benches/lua_bench.lua +++ b/benches/lua_bench.lua @@ --- Size accuracy: the normal-branch upper is `min(500K, remaining)` so the --- loop cannot overshoot during steady state. When fewer than 50 KB remain --- the final image falls through to `math.max(1024, remaining)` — undershoot --- is at most a few hundred bytes; worst-case overshoot is ~1 KB (only when --- `remaining < 1024`, which the seed=42 walk does not hit for our ladder). +-- Size accuracy: payload sizing is approximate. Message separators and +-- role-string variation introduce only small drift from `target_bytes`.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benches/lua_bench.lua` around lines 14 - 27, The comment above make_payload is stale because it still mentions the removed `min(500K, remaining)` branch; update that comment to remove the obsolete phrase and accurately describe the current generator: explain that image sizes are drawn deterministically from the Park‑Miller LCG, each chosen so it does not exceed the remaining bytes, and that when fewer than 50 KB remain the final image uses math.max(1024, remaining) (with undershoot/overshoot behavior as before). Locate the block near the make_payload function and replace the outdated sentence with this corrected description.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@benches/lua_bench.lua`:
- Around line 247-249: The benchmark label for qd.decode incorrectly claims it
accesses messages[*].content for scenarios that provide a custom table_access
override (e.g., the scenario entry with name "github-100k" which sets
table_access = github_table_access and qd_access = github_qd_access). Update the
label-generation logic that prints the qd.decode description so it checks
whether a scenario supplies table_access and, if so, omits the
"messages[*].content" part of the printed qd.decode label (i.e., conditionally
build the qd.decode label based on the presence of table_access for entries like
github-100k).
---
Nitpick comments:
In `@benches/lua_bench.lua`:
- Around line 14-27: The comment above make_payload is stale because it still
mentions the removed `min(500K, remaining)` branch; update that comment to
remove the obsolete phrase and accurately describe the current generator:
explain that image sizes are drawn deterministically from the Park‑Miller LCG,
each chosen so it does not exceed the remaining bytes, and that when fewer than
50 KB remain the final image uses math.max(1024, remaining) (with
undershoot/overshoot behavior as before). Locate the block near the make_payload
function and replace the outdated sentence with this corrected description.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4ce8e27c-5276-4ee4-a28f-6509255a2816
📒 Files selected for processing (2)
benches/lua_bench.luadocs/benchmarks.md
✅ Files skipped from review due to trivial changes (1)
- docs/benchmarks.md
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
benches/lua_bench.lua:22
- The payload generator comments still describe the old Park–Miller LCG-driven image sizing, but
make_payloadno longer uses any RNG and instead computes a fixedimage_sizeper message. Update/remove these lines so the benchmark description matches the actual payload construction.
-- Image sizes are drawn from a deterministic Park-Miller LCG (not math.random,
-- which delegates to libc rand() and varies across machines) so the same
-- target_bytes produces byte-identical output on any LuaJIT 2.1 host.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
README.md (1)
86-87: ⚡ Quick winTighten workload wording for accuracy.
Consider changing “every messages[*].content path” to “every
messages[*].contentpaths” (or “allmessages[*].contententries”) to avoid singular/plural ambiguity.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` around lines 86 - 87, Update the README wording to remove singular/plural ambiguity by replacing the phrase "every messages[*].content path" with a clearer form such as "every `messages[*].content` paths" or "all `messages[*].content` entries"; locate the text containing the literal snippet messages[*].content in the README and make the replacement so the sentence reads unambiguous and grammatically correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@README.md`:
- Line 7: The README currently references a nonexistent "Roadmap / Deferred"
section ("— see `Roadmap / Deferred` below."); either add that section header
"Roadmap / Deferred" with the deferred decisions/notes (tracking tuning
decisions such as benchmark tuning) or remove/replace the inline reference
string so it no longer points to a missing section; update the sentence that
contains "— see `Roadmap / Deferred` below." and ensure the new "Roadmap /
Deferred" header contains the deferred items you want tracked.
---
Nitpick comments:
In `@README.md`:
- Around line 86-87: Update the README wording to remove singular/plural
ambiguity by replacing the phrase "every messages[*].content path" with a
clearer form such as "every `messages[*].content` paths" or "all
`messages[*].content` entries"; locate the text containing the literal snippet
messages[*].content in the README and make the replacement so the sentence reads
unambiguous and grammatically correct.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ffd4b74e-0cd0-4ce9-bcc9-89442506aa06
📒 Files selected for processing (3)
README.mdbenches/lua_bench.luadocs/benchmarks.md
✅ Files skipped from review due to trivial changes (1)
- docs/benchmarks.md
🚧 Files skipped from review as they are similar to previous changes (1)
- benches/lua_bench.lua
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
docs/benchmarks.md:72
- The reproduction instructions imply
make benchwill “run the full comparison”, but the repo doesn’t install/vendorlua-resty-simdjsonand the bench script requiresresty.simdjsonunconditionally. Please document this dependency explicitly (where to get it / how to add it toLUA_PATH/OpenResty) or adjust the bench to skip the simdjson rows when unavailable somake benchworks out-of-the-box.
Run the full comparison with one command:
```sh
make bench
This builds quickdecode, builds the vendored lua-cjson against OpenResty's
LuaJIT, then invokes benches/lua_bench.lua through OpenResty's resty so
lua-resty-simdjson runs in its normal ngx environment.
</details>
Update the Lua benchmark to match the AI request-body access pattern more closely.
The benchmark now:
resty, solua-resty-simdjsoncan run in its normalngxenvironmentlua-cjson,lua-resty-simdjson,quickdecode.parse,qd.decode + content access, andqd.decode + qd.encodemodel,temperature, and everymessages[*].contentValidation:
make benchmake testmake lintSummary by CodeRabbit
Behavior
Documentation
Tests / Benchmarks
Chores