Skip to content

Fix compaction support#7

Merged
tilo merged 1 commit into
tilo:mainfrom
byroot:compaction
Jun 12, 2026
Merged

Fix compaction support#7
tilo merged 1 commit into
tilo:mainfrom
byroot:compaction

Conversation

@byroot

@byroot byroot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

When storing Ruby object that are movable in C global variable, the extension MUST declare these variables to the GC, otherwise they won't be marked nor updated when compaction occurs.

When storing Ruby object that are movable in C global
variable, the extension MUST declare these variables to
the GC, otherwise they won't be marked nor updated when
compaction occurs.
@codecov

codecov Bot commented Jun 12, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.83%. Comparing base (a5f3eac) to head (409e403).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main       #7   +/-   ##
=======================================
  Coverage   99.83%   99.83%           
=======================================
  Files           7        7           
  Lines        1193     1193           
=======================================
  Hits         1191     1191           
  Misses          2        2           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tilo

tilo commented Jun 12, 2026

Copy link
Copy Markdown
Owner

Thank you @byroot, great catch!

It never surfaced in any of my tests, so it would've sat silently in my code.
Really glad you caught it.

After reading this I went through the rest of the C globals to be sure: the fj_sym_* are interned symbols (pinned), and mSmarterJSON is only touched in Init, which lines up with the three you rooted. Please let me know if you see anything else.

Really appreciate you taking the time to read the extension. 🙏💚

@tilo tilo merged commit eea3cb6 into tilo:main Jun 12, 2026
21 checks passed
@byroot

byroot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

fj_sym_* are interned symbols (pinned),

Well, they're not pinned, but immediates (they don't actually exist in memory). But yes, they don't need to be declared.

Really appreciate you taking the time to read the extension. 🙏💚

Welcome, It got featured in Ruby Weekly, that made me curious.

I need to look closer at your Eisel-Lemire port, see how it compare against the Ryu port I'm using.

BTW since you mention it in your README, in json 2.20 the parser will be iterative, so the depth limitation will be lifted, however I haven't yet decided if I'll change the default max_nesting, as there aren't a lot of legitimate documents that need more than that, so it's a decent stop on abuses.

@byroot byroot deleted the compaction branch June 12, 2026 06:37
@tilo

tilo commented Jun 13, 2026

Copy link
Copy Markdown
Owner

Well, they're not pinned, but immediates (they don't actually exist in memory). But yes, they don't need to be declared.

@byroot Thanks for the correction, Jean — "immediates" ✔️
rb_intern'd symbols are tagged values, not heap objects, so the GC never marks or moves them and there's nothing to declare. Good to get my mental model straight.

Great to hear about the iterative parser landing in json 2.20 — I went iterative for the same reason.

@tilo

tilo commented Jun 13, 2026

Copy link
Copy Markdown
Owner

Regarding Eisel-Lemire vs Ryū:

  • Speed (≤17-digit common case):
    Eisel-Lemire came in ~40% faster (YJIT on) — canada.json went 275 → 385 MB/s.
    The single 128-bit multiply beats mulShift64 plus the divisibility/rounding branches. Lines up with fast_float's published edge over Ryū s2d.

  • Correctness past 17:
    Before porting Eisel-Lemire I first just tried bumping your s2d fast-path cap from ≤17 to ≤18 digits (uint64 holds 18). A 3M-value stress caught 890 mismatches — all round-half-to-even ties (mantissa …5), off by 1 ULP. So I ended up independently rediscovering exactly why you cap at 17. 😂
    Eisel-Lemire is correctly-rounded across the full ≤18/19-digit uint64 range with no slow path, so we keep the 18th digit and the speed.
    Re-verified bit-for-bit vs JSON.parse over 8–10M random values (1–19 digits, forced ties, exponents −330..305 incl. subnormals and near-overflow) — 0 mismatches.
    Above that we fall back to strtod with round-to-odd.

My json_benchmarks repo has the sample files I used — the corpus is in data_files.zip (GH LFS).
Please let me know if I can improve anything

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants