tilo · Jun 12, 2026 · Jun 13, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -12,6 +12,13 @@
 > ⚠️ We discourage the use of `process(input).first` / `process(input)[0]` because it silently drops potential additional documents
 >    Please use `process_one` if you are expecting only one JSON doc, e.g. in API payloads, because it emits on_warning if it finds multiple docs.
 
+## 1.2.0 (unreleased)
+
+RSpec tests: 1,143
+
+- A leading-zero token now reads as a number when it carries a sign, a decimal point, or an exponent (`+007` → `7`, `-000023.5` → `-23.5`, `00.0` → `0.0`, `007e2` → `700.0`) — previously these were kept as strings. A bare leading-zero integer (`000001`, `02`) still reads as a string, so IDs, zip codes, and account numbers keep their zeros.
+- `Null` and `NULL` are now read as `nil` (joining `null` / `None` / `undefined`) in every position where the existing spellings work, which covers SQL / R / PHP / YAML / DB-derived input. Quoted (`"NULL"`) or embedded (`NULL Island`) forms stay strings.
+
 ## 1.1.2 (2026-06-12)
 
 RSpec tests: 1,097

diff --git a/README.md b/README.md
@@ -8,7 +8,7 @@ A lenient, fast JSON processor for Ruby. It extracts strict JSON, NDJSON, JSONL,
 
 ## Features at a glance
 
-- **Reads the whole human-JSON superset, no modes or flags** — strict JSON, NDJSON, JSONL, JSON5, HJSON, JSONC, plus comments, trailing commas, unquoted / single / triple / smart quotes, an implicit root object, `NaN` / `Infinity` / hex / underscores, Python & JavaScript literals, a UTF-8 BOM, mixed line endings, and any Ruby encoding (see [What it accepts](#what-it-accepts-beyond-strict-json) for the full list).
+- **Reads the whole human-JSON superset, no modes or flags** — strict JSON, NDJSON, JSONL, JSON5, HJSON, JSONC, plus comments, trailing commas, unquoted / single / triple / smart quotes, an implicit root object, `NaN` / `Infinity` / hex / underscores, Python / JavaScript / SQL literals, a UTF-8 BOM, mixed line endings, and any Ruby encoding (see [What it accepts](#what-it-accepts-beyond-strict-json) for the full list).
 - **Every document from multi-document input, in one call** — `process` returns an `Array` of all of them; `process_one` returns the single value and warns if there was more than one (never raises; routed to `on_warning`, else `Rails.logger`, else `Kernel.warn`).
 - **Streaming in bounded memory** — pass a block, or use `foreach(path_or_io)` for a composable `Enumerator` you can `.select` / `.map` / `.lazy` over.
 - **Recovers JSON from LLM / markdown noise** — strips markdown code fences, surrounding prose, and `<json>` tags, and pulls every payload out of one messy blob.
@@ -75,7 +75,8 @@ Three things set it apart:
 - Trailing commas; unquoted keys (`{host: localhost}`); single-quoted, triple-quoted (`'''…'''`), and quoteless string values
 - Implicit root object — a config file that starts with `key: value`, no outer `{}`
 - `NaN`, `Infinity`, hex (`0xFF`), leading `+` / `.`, underscores in numbers (`1_000_000`)
-- UTF-8 BOM, smart/curly quotes (in keys and values), Python literals (`True` / `False` / `None`), JavaScript `undefined`
+- Leading-zero numbers (which strict JSON rejects): a token with a sign, decimal point, or exponent reads as a number (`-007.5` → `-7.5`, `007e2` → `700.0`), but a bare leading-zero integer is kept as a string (`007`, `02`) so IDs, zip codes, and account numbers don't lose their zeros
+- UTF-8 BOM, smart/curly quotes (in keys and values), Python literals (`True` / `False` / `None`), JavaScript `undefined`, case-variant null (`Null` / `NULL`, as SQL / R / PHP / YAML emit it)
 - Mixed CR / LF / CRLF line endings, and any Ruby-supported input encoding (via `encoding:`)
 - Duplicate keys (last value wins by default; configurable)
 

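The leading-zero rule described in the changelog and README hunks above can be sketched as a standalone classifier. This is only an illustration under stated assumptions, not the gem's code: `classify_token` and `NUMBER_SHAPE` are hypothetical names, and the regex is deliberately narrower than the `DEC_RE` in `parser.rb` (it requires a digit after the decimal point).

```ruby
# Hypothetical sketch, NOT part of SmarterJSON: classify one quoteless token
# according to the leading-zero rule described in the changelog entry.
NUMBER_SHAPE = /\A[-+]?\d[\d_]*(?:\.\d[\d_]*)?(?:[eE][-+]?\d+)?\z/

def classify_token(token)
  return token unless token.match?(NUMBER_SHAPE) # not number-shaped: stays a string

  body = token.delete("_")                       # underscores are only separators
  return body.to_f if body.match?(/[.eE]/)       # decimal point or exponent: a Float

  # A bare leading-zero integer (no sign) is treated as an ID and kept verbatim.
  return token if body.start_with?("0") && body.length > 1

  body.to_i                                      # String#to_i accepts a leading sign
end

classify_token("+007")      # => 7
classify_token("-000023.5") # => -23.5
classify_token("007e2")     # => 700.0
classify_token("000001")    # => "000001"
```

The sign check falls out of the string comparison: `"+007"` starts with `+`, not `0`, so only unsigned tokens can reach the ID branch.
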
diff --git a/docs/_introduction.md b/docs/_introduction.md
@@ -29,7 +29,7 @@ Most JSON parsers reject anything that isn't perfectly strict JSON, and they mak
 
 ## What it accepts, beyond strict JSON
 
-Comments (`//`, `/* … */`, `#` — a `#`/`//` only starts a comment when preceded by whitespace, so `url: http://x.com` reads as a string, not a truncated value), markdown-wrapped / chatty blobs around the payload, trailing commas, unquoted / single- / triple-quoted / quoteless strings, an implicit root object (`key: value`, no braces), `NaN` / `Infinity` / hex / underscored numbers, Python (`True` / `False` / `None`) and JavaScript (`undefined`) literals, smart quotes, a UTF-8 BOM, mixed CR / LF / CRLF line endings, any Ruby-supported input encoding (via `encoding:`), and duplicate keys. The full list — with the human-JSON spec references it's drawn from — is kept in one place: [**What it accepts, beyond strict JSON**](../README.md#what-it-accepts-beyond-strict-json) in the README.
+Comments (`//`, `/* … */`, `#` — a `#`/`//` only starts a comment when preceded by whitespace, so `url: http://x.com` reads as a string, not a truncated value), markdown-wrapped / chatty blobs around the payload, trailing commas, unquoted / single- / triple-quoted / quoteless strings, an implicit root object (`key: value`, no braces), `NaN` / `Infinity` / hex / underscored numbers, leading-zero numbers (a signed / decimal / exponent token like `-007.5` is a number, a bare `007` is kept as a string so IDs keep their zeros), Python (`True` / `False` / `None`), JavaScript (`undefined`), and SQL / R / PHP / YAML (`Null` / `NULL`) literals, smart quotes, a UTF-8 BOM, mixed CR / LF / CRLF line endings, any Ruby-supported input encoding (via `encoding:`), and duplicate keys. The full list — with the human-JSON spec references it's drawn from — is kept in one place: [**What it accepts, beyond strict JSON**](../README.md#what-it-accepts-beyond-strict-json) in the README.
 
 It raises only on genuinely unreadable input (unterminated string, mismatched bracket), with line and column in the message — never on valid-but-lenient input.
 

diff --git a/docs/examples.md b/docs/examples.md
@@ -145,7 +145,23 @@ JSON
 
 A `#`/`//` only starts a comment when preceded by whitespace, so `http://example.com` stays a string rather than being truncated.
 
-### Example 10: Wrapper Noise Around a Payload
+### Example 10: Leading-Zero IDs and SQL `NULL`
+
+```ruby
+SmarterJSON.process_one(<<~JSON)
+  {
+    user_id:    007,      # bare leading zero -> kept as a string
+    zip:        02139,    # ditto: zip codes keep their leading zero
+    balance:    -007.50,  # a sign / decimal point / exponent makes it a number
+    deleted_at: NULL      # SQL / R / PHP / YAML null spelling -> nil
+  }
+JSON
+# => {"user_id"=>"007", "zip"=>"02139", "balance"=>-7.5, "deleted_at"=>nil}
+```
+
+A bare leading-zero integer is kept as a string so identifiers, zip codes, and account numbers don't lose their zeros; a sign, decimal point, or exponent marks numeric intent (`-007.50` → `-7.5`). `Null` and `NULL` join `null` / `None` / `undefined` as spellings of `nil`; a quoted `"NULL"` stays a string.
+
+### Example 11: Wrapper Noise Around a Payload
 
 #### Fenced payload
 
@@ -197,22 +213,22 @@ TEXT
 # => [{"a"=>1}, {"b"=>2}]
 ```
 
-### Example 11: Write JSON
+### Example 12: Write JSON
 
 ```ruby
 SmarterJSON.generate({ "a" => 1, "b" => [2, 3] })   # => '{"a":1,"b":[2,3]}'
 SmarterJSON.generate([1, 2, 3])                       # => '[1,2,3]'
 ```
 
-### Example 12: Write NDJSON
+### Example 13: Write NDJSON
 
 An Array writes one element per line:
 
 ```ruby
 SmarterJSON.generate([{ "id" => 1 }, { "id" => 2 }], format: :ndjson)   # => "{\"id\":1}\n{\"id\":2}\n"
 ```
 
-### Example 13: Round-Trip Read and Write
+### Example 14: Round-Trip Read and Write
 
 ```ruby
 obj = { "a" => 1, "b" => [2, "three", nil, true] }

diff --git a/ext/smarter_json/smarter_json.c b/ext/smarter_json/smarter_json.c
@@ -641,16 +641,33 @@ static FJ_ALWAYS_INLINE VALUE fj_float_from_parts(fj_state *st, uint64_t m10, in
  * per-byte '_' test, dropping to a slow step only when an underscore appears. */
 static int fj_try_decimal(fj_state *st, const char *p, long n, VALUE *out) {
   long i = 0;
-  int  is_float = 0, neg = 0, has_digit = 0, overflow = 0;
+  int  is_float = 0, neg = 0, has_digit = 0, overflow = 0, has_sign = 0, had_leading_zero = 0;
   uint64_t m10 = 0;
   int  m10digits = 0, frac = 0;
   int64_t e10 = 0;
 
-  if (i < n && (p[i] == '-' || p[i] == '+')) { neg = (p[i] == '-'); i++; }
+  if (i < n && (p[i] == '-' || p[i] == '+')) { has_sign = 1; neg = (p[i] == '-'); i++; }
 
-  /* Integer part: a single '0', or [1-9] then digits/underscores. */
+  /* Integer part: a single '0', or [1-9] then digits/underscores. A leading '0' followed
+   * by more digits (a leading-zero token) is consumed too but flagged: a BARE leading-zero
+   * integer (no sign / dot / exponent) is rejected below and kept as a string, so zip /
+   * account / check numbers preserve their zeros. */
   if (i < n && p[i] == '0') {
     has_digit = 1; m10digits = 1; i++;
+    /* Underscores are accepted as separators here too (like the [1-9] branch below), so
+     * 0_5.0 / 0_0.5 behave exactly like 05.0 / 00.5 on both paths. */
+    if (i < n && ((p[i] >= '0' && p[i] <= '9') || p[i] == '_')) {
+      for (;;) {
+        while (i < n && p[i] >= '0' && p[i] <= '9') {
+          had_leading_zero = 1;
+          if (m10digits < 18) { m10 = m10 * 10 + (uint64_t)(p[i] - '0'); m10digits++; }
+          else overflow = 1;
+          i++;
+        }
+        if (i < n && p[i] == '_') { i++; continue; }
+        break;
+      }
+    }
   } else if (i < n && p[i] >= '1' && p[i] <= '9') {
     has_digit = 1;
     for (;;) {
@@ -699,6 +716,8 @@ static int fj_try_decimal(fj_state *st, const char *p, long n, VALUE *out) {
 
   if (i != n)     return 0;  /* token not fully consumed -> not a number (string) */
   if (!has_digit) return 0;  /* e.g. "." or "+" -> not a number (string) */
+  /* A BARE leading-zero integer (no sign / dot / exponent) is an ID, not a number. */
+  if (had_leading_zero && !has_sign && !is_float) return 0;
 
   if (!is_float) {
     *out = fj_int_from_parts(m10, m10digits, neg, overflow, p, n);
@@ -730,13 +749,13 @@ static VALUE fj_parse_number(fj_state *st) {
   const char *p   = buf + st->pos;  /* buf[len] == '\0' (RSTRING_PTR) is the scan sentinel */
   const char *np  = p;              /* token start, includes a leading sign */
   long   nlen;
-  int    is_float = 0, neg = 0, overflow = 0;
+  int    is_float = 0, neg = 0, overflow = 0, has_sign = 0, had_leading_zero = 0;
   uint64_t m10 = 0;                 /* mantissa: integer + fraction digits */
   int    m10digits = 0;             /* mantissa digit chars (caps the Eisel-Lemire fast path at 18) */
   int    frac = 0;                  /* fraction digit chars: e10 -= frac */
   int64_t e10 = 0;
 
-  if (*p == '-' || *p == '+') { neg = (*p == '-'); p++; }
+  if (*p == '-' || *p == '+') { has_sign = 1; neg = (*p == '-'); p++; }
 
   /* Cold branches (rare, not perf-critical): sync the cursor, reuse scalar helpers. */
   if (*p == 'I') { st->pos = p - buf; fj_consume_keyword(st, "Infinity"); return rb_float_new(neg ? -INFINITY : INFINITY); }
@@ -755,10 +774,27 @@ static VALUE fj_parse_number(fj_state *st) {
     return rb_str_to_inum(hx, 16, 0);
   }
 
-  /* Integer part: a single '0', or [1-9] then digits/underscores. */
+  /* Integer part: a single '0', or [1-9] then digits/underscores. A leading '0' followed
+   * by more digits is consumed but flagged; a BARE leading-zero integer (no sign / dot /
+   * exponent) is rejected after the scan: it is an ID, not a number, and this top-level /
+   * strict position has no quoteless-string form, so it raises (as `000001` does). */
   if (*p == '0') {
     m10digits = 1;  /* one leading zero, counted as a single mantissa digit */
     p++;
+    /* Underscores are accepted as separators here too (like the [1-9] branch below), so
+     * 0_0.5 behaves like 00.5. */
+    if ((*p >= '0' && *p <= '9') || *p == '_') {
+      for (;;) {
+        while (*p >= '0' && *p <= '9') {
+          had_leading_zero = 1;
+          if (m10digits < 18) { m10 = m10 * 10 + (uint64_t)(*p - '0'); m10digits++; }
+          else overflow = 1;
+          p++;
+        }
+        if (*p == '_') { p++; continue; }
+        break;
+      }
+    }
   } else if (*p >= '1' && *p <= '9') {
     for (;;) {
       while (*p >= '0' && *p <= '9') {
@@ -811,6 +847,12 @@ static VALUE fj_parse_number(fj_state *st) {
   st->pos = p - buf;
   nlen = p - np;
 
+  /* A BARE leading-zero integer is an ID, not a number; at this top-level / strict
+   * position there is no quoteless-string form, so it raises. */
+  if (had_leading_zero && !has_sign && !is_float) {
+    fj_error(st, "invalid number with a leading zero");
+  }
+
   if (!is_float) {
     return fj_int_from_parts(m10, m10digits, neg, overflow, np, nlen);
   }
@@ -979,7 +1021,8 @@ static VALUE fj_classify_quoteless(fj_state *st, const char *p0, long n0) {
 
   if (fj_tok_eq(p, n, "true")  || fj_tok_eq(p, n, "True"))  return Qtrue;
   if (fj_tok_eq(p, n, "false") || fj_tok_eq(p, n, "False")) return Qfalse;
-  if (fj_tok_eq(p, n, "null")  || fj_tok_eq(p, n, "None") || fj_tok_eq(p, n, "undefined")) return Qnil;
+  if (fj_tok_eq(p, n, "null")  || fj_tok_eq(p, n, "Null") || fj_tok_eq(p, n, "NULL") ||
+      fj_tok_eq(p, n, "None") || fj_tok_eq(p, n, "undefined")) return Qnil;
   if (fj_tok_eq(p, n, "NaN")) return rb_float_new(NAN);
   if (fj_tok_eq(p, n, "Infinity")) return rb_float_new(INFINITY);
 
@@ -1273,8 +1316,10 @@ static VALUE fj_parse_value(fj_state *st) {
     case 'T':  return fj_parse_literal(st, "True", Qtrue);
     case 'F':  return fj_parse_literal(st, "False", Qfalse);
     case 'u':  return fj_parse_literal(st, "undefined", Qnil);
-    case 'N':  /* NaN (number) vs None (Python null) */
+    case 'N':  /* NaN (number); None / Null / NULL (null) */
       if (fj_byte_at(st, 1) == 'a') return fj_parse_number(st);
+      if (fj_byte_at(st, 1) == 'u') return fj_parse_literal(st, "Null", Qnil);
+      if (fj_byte_at(st, 1) == 'U') return fj_parse_literal(st, "NULL", Qnil);
       return fj_parse_literal(st, "None", Qnil);
     default:
       if (b == '-' || b == '+' || b == '.' || b == 'I' || (b >= '0' && b <= '9')) {

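The `case 'N'` dispatch above chooses a keyword from the single byte that follows the `N`. A minimal Ruby sketch of that idea is below; `expect_upper_n` is a hypothetical helper, and returning `nil` on a mismatch is an assumption made for the sketch, since this diff does not show what `fj_parse_literal` does when the input fails to spell the keyword.

```ruby
# Hypothetical sketch, NOT SmarterJSON's parser: choose the keyword to expect
# from the byte after a leading 'N', then check the input actually spells it.
def expect_upper_n(input)
  keyword =
    case input.getbyte(1)
    when 0x61 then "NaN"   # 'a'
    when 0x75 then "Null"  # 'u'
    when 0x55 then "NULL"  # 'U'
    else           "None"  # everything else is tried as Python's None
    end
  input.start_with?(keyword) ? keyword : nil # nil on mismatch (sketch-only choice)
end

expect_upper_n("NULL}") # => "NULL"
expect_upper_n("Null,") # => "Null"
expect_upper_n("NUll")  # => nil
```

Only the exact spellings `Null` and `NULL` are recognised; a mixed-case form such as `NUll` is routed to the `NULL` branch by its second byte and then fails the full comparison.
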
diff --git a/lib/smarter_json/parser.rb b/lib/smarter_json/parser.rb
@@ -739,7 +739,7 @@ class Parser
     # Mantissa must carry at least one digit (int part, or a leading-dot fraction), so a
     # bare exponent like "-e695881" is NOT a number — it falls through to a quoteless
     # string, matching the C path. Trailing exponent stays optional.
-    DEC_RE      = /\A[-+]?(?:(?:0|[1-9][0-9_]*)(?:\.[0-9_]*)?|\.[0-9_]+)(?:[eE][-+]?[0-9_]+)?\z/.freeze
+    DEC_RE      = /\A[-+]?(?:[0-9][0-9_]*(?:\.[0-9_]*)?|\.[0-9_]+)(?:[eE][-+]?[0-9_]+)?\z/.freeze
     # A decimal BigDecimal() would reject as-is: a leading dot (".5") or a dot not
     # followed by a digit ("5.", "5.e3"). Matches iff normalize_for_bigdecimal
     # would change the string — so when it doesn't match, we skip normalization.
@@ -1210,10 +1210,11 @@ def parse_value
 
-    # Disambiguate NaN (number) from None (Python null) at a strict position.
+    # Disambiguate NaN (number) from None / Null / NULL (null spellings) at a strict position.
     def parse_upper_n
-      if byte_at(1) == 0x61 # 'a' → NaN
-        parse_number
-      else
-        parse_literal_keyword("None", nil)
+      case byte_at(1)
+      when 0x61 then parse_number                       # 'a' -> NaN
+      when 0x75 then parse_literal_keyword("Null", nil) # 'u' -> Null
+      when 0x55 then parse_literal_keyword("NULL", nil) # 'U' -> NULL
+      else parse_literal_keyword("None", nil)
       end
     end
 
@@ -1378,7 +1379,7 @@ def classify_quoteless(str)
       case str
       when "true", "True"          then return true
       when "false", "False"        then return false
-      when "null", "None"          then return nil
+      when "null", "Null", "NULL", "None" then return nil
       when "undefined"             then return nil
       when "NaN"                   then return Float::NAN
       when "Infinity", "+Infinity" then return Float::INFINITY
@@ -1405,7 +1406,15 @@ def numeric_value(str)
       # number tokens that is a real per-value allocation. Underscores are rare, so only
       # pay it when the token actually contains one (measured +27% on long-token decimals).
       body = str.include?("_") ? str.delete("_") : str
-      body.match?(/[.eE]/) ? decimal_value(body) : body.to_i
+      return decimal_value(body) if body.match?(/[.eE]/)
+
+      # A BARE leading-zero integer (no sign / dot / exponent) is an ID — a zip code,
+      # account number, phone number — not a number; keep it a string so the zeros survive.
+      # A sign (+007 / -007) signals numeric intent (IDs never carry a sign), so those parse.
+      c0 = body.getbyte(0)
+      return NOT_NUMERIC if c0 == ZERO && body.bytesize > 1
+
+      body.to_i
     end
 
     # True when the token starts with [+-]?0[xX] — the only shape HEX_RE can match.
@@ -1663,10 +1672,13 @@ def decode_unicode_escape(i)
 
     def parse_number
       negative = false
+      signed = false
       if byte == MINUS
         negative = true
+        signed = true
         advance(1)
       elsif byte == PLUS
+        signed = true
         advance(1)
       end
 
@@ -1680,6 +1692,7 @@ def parse_number
       end
 
       int_start = @pos
+      had_leading_zero = false
 
       if byte == ZERO
         advance(1)
@@ -1692,6 +1705,16 @@ def parse_number
           value = @input.byteslice(hex_start, @pos - hex_start).delete("_").to_i(16)
           return negative ? -value : value
         end
+        # A run of further digits after the single leading '0' (007, 00023, or the
+        # underscore-separated 0_0) — consume it and flag the leading zero; the reject check
+        # below turns a bare leading-zero integer into an error. The underscore is only a
+        # separator, so 0_0.5 behaves like 00.5.
+        if (b = byte) && ((b >= ZERO && b <= NINE) || b == UNDERSCORE)
+          while (b = byte) && ((b >= ZERO && b <= NINE) || b == UNDERSCORE)
+            had_leading_zero = true if b >= ZERO && b <= NINE
+            advance(1)
+          end
+        end
       elsif byte && byte >= 0x31 && byte <= NINE
         advance(1) while (b = byte) && ((b >= ZERO && b <= NINE) || b == UNDERSCORE)
       elsif byte == DOT
@@ -1717,6 +1740,13 @@ def parse_number
         advance(1) while (b = byte) && ((b >= ZERO && b <= NINE) || b == UNDERSCORE)
       end
 
+      # A BARE leading-zero integer is an ID, not a number; at this top-level / strict
+      # position there is no quoteless-string form, so it raises (a sign or a dot/exponent
+      # signals numeric intent and is allowed: +007 -> 7, -000023.5 -> -23.5, 007e2 -> 700.0).
+      if had_leading_zero && !signed && !is_float
+        raise error("invalid number with a leading zero")
+      end
+
       slice = @input.byteslice(int_start, @pos - int_start).delete("_")
       value = is_float ? decimal_value(slice) : slice.to_i
       negative ? -value : value

diff --git a/lib/smarter_json/version.rb b/lib/smarter_json/version.rb
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module SmarterJSON
-  VERSION = "1.1.2"
+  VERSION = "1.2.0"
 end