RFC 8259: grammar-aware structural validation pass (close gaps from #37) #39

@membphis

Description

PR #38 lands the RFC 8259 validation audit but defers three structural-grammar checks. Each of them needs grammar state tracking, which the current heuristic in `validate_eager_values` does not have.

Concrete gaps

`tests/rfc8259_compliance.rs` (3 `#[ignore]`'d tests)

  • `structural::missing_colon` — `{"a"}`: object has a key but no `:value` pair. The scanner emits `{`, `"`, `"`, `}` and the heuristic cannot detect the missing `:` from byte-context alone.
  • `structural::leading_comma_array_with_value` — `[,1]`: gap between `[` and `,` is empty but `prev_structural=[`, so the "empty gap after `:`/`,`" rule doesn't fire.
  • `structural::missing_comma_in_object` — `{"a":1"b":2}`: gap between close-quote and open-quote has `prev_structural='"'`, not a value-separator context.

`tests/json_test_suite.rs` `KNOWN_N_FAILURES` (13 files)

All require the same grammar-aware walk:

  • Non-string object keys (`n_object_lone_continuation_byte_in_key_and_trailing_comma`, etc.)
  • Colon vs comma confusion in object key-value pairs
  • Missing commas between array elements / object entries

See the `KNOWN_N_FAILURES` array in `tests/json_test_suite.rs` for the exact file list with per-entry rationale.

Proposed approach

Replace the current `validate_scalars_in_gaps` heuristic with a small grammar state machine that walks `indices` tracking the expected next token kind:

  • Inside object: `{` → expect key (string) → expect `:` → expect value → expect `,` or `}` → loop
  • Inside array: `[` → expect value → expect `,` or `]` → loop
  • Top level: expect single value

The state machine validates each structural transition against the expected token kind. Estimated size: 80-120 lines in `src/validate/mod.rs`. The SIMD and scalar scanners are untouched.
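
A minimal sketch of the walk described above, to make the transitions concrete. It is not the proposed implementation: `Tok`, `Ctx`, `Expect`, `validate`, and `after_value` are hypothetical names, and the sketch assumes each entry of `indices` has already been classified into a token kind, with a whole string collapsed into one `Str` token and each scalar found in a gap surfaced as `Scalar`. The real scanner emits the opening and closing `"` separately, so the actual code would need to pair them first.

```rust
/// Hypothetical token kinds; the real walk would derive these from `indices`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok { LBrace, RBrace, LBracket, RBracket, Colon, Comma, Str, Scalar }

/// Which container the walk is currently inside.
#[derive(Clone, Copy, PartialEq)]
enum Ctx { Object, Array }

/// What the grammar allows next.
#[derive(Clone, Copy, PartialEq)]
enum Expect {
    Value,        // any value (top level, after `:`, after `,` in an array)
    ValueOrClose, // directly after `[`
    KeyOrClose,   // directly after `{`
    Key,          // after `,` in an object
    Colon,        // after an object key
    CommaOrClose, // after a value inside a container
    End,          // the single top-level value is complete
}

/// State after a value ends: done at top level, otherwise `,` or a closer.
fn after_value(stack: &[Ctx]) -> Expect {
    if stack.is_empty() { Expect::End } else { Expect::CommaOrClose }
}

/// Returns true if the token sequence follows the RFC 8259 structural grammar.
fn validate(tokens: &[Tok]) -> bool {
    let mut stack: Vec<Ctx> = Vec::new();
    let mut expect = Expect::Value;
    for &t in tokens {
        expect = match (expect, t) {
            (Expect::Value | Expect::ValueOrClose, Tok::LBrace) => {
                stack.push(Ctx::Object);
                Expect::KeyOrClose
            }
            (Expect::Value | Expect::ValueOrClose, Tok::LBracket) => {
                stack.push(Ctx::Array);
                Expect::ValueOrClose
            }
            (Expect::Value | Expect::ValueOrClose, Tok::Str | Tok::Scalar) => {
                after_value(&stack)
            }
            // Empty containers: `[]` and `{}`.
            (Expect::ValueOrClose, Tok::RBracket) | (Expect::KeyOrClose, Tok::RBrace) => {
                stack.pop();
                after_value(&stack)
            }
            // Object keys must be strings.
            (Expect::KeyOrClose | Expect::Key, Tok::Str) => Expect::Colon,
            (Expect::Colon, Tok::Colon) => Expect::Value,
            (Expect::CommaOrClose, Tok::Comma) => match stack.last() {
                Some(Ctx::Object) => Expect::Key,
                Some(Ctx::Array) => Expect::Value,
                None => return false,
            },
            // A closer must match the innermost open container.
            (Expect::CommaOrClose, Tok::RBrace) if stack.last() == Some(&Ctx::Object) => {
                stack.pop();
                after_value(&stack)
            }
            (Expect::CommaOrClose, Tok::RBracket) if stack.last() == Some(&Ctx::Array) => {
                stack.pop();
                after_value(&stack)
            }
            // Anything else is a grammar violation.
            _ => return false,
        };
    }
    expect == Expect::End && stack.is_empty()
}
```

Under these assumptions, the three deferred cases are each rejected at a single transition: `{"a"}` sees `}` while expecting `:`, `[,1]` sees `,` while expecting a value or `]`, and `{"a":1"b":2}` sees a string while expecting `,` or `}`. Keeping the container stack separate from the expected-kind state is one way to keep the walk small; the final code may structure it differently.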

Acceptance criteria

  • All 3 currently-`#[ignore]`'d tests in `tests/rfc8259_compliance.rs` pass without the ignore attribute.
  • All 13 files are removed from `KNOWN_N_FAILURES` in `tests/json_test_suite.rs`, and the full `n_*` corpus (188 files) is rejected in eager mode.
  • No regression on existing tests (default features, scalar-only, test-panic feature, Lua busted).
  • `y_*` corpus still 95/95.

Out of scope

  • Lazy mode behavior is preserved as-is (structural-only).
  • Performance optimization of the grammar walk (correctness first).
  • Forward-compat `_reserved` slots on `qjd_options` (separate concern if needed).
