Bind speculative base by state_id chain instead of byte equality #146

Draft
ikehara wants to merge 1 commit into feature/explicit-state-parallelization from fix/speculative-base-binding-state-id

ikehara commented Jun 12, 2026

Problem

verify_expected_base_state_in_tx byte-compares the supplied speculative base client_state/consensus_state Anys against the canonical store. This comparison is structurally unsatisfiable whenever the relayer rebuilds the base instead of echoing back queried bytes:

  • create_client stores the submitter's Any encoding verbatim, but update_client stores the enclave light client's re-encoded form. For ELCs that embed JSON config in the client state, decode/encode is not byte-stable (e.g. an empty chain_info_json decodes to a default struct and re-encodes as a non-empty JSON object), so the stored representation changes after the first update.
  • The relayer rebuilds intermediate bases (second and later batches of an initial sync, and the on-chain committed base path) from its own config/RPC data with its own protobuf encoder. Those bytes can never be guaranteed to match the enclave's re-encoded form.
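The round-trip instability described above can be illustrated with a minimal sketch. Everything in it is hypothetical: `ChainInfo`, its fields, and the hand-rolled `decode`/`encode` functions are stand-ins for an ELC's JSON-embedded config, not the actual types or encoders used by the enclave.

```rust
/// Hypothetical stand-in for a JSON config embedded in a client state.
#[derive(Debug, Default, PartialEq)]
struct ChainInfo {
    chain_id: String,
    epoch_length: u64,
}

/// Decode the embedded config. An empty payload means "use defaults",
/// mirroring the empty chain_info_json case from the description.
fn decode(raw: &str) -> Option<ChainInfo> {
    if raw.is_empty() {
        return Some(ChainInfo::default());
    }
    // Toy parser that only understands the exact layout `encode` emits.
    let body = raw.strip_prefix("{\"chain_id\":\"")?;
    let (chain_id, rest) = body.split_once("\",\"epoch_length\":")?;
    let epoch_length = rest.strip_suffix('}')?.parse().ok()?;
    Some(ChainInfo {
        chain_id: chain_id.to_string(),
        epoch_length,
    })
}

/// Re-encode the config. Defaults are always written out explicitly,
/// so an empty input does not survive a decode/encode round trip.
fn encode(info: &ChainInfo) -> String {
    format!(
        "{{\"chain_id\":\"{}\",\"epoch_length\":{}}}",
        info.chain_id, info.epoch_length
    )
}
```

In this sketch, `encode(&decode("").unwrap())` yields a non-empty JSON object, so a byte comparison between the create-time bytes and the post-update bytes fails even though both decode to the same value.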

Observed effect: the first batch of an initial sync (queried base, verbatim round-trip) commits, then the second batch deterministically fails with "stored speculative base client_state mismatch" at exactly base_height + max_units * blocks_per_chunk, and retries stay stuck at the same height.
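As a worked example of where the failure lands, the helper below computes the second batch's base height. It assumes the first batch covers max_units chunks of blocks_per_chunk blocks each; the function name and the sample numbers are illustrative only.

```rust
/// Height of the base used by the second batch, assuming the first batch
/// advanced the client by `max_units * blocks_per_chunk` blocks.
/// Returns `None` on arithmetic overflow.
fn second_batch_base_height(
    base_height: u64,
    max_units: u64,
    blocks_per_chunk: u64,
) -> Option<u64> {
    max_units
        .checked_mul(blocks_per_chunk)?
        .checked_add(base_height)
}
```

With a base at height 1000, max_units = 4 and blocks_per_chunk = 50, the rebuilt base sits at height 1200, which is where the mismatch would be reported on every retry.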

This is the same class of issue as the earlier removal of the raw-Any state_id recompute (#145): the host cannot reproduce ELC-specific encodings, so byte-level invariants across encoders do not hold.

Fix

  • Drop the client_state/consensus_state byte comparisons. The base is bound by content via the existing encoding-independent check: the enclave-observed prev_state_id (computed in-enclave over the supplied base) must match the height-indexed state_id recorded by a previous create/serial/speculative update.
  • The latest-client_state byte check also served as a stale-base guard (an old, historically valid base must not rewind the canonical cache). Preserve that property with a host-managed speculative-commit high-water mark (clients/<id>/speculativeCommitHeight): bases below the mark are rejected with a distinct stale speculative base height error, and the mark monotonically advances to the batch's last post_height on every successful apply.
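The two checks above can be sketched as follows. This is a simplified model under stated assumptions, not the code in validation.rs: `HostStore`, `BaseError`, `verify_base`, and `apply_speculative_batch` are hypothetical names, and the real store is keyed by client ID and persisted rather than held in a `HashMap`.

```rust
use std::collections::HashMap;

type StateId = [u8; 32];

#[derive(Debug, PartialEq)]
enum BaseError {
    /// Base is below the speculative-commit high-water mark.
    StaleBaseHeight { prev_height: u64, mark: u64 },
    /// No state_id was recorded at the base height.
    StateIdNotRecorded { prev_height: u64 },
    /// Enclave-observed prev_state_id differs from the recorded one.
    StateIdMismatch { prev_height: u64 },
}

#[derive(Default)]
struct HostStore {
    /// Height-indexed state_ids recorded by create/serial/speculative updates.
    state_ids: HashMap<u64, StateId>,
    /// Models clients/<id>/speculativeCommitHeight; None until the first commit.
    speculative_commit_height: Option<u64>,
}

impl HostStore {
    fn record_state_id(&mut self, height: u64, id: StateId) {
        self.state_ids.insert(height, id);
    }

    /// Bind the base by content: no byte comparison of the Any payloads.
    fn verify_base(&self, prev_height: u64, prev_state_id: &StateId) -> Result<(), BaseError> {
        if let Some(mark) = self.speculative_commit_height {
            // Accept only when prev_height >= mark.
            if prev_height < mark {
                return Err(BaseError::StaleBaseHeight { prev_height, mark });
            }
        }
        match self.state_ids.get(&prev_height) {
            None => Err(BaseError::StateIdNotRecorded { prev_height }),
            Some(stored) if stored != prev_state_id => {
                Err(BaseError::StateIdMismatch { prev_height })
            }
            Some(_) => Ok(()),
        }
    }

    /// On a successful apply, record the new state_id and advance the mark
    /// monotonically to the batch's last post_height.
    fn apply_speculative_batch(&mut self, last_post_height: u64, last_post_state_id: StateId) {
        self.record_state_id(last_post_height, last_post_state_id);
        let mark = self
            .speculative_commit_height
            .map_or(last_post_height, |m| m.max(last_post_height));
        self.speculative_commit_height = Some(mark);
    }
}
```

In this model, a base at height 100 is accepted before any speculative commit, rejected as stale once a batch has committed up to height 200, and a state_id recorded by a later serial update at height 250 is still accepted because 250 is not below the mark.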

Notes

  • Binding scope is documented in validation.rs: base fields erased by ELC canonicalization (e.g. latest_height) are not compared; a divergent base from the authenticated relayer cannot affect the on-chain proof chain and at worst corrupts this client's host-store cache, which a serial update_client rewrites.
  • The high-water mark only tracks speculative commits, so serial updates that advance the client further do not cause false rejections (the check is prev_height >= mark).
  • Relayer-side (lcp-go/prover) needs no change.

Tests

  • New: stitch_accepts_first_base_state_when_stored_bytes_use_different_encoding (regression for this failure: stored bytes differ from supplied base, state_id matches → accepted; high-water mark recorded).
  • New: stitch_rejects_stale_first_base_below_speculative_commit_height (replaces the byte-equality-based stitch_rejects_first_base_state_when_canonical_client_state_advanced).
  • Removed: stitch_rejects_first_base_state_that_is_not_in_store (its scenarios are covered by the prev_state_id-missing and stored-state_id-missing tests).
  • cargo test -p service (42 passed), cargo test -p enclave-api -p lcp-types, fmt/clippy clean.

🤖 Generated with Claude Code

The canonical store's Any encoding is not stable across writers:
create_client stores the submitter's encoding verbatim while
update_client stores the enclave light client's re-encoded form
(e.g. JSON-embedded config fields re-serialized). A base rebuilt by
the relayer therefore cannot reproduce the stored bytes in general,
so the stored/supplied byte comparisons deterministically rejected
every batch whose base was not a verbatim round-trip of the store
(first hit at the second batch of an initial sync, where the base is
rebuilt rather than queried).

Replace the client_state/consensus_state byte checks with the
encoding-independent binding that already closes the chain: the
enclave-observed prev_state_id must match the state_id recorded at
prev_height. Keep the stale-base protection the latest-client_state
byte check provided by tracking a host-managed speculative-commit
high-water mark and rejecting bases below it.