Enable the composite decoder in surface-code example. Depends on #536 #619

Draft
melody-ren wants to merge 29 commits into NVIDIA:main from melody-ren:melodyr/pr536-host-call-composite

Conversation

@melody-ren

Collaborator

Summary

This wires the TRT + PyMatching composite decoder into the merged HOST_CALL realtime path and adds coverage for both the direct composite RPC path and the surface-code example.

  • Add a TrtDecoderHostCallRpc unit test that builds the composite decoder through the standard decoder config path, routes it through the in-process HOST_CALL session, and validates observable corrections.
  • Extend surface_code-1 to support trt_decoder configs with a TensorRT predecoder and nested PyMatching global decoder.
  • Use the Z-sector DEM for the TRT/PyMatching surface-code path because the full X+Z DEM generated by this example is not graphlike for PyMatching.
  • Add a small ONNX generator for split predecoder outputs [pre_L | residual].
  • Add TRT surface-code test variants, including inproc_rpc, and make CI install onnx/pyyaml before configure so the tests are actually registered.
  • Preserve readable syndrome dumps with ROUND_START markers while truncating packed syndrome bits to the real per-round syndrome width.
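
To illustrate the Z-sector DEM choice above: PyMatching builds a matching graph, so each error mechanism may flip at most two detectors. The following is a minimal sketch of that check, assuming the DEM has already been flattened into a dense 0/1 detector-by-error matrix. The helper name `is_graphlike` and both toy matrices are hypothetical and are not taken from the surface-code example.

```python
def is_graphlike(H):
    """Return True if every column (error mechanism) of H flips at most two detectors.

    H is a dense 0/1 matrix given as a list of rows: rows are detectors,
    columns are error mechanisms.
    """
    num_cols = len(H[0]) if H else 0
    return all(sum(row[c] for row in H) <= 2 for c in range(num_cols))


# Hypothetical toy matrices (illustration only).
H_single_sector = [
    [1, 1, 0],
    [0, 1, 1],
]
H_both_sectors = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],  # extra detector: column 1 now flips three detectors
]

print(is_graphlike(H_single_sector))
print(is_graphlike(H_both_sectors))
```

A matrix that fails this check would need hyperedge decomposition before it could be handed to a matching decoder, which is why restricting to one sector is the simpler route here.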

Merge sequence

Depends on #536 and should be merged after it.

Testing

Ran in cudaqx-public-pr615-cu13-dev:

cmake --build /tmp/cudaqx-public-pr615-pr536-trt-build-mainrtlibs --target surface_code-1-local
ctest --output-on-failure -R "TrtDecoderHostCallRpc|app_examples.surface_code-1-local-test-distance-3-trt"
ctest --output-on-failure -R "app_examples.surface_code-1-local-test-distance-3($|-trt)"
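
As a rough illustration of what the second `-R` filter selects: `ctest -R` matches its regular expression anywhere in the test name, so `($|-trt)` keeps the plain distance-3 test and every variant whose name continues with `-trt`. The sketch below reproduces that with Python's `re.search`. The candidate test names are hypothetical examples, not a listing of the registered tests.

```python
import re

# Same pattern as the ctest -R argument above.
pattern = re.compile(r"app_examples.surface_code-1-local-test-distance-3($|-trt)")

# Hypothetical test names, for illustration only.
candidate_names = [
    "app_examples.surface_code-1-local-test-distance-3",
    "app_examples.surface_code-1-local-test-distance-3-trt",
    "app_examples.surface_code-1-local-test-distance-3-trt-inproc_rpc",
    "app_examples.surface_code-1-local-test-distance-3-other",
    "app_examples.surface_code-1-local-test-distance-5",
]

# re.search mirrors the "match anywhere in the name" behavior of ctest -R.
selected = [name for name in candidate_names if pattern.search(name)]
for name in selected:
    print(name)
```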

bmhowe23 and others added 29 commits April 29, 2026 23:57
…output

Add a "predecoder" execution mode to the TensorRT decoder so it can be
chained with a second decoder (e.g. PyMatching) and return logical-frame
observables directly. The TRT model is assumed to emit a single output
that concatenates [pre_L (num_observables entries), residual_dets (rest)].

New constructor parameters:
- "batch_size": required when the ONNX model has a dynamic batch dim.
  Used to size the optimization profile and pre-allocate I/O buffers.
- "global_decoder" + "global_decoder_params": optional decoder name and
  params for a follow-up decoder run on the residual_dets portion of
  the TRT output. Created with the same H passed to trt_decoder.
- "O": observables matrix (num_observables x block_size). Enables
  decode()/decode_batch() to return the predicted logical frame.
  Number of observables is inferred from O.shape()[0].

Decode behavior matrix:
- no global_decoder, no O   -> raw TRT output (unchanged).
- no global_decoder, O      -> return the pre_L prefix only.
- global_decoder, no O      -> entire output -> global_decoder.result.
- global_decoder, O         -> residual -> global_decoder; return
                               pre_L XOR global_decoder.logical_frame.
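
The decode behavior matrix above can be sketched as a small dispatch function. This is an illustrative Python model, not the C++ implementation: it assumes the TRT output has already been thresholded to 0/1 bits, and `composite_decode`, `xor_bits`, and the parity stand-in for the global decoder are hypothetical names.

```python
def xor_bits(a, b):
    """Element-wise XOR of two equal-length bit lists."""
    return [x ^ y for x, y in zip(a, b)]


def composite_decode(trt_output, num_observables=None, global_decoder=None):
    """Model of the four decode cases.

    trt_output      -- flat bit list laid out as [pre_L | residual_dets]
    num_observables -- number of rows of O, or None when no O is given
    global_decoder  -- callable mapping detector bits to result bits, or None
    """
    has_o = num_observables is not None
    if global_decoder is None:
        # No follow-up decoder: raw output, or only the pre_L prefix when O is set.
        return trt_output[:num_observables] if has_o else trt_output
    if not has_o:
        # The entire output is handed to the global decoder.
        return global_decoder(trt_output)
    pre_l = trt_output[:num_observables]
    residual = trt_output[num_observables:]
    # Combine the predecoder's logical frame with the global decoder's.
    return xor_bits(pre_l, global_decoder(residual))


def parity_decoder(dets):
    """Hypothetical stand-in for a global decoder: one parity bit."""
    return [sum(dets) % 2]


output = [1, 1, 0, 0]  # pre_L = [1], residual = [1, 0, 0]
print(composite_decode(output))
print(composite_decode(output, num_observables=1))
print(composite_decode(output, global_decoder=parity_decoder))
print(composite_decode(output, num_observables=1, global_decoder=parity_decoder))
```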

Constructor validation when O is set:
- output_size_per_sample >= num_observables, and
- when global_decoder_ is set,
  output_size_per_sample == num_observables + global_decoder.syndrome_size.
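
The two size checks above can be expressed as follows. This is a sketch in Python rather than the constructor's actual code; the function name `validate_output_layout` and the error messages are hypothetical, while the parameter names follow the commit message.

```python
def validate_output_layout(output_size_per_sample, num_observables,
                           global_syndrome_size=None):
    """Raise ValueError if the TRT output cannot hold [pre_L | residual_dets].

    global_syndrome_size is None when no global decoder is attached.
    """
    if output_size_per_sample < num_observables:
        raise ValueError("TRT output is smaller than the number of observables")
    if global_syndrome_size is not None:
        expected = num_observables + global_syndrome_size
        if output_size_per_sample != expected:
            raise ValueError(
                f"TRT output size {output_size_per_sample} != "
                f"num_observables + syndrome_size = {expected}")


# One observable plus 24 residual detectors fits a 25-entry output.
validate_output_layout(25, 1, global_syndrome_size=24)
```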

Other changes:
- Dynamic batch support: setInputShape per call when the model's batch
  dim is -1; ONNX builder now installs a min/opt/max optimization
  profile when "batch_size" is provided.
- Split decode_batch into a typed decode_batch_impl<float|uint8_t> for
  cleaner dtype dispatch (engine I/O dtypes float32 / uint8 unchanged).
- Better INFO logging: total non-zero input vs residual detector counts
  per batch to help diagnose predecoder behavior.

Signed-off-by: Ben Howe <bhowe@nvidia.com>
Add a realtime test/demo that initializes the TensorRT decoder from an ONNX
predecoder model with PyMatching configured as the global decoder. The driver
loads detector, observable, parity-check, and prior data from the
Stim export bundle, decodes samples through the composite TRT+PyMatching path,
and reports latency, throughput, correctness, and residual-syndrome diagnostics.

Register the new test_trt_decoder_composite target when TensorRT, realtime,
and the TRT decoder plugin are available.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>
Add YAML/config support for TRT decoder runtime options including batch size,
CUDA graph execution, global decoder selection, and PyMatching-specific global
decoder parameters. Wire realtime decoder construction so TRT configs receive
the top-level observable matrix from O_sparse, and pass the same O matrix into
PyMatching global decoder params for composite observable decoding.

Expose the new config fields through Python bindings and heterogeneous_map
round-tripping. Extend YAML tests for TRT config round-trip, runtime parameter
conversion, and O_sparse-to-O injection.
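
A rough picture of the O_sparse-to-O step: the sketch below expands a sparse observable description into a dense 0/1 matrix. It assumes a flat integer encoding in which each row lists its nonzero column indices and is terminated by -1; that encoding, the function name `sparse_rows_to_dense`, and the sample data are assumptions for illustration and are not confirmed by this PR.

```python
def sparse_rows_to_dense(sparse, num_cols):
    """Expand a flat, -1-terminated list of column indices into dense rows.

    Assumed encoding (not confirmed by the PR): [0, 2, -1, 1, -1] means
    row 0 has ones in columns 0 and 2, and row 1 has a one in column 1.
    """
    rows = []
    current = [0] * num_cols
    for idx in sparse:
        if idx == -1:
            # A -1 closes the current row.
            rows.append(current)
            current = [0] * num_cols
        else:
            current[idx] = 1
    return rows


O_sparse = [0, 2, -1, 1, -1]  # hypothetical sample data
O = sparse_rows_to_dense(O_sparse, num_cols=3)
print(O)
```

The number of rows of the resulting matrix is what the decoder would read as the number of observables.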

Update test_trt_decoder_composite to support an optional --config-yaml path,
allowing the existing composite demo to construct and run a real TRT+PyMatching
decoder directly from YAML while preserving the original manual CLI path.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>
…yaml

# Conflicts:
#	libs/qec/unittests/realtime/CMakeLists.txt
#	libs/qec/unittests/realtime/test_trt_decoder_composite.cpp
Replace the TRT decoder's hardcoded optional PyMatching global decoder params
with a tagged global_decoder_config variant. Preserve PyMatching as the current
supported concrete config while using std::monostate for the unset case.

Update heterogeneous-map conversion, YAML mapping, and Python bindings so the
existing PyMatching YAML/Python surface continues to round-trip. Extend the YAML
unit test to verify the PyMatching variant arm is selected and still produces
the expected runtime parameter map.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>
…yaml

# Conflicts:
#	libs/qec/python/bindings/py_decoding_config.cpp
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
The trt_decoder constructed with an "O" observable matrix projects to
observables internally, so it must report decode_result_type::decode_to_obs
to enqueue_syndrome(). Set the result type where decode_to_observables_ is
enabled, and assert it in the composite test.

Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
A monostate global_decoder_params (no global decoder attached) was being
mutated into a default pymatching_decoder_config across a serialize ->
deserialize cycle, through two independent serialization layers:

1. heterogeneous_map: to_heterogeneous_map() emitted an empty
   global_decoder_params map whenever global_decoder was set but the
   params were monostate, which read back as a pymatching config.

2. YAML MappingTraits (the path used by to_yaml_str/from_yaml_str, and
   thus by save_dem/load_dem): mapOptional emitted an empty
   'global_decoder_params: {}' for the monostate case, which read back
   into a default pymatching config.

Both layers now emit nothing for monostate. Any runtime need for an empty
params map is handled in prepare_decoder_params (realtime_decoding.cpp),
not in serialization. The heterogeneous_map path also rejects a params
map that carries global_decoder_params without a global_decoder.

Add regression tests: monostate round-trips unchanged through both YAML
and heterogeneous_map and emits no params key; params-without-decoder
throws.
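
The round-trip rule described above can be modeled in a few lines. This is a Python sketch, not the C++ serialization code: `None` plays the role of `std::monostate`, and `TrtConfig`, `to_map`, and `from_map` are hypothetical names standing in for the config struct and its heterogeneous_map conversions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrtConfig:
    global_decoder: Optional[str] = None
    # None stands in for the monostate (unset) variant arm.
    global_decoder_params: Optional[dict] = None


def to_map(cfg):
    """Serialize, emitting no params key at all when params are unset."""
    out = {}
    if cfg.global_decoder is not None:
        out["global_decoder"] = cfg.global_decoder
    if cfg.global_decoder_params is not None:
        out["global_decoder_params"] = dict(cfg.global_decoder_params)
    return out


def from_map(m):
    """Deserialize, rejecting params that arrive without a decoder name."""
    if "global_decoder_params" in m and "global_decoder" not in m:
        raise ValueError("global_decoder_params requires global_decoder")
    params = m.get("global_decoder_params")
    return TrtConfig(m.get("global_decoder"),
                     dict(params) if params is not None else None)


cfg = TrtConfig(global_decoder="pymatching")  # params left unset
serialized = to_map(cfg)
restored = from_map(serialized)
print(serialized)
print(restored == cfg)
```

Because an absent key and an empty map are kept distinct, the unset state survives the cycle instead of turning into a default config.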

Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Two follow-ups to the monostate round-trip fix:

1. prepare_decoder_params now synthesizes an empty global_decoder_params
   map for a pymatching global decoder before the O_sparse early return.
   Since serialization stopped emitting an empty params map for monostate,
   a global decoder configured to run on residual detectors without an O
   matrix was no longer attached by the plugin (which requires both
   global_decoder and global_decoder_params keys). This is a documented,
   valid configuration, so restore it in runtime prep where it belongs.

2. trt_decoder_config::to_heterogeneous_map now throws when
   global_decoder_params is set but global_decoder is not, matching the
   rejection already enforced by from_heterogeneous_map.

Add regression tests for both.
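
The first follow-up can be pictured as a runtime-prep step that fills in the missing key. The sketch below is a Python model under stated assumptions: it borrows the `prepare_decoder_params` name from the commit message, but the dictionary representation and the literal decoder name `"pymatching"` are illustrative guesses rather than the real C++ signature.

```python
def prepare_decoder_params(params):
    """Return a copy of params ready to hand to the decoder plugin.

    The plugin is described as requiring both global_decoder and
    global_decoder_params, so an empty params map is synthesized when a
    pymatching global decoder is named without explicit params.
    """
    prepared = dict(params)
    if prepared.get("global_decoder") == "pymatching":
        # setdefault keeps any params the caller already supplied.
        prepared.setdefault("global_decoder_params", {})
    return prepared


print(prepare_decoder_params({"global_decoder": "pymatching"}))
print(prepare_decoder_params({"batch_size": 4}))
```

Keeping this in runtime preparation means the serialized form never has to carry an empty map.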

Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Resolve conflicts by keeping both PyMatching config structs for now:
- pymatching_decoder_config (trt nested global decoder, PR536) and
  Vedika's pymatching_config (standalone realtime decoder, NVIDIA#614) coexist.
- Kept the global_decoder_config variant + its serialization + YAML traits.
- realtime_decoding.cpp: unioned includes; kept both prepare_decoder_params
  and Chuck's new get_realtime_session (NVIDIA#609).

Follow-up: unify on pymatching_config (drop pymatching_decoder_config).
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 18, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@melody-ren

Collaborator Author

/ok to test 732ef4c
