`canonicalize_for_rounds` misrepresents the error model: arbitrary observable on same-syndrome merges, zero-syndrome logical errors dropped, order-dependent rates from min-id retention

# Summary

`detector_error_model::canonicalize_for_rounds` (`libs/qec/lib/detector_error_model.cpp:63-159`) has three independent defects. `dem_from_memory_circuit` applies it internally to every DEM it returns (`libs/qec/lib/experiments.cpp:375`; the `z_dem` / `x_dem` outputs are canonicalized too), and the realtime application examples call it directly. Each defect is demonstrated below on a hand-built DEM small enough to check by hand, with no circuit or noise model involved. Their combined effect is then quantified on the documented decoder workflow: at p = 0.01 on a d = 3 surface-code z-basis memory DEM (built-in `cudaq.Depolarization2`, the same workflow as the sliding-window decoder documentation example), the returned DEM lies at total-variation distance 5.2e-2 from the exact model of the circuit it represents, while a faithful canonicalization of the same enumerated mechanisms achieves 5.5e-4.

1. **Observable flips merged arbitrarily.** Columns with the same detector signature but different observable flips are merged anyway; the retained column keeps its own observable row and the conflict produces only an info-level log (`detector_error_model.cpp:127-143`). Probability mass that does not flip the observable is relabeled as flipping it, or vice versa.
2. **Zero-syndrome logical errors dropped.** Columns with no detector signature are skipped outright, including those that flip an observable ("If the column has no non-zero elements, or a weight of 0, then we skip it", `:92-95`). These are exactly the silent logical errors; the model's observable-flip mass is underestimated by their full weight. Undetectable does not mean ignorable: the DEM is also the model from which logical error rates are predicted, which is why stim retains such mechanisms in its DEMs (`error(p) L0`).
3. **Order-dependent composition from min-id retention.** A merged column keeps the smaller error id ("Arbitrarily choose to keep the smaller error ID.", `:123-126`), and later merge decisions dispatch on that id: additive for the same id, on the grounds that "If the errors originate from the same error mechanism, then P(A and B) = 0", XOR otherwise (`:113-121`). After one cross-id merge the kept id labels a column that contains other ids' probability mass, so a later same-id merge composes additively when the contents are not mutually exclusive. The composed rate becomes order-dependent: the same three mechanisms below give 0.36 or 0.32 depending on input column order, and 0.32 is the correct value. The exported `error_ids` also no longer satisfy their documented invariant, "For all errors with the *same* ID, only one of them can happen" (`libs/qec/python/bindings/py_decoder.cpp:221-226`).
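The arithmetic behind defect 3 can be checked without the library. The sketch below is my own model of the dispatch rule as described above (additive when the incoming id equals the kept id, XOR otherwise, keep the smaller id). The helper names `xor_compose` and `compose_in_order` are hypothetical and are not taken from the cudaq-qec sources.

```python
def xor_compose(p, q):
    """Probability that exactly one of two independent events occurs."""
    return p * (1 - q) + q * (1 - p)


def compose_in_order(columns):
    """Model of the described merge for columns sharing one syndrome.

    columns: list of (rate, error_id), merged left to right.
    Assumption: same id as the kept id -> additive, else XOR; keep min id.
    """
    rate, kept_id = columns[0]
    for r, eid in columns[1:]:
        rate = rate + r if eid == kept_id else xor_compose(rate, r)
        kept_id = min(kept_id, eid)
    return rate


# A and C share id 0 (exclusive alternatives); B has id 1 (independent).
abc = compose_in_order([(0.1, 0), (0.2, 1), (0.1, 0)])
acb = compose_in_order([(0.1, 0), (0.1, 0), (0.2, 1)])
print(f"A,B,C: {abc:.2f}")  # 0.36
print(f"A,C,B: {acb:.2f}")  # 0.32
```

In the A,B,C order the kept id 0 already covers B's mass when C arrives, so C is added to 0.26 instead of being XOR-composed with B separately from A.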

# Reproduction

```python
"""Three defects in detector_error_model.canonicalize_for_rounds, each shown
on a hand-built DEM small enough to check by hand. No circuit, no noise
model; only the canonicalization function itself.

Run: python3 repro.py   (cudaq-qec 0.6.0)
"""

import numpy as np
import cudaq_qec as qec


def make_dem(columns, num_detectors=1, num_obs=1):
    """columns: list of (detector_rows, obs_rows, rate, error_id)."""
    n = len(columns)
    dem = qec.DetectorErrorModel()
    H = np.zeros((num_detectors, n), dtype=np.uint8)
    O = np.zeros((num_obs, n), dtype=np.uint8)
    rates = []
    ids = []
    for c, (drows, orows, rate, eid) in enumerate(columns):
        for r in drows:
            H[r, c] = 1
        for r in orows:
            O[r, c] = 1
        rates.append(rate)
        ids.append(eid)
    dem.detector_error_matrix = H
    dem.observables_flips_matrix = O
    dem.error_rates = rates
    dem.error_ids = ids
    return dem


def show(dem, label):
    H = np.array(dem.detector_error_matrix)
    O = np.array(dem.observables_flips_matrix)
    print(f"{label}: {H.shape[1]} column(s)")
    for c in range(H.shape[1]):
        print(f"  col {c}: detectors {list(np.flatnonzero(H[:, c]))} "
              f"obs {list(np.flatnonzero(O[:, c]))} "
              f"rate {dem.error_rates[c]:.6f} id {dem.error_ids[c]}")


print("=" * 72)
print("Defect 1: same-syndrome columns with DIFFERENT observable flips are")
print("merged anyway; the kept observable row is whichever column came")
print("first. Input: col0 flips the observable (rate 0.2), col1 does not")
print("(rate 0.3), same single-detector syndrome.")
dem = make_dem([({0}, {0}, 0.2, 0), ({0}, set(), 0.3, 1)])
dem.canonicalize_for_rounds(1)
show(dem, "after canonicalize")
print("declared P(observable flip) of the input model: 0.2")
O = np.array(dem.observables_flips_matrix)
pflip = sum(dem.error_rates[c] for c in range(O.shape[1]) if O[0, c])
print(f"P(observable flip) of the output model: {pflip:.6f}")
print("(0.2 expected; neither 0.38 nor 0.0 is a faithful merge)")

print()
print("=" * 72)
print("Defect 2: a zero-syndrome column that flips the observable is")
print("dropped entirely. Input: col0 normal (rate 0.1), col1 has no")
print("detector signature but flips the observable (rate 0.01).")
dem = make_dem([({0}, set(), 0.1, 0), (set(), {0}, 0.01, 1)])
dem.canonicalize_for_rounds(1)
show(dem, "after canonicalize")
O = np.array(dem.observables_flips_matrix)
pflip = sum(dem.error_rates[c] for c in range(O.shape[1]) if O[0, c])
print(f"P(observable flip) of the output model: {pflip:.6f} "
      f"(input model: 0.01)")
print("contrast, stim keeps such mechanisms:")
import stim
c = stim.Circuit("X_ERROR(0.01) 0\nM 0\nOBSERVABLE_INCLUDE(0) rec[-1]")
print(f"  stim DEM of an observable-only error: "
      f"{str(c.detector_error_model()).strip()}")

print()
print("=" * 72)
print("Defect 3: merged columns keep min(error_id), and later merge")
print("decisions dispatch on that id, so the composed rate depends on")
print("column order. Input: three columns with one syndrome; A (rate 0.1,")
print("id 0) and C (rate 0.1, id 0) are exclusive alternatives of one")
print("mechanism, B (rate 0.2, id 1) is independent of them.")
print("Correct composed rate: P(exactly-odd flips) = "
      "(0.1+0.1)(1-0.2) + (1-0.1-0.1)(0.2) = 0.32")
for order, label in (([("A", 0.1, 0), ("B", 0.2, 1), ("C", 0.1, 0)],
                      "column order A,B,C"),
                     ([("A", 0.1, 0), ("C", 0.1, 0), ("B", 0.2, 1)],
                      "column order A,C,B")):
    dem = make_dem([({0}, set(), r, i) for (_, r, i) in order])
    dem.canonicalize_for_rounds(1)
    print(f"{label}: composed rate {dem.error_rates[0]:.6f}, "
          f"kept id {dem.error_ids[0]}")
print("same three mechanisms, two answers; A,B,C path: (0.1 xor 0.2) keeps")
print("id 0, then C with id 0 is added as if exclusive with the whole")
print("aggregate, giving 0.26 + 0.1 = 0.36. The exported id 0 also now")
print("labels a column containing id-1 mass, so the documented invariant")
print("(same id implies only one can happen) no longer holds downstream.")
```

# Observed

```
========================================================================
Defect 1: same-syndrome columns with DIFFERENT observable flips are
merged anyway; the kept observable row is whichever column came
first. Input: col0 flips the observable (rate 0.2), col1 does not
(rate 0.3), same single-detector syndrome.
after canonicalize: 1 column(s)
  col 0: detectors [np.int64(0)] obs [np.int64(0)] rate 0.380000 id 0
declared P(observable flip) of the input model: 0.2
P(observable flip) of the output model: 0.380000
(0.2 expected; neither 0.38 nor 0.0 is a faithful merge)

========================================================================
Defect 2: a zero-syndrome column that flips the observable is
dropped entirely. Input: col0 normal (rate 0.1), col1 has no
detector signature but flips the observable (rate 0.01).
after canonicalize: 1 column(s)
  col 0: detectors [np.int64(0)] obs [] rate 0.100000 id 0
P(observable flip) of the output model: 0.000000 (input model: 0.01)
contrast, stim keeps such mechanisms:
  stim DEM of an observable-only error: error(0.01000000000000000021) L0

========================================================================
Defect 3: merged columns keep min(error_id), and later merge
decisions dispatch on that id, so the composed rate depends on
column order. Input: three columns with one syndrome; A (rate 0.1,
id 0) and C (rate 0.1, id 0) are exclusive alternatives of one
mechanism, B (rate 0.2, id 1) is independent of them.
Correct composed rate: P(exactly-odd flips) = (0.1+0.1)(1-0.2) + (1-0.1-0.1)(0.2) = 0.32
column order A,B,C: composed rate 0.360000, kept id 0
column order A,C,B: composed rate 0.320000, kept id 0
same three mechanisms, two answers; A,B,C path: (0.1 xor 0.2) keeps
id 0, then C with id 0 is added as if exclusive with the whole
aggregate, giving 0.26 + 0.1 = 0.36. The exported id 0 also now
labels a column containing id-1 mass, so the documented invariant
(same id implies only one can happen) no longer holds downstream.
```

# Expected

A canonicalization pass should preserve the model it compresses: merge only columns identical in detector rows AND observable rows; keep zero-syndrome columns that flip an observable; compose rates in an order-independent way that respects the exclusivity structure, and export `error_ids` that still mean what the docstring says.

# Quantified impact on the documented workflow

To measure what these defects cost end to end, I enumerated all 72 `Depolarization2` instances x 15 Pauli-pair cases of the d = 3, 3-round, prep0 z-basis memory circuit independently (circuit structure transcribed from `surface_code_device.cpp` and `experiments.cpp`) and propagated each case to its (detector rows, observable) signature with frame simulation. I then validated the reconstruction against the returned DEM: applying exactly the merge and drop behaviors above to the 43 enumerated symptom classes yields 35 classes that match the DEM's 35 columns one-to-one. The exact joint symptom distributions (XOR-convolution over the 2^13 signature space) then give, at p = 0.01:

- returned DEM (35 columns, defects included): total variation 5.2e-2 from the true model of the circuit;
- faithful canonicalization of the same 43 classes (no observable merging, nothing dropped): 5.5e-4. This residual is not a defect of the canonicalization; it is the irreducible cost of reading gathered exclusive cases as independent mechanisms, and it concerns the consumer-side semantics of `error_rates` and `error_ids` rather than this function. I will quantify that separately;
- reproducing only the zero-syndrome drop: 7.2e-3. The dropped class carries 8.0e-3 of probability mass (a logical flip with no detector signature in this DEM's convention), so the returned DEM underreports the logical error rate of the modeled experiment by about 0.8 percentage points at these parameters.

So almost two orders of magnitude of model fidelity are lost in canonicalization, with the observable merging (defect 1) the dominant contributor and the dropped logical-error class (defect 2) second. Defect 3 perturbs individual composed rates at the few-permille level in this DEM, but it makes the composed rates depend on internal column order and corrupts the exported `error_ids` for any downstream consumer.
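For readers who want to see how such a total-variation figure is obtained, here is a minimal sketch of the XOR-convolution on the two-bit signature space of the defect 1 toy model (one detector, one observable). It is not the script used for the d = 3 numbers above. The function names are hypothetical, and the sketch assumes the two declared columns are independent mechanisms.

```python
import numpy as np


def symptom_distribution(mechanisms, num_bits):
    """Exact distribution over signatures for independent mechanisms.

    mechanisms: list of (signature_bitmask, probability).
    Each mechanism XORs its mask into the signature with probability p.
    """
    dist = np.zeros(1 << num_bits)
    dist[0] = 1.0  # start with no flips
    idx = np.arange(1 << num_bits)
    for mask, p in mechanisms:
        # new[s] = (1 - p) * old[s] + p * old[s ^ mask]
        dist = (1 - p) * dist + p * dist[idx ^ mask]
    return dist


def total_variation(a, b):
    """Total-variation distance between two distributions."""
    return 0.5 * np.abs(a - b).sum()


D0, L0 = 0b01, 0b10  # bit 0: detector 0, bit 1: observable 0
declared = symptom_distribution([(D0 | L0, 0.2), (D0, 0.3)], 2)
merged = symptom_distribution([(D0 | L0, 0.38)], 2)
tv = total_variation(declared, merged)
print(f"{tv:.2f}")  # 0.30
```

The same recurrence extends to the 2^13 space by changing `num_bits`; only the list of (mask, probability) pairs differs.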

# Suggested fix

1. Include the observable rows in the merge key, so only columns identical in (detectors, observables) merge. This is also stim's semantics: error instructions with different `L` targets are distinct mechanisms.
2. Skip only columns with neither detectors nor observables; keep zero-syndrome observable-flipping columns, as stim does.
3. Compose in two passes: first merge same-id columns (additive, genuinely exclusive), then merge across ids (XOR). This makes the result order-independent and gives the correct 0.32 in the example above. Give cross-id merged columns fresh unique ids so the documented invariant survives, or document that canonicalized DEMs carry no exclusivity semantics.
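The three points above can be sketched in pure Python on the same column tuples the reproduction script uses. This is an illustration of the proposed semantics under my own assumptions, not a patch against `detector_error_model.cpp`: the function `canonicalize` and its fresh-id scheme are hypothetical, and the round-based column ordering that the real function performs is left out.

```python
from collections import defaultdict


def xor_compose(p, q):
    """Probability that exactly one of two independent events occurs."""
    return p * (1 - q) + q * (1 - p)


def canonicalize(columns):
    """columns: iterable of (detector_rows, obs_rows, rate, error_id).

    Returns a list of (detector_rows, obs_rows, rate, error_id).
    """
    per_id = defaultdict(float)
    used_ids = set()
    for dets, obs, rate, eid in columns:
        used_ids.add(eid)
        key = (frozenset(dets), frozenset(obs))  # fix 1: observables in key
        if not key[0] and not key[1]:
            continue  # fix 2: drop only columns with no effect at all
        per_id[(key, eid)] += rate  # fix 3, pass 1: same id is exclusive

    merged = {}
    for (key, eid), rate in per_id.items():
        if key in merged:  # fix 3, pass 2: different ids are independent
            prev, ids = merged[key]
            merged[key] = (xor_compose(prev, rate), ids | {eid})
        else:
            merged[key] = (rate, {eid})

    next_id = max(used_ids, default=-1) + 1
    out = []
    for (dets, obs), (rate, ids) in merged.items():
        if len(ids) == 1:
            eid = next(iter(ids))
        else:
            # Assumption: cross-id aggregates get a fresh, unused id.
            eid, next_id = next_id, next_id + 1
        out.append((sorted(dets), sorted(obs), rate, eid))
    return out


abc = canonicalize([({0}, set(), 0.1, 0), ({0}, set(), 0.2, 1),
                    ({0}, set(), 0.1, 0)])
acb = canonicalize([({0}, set(), 0.1, 0), ({0}, set(), 0.1, 0),
                    ({0}, set(), 0.2, 1)])
d1 = canonicalize([({0}, {0}, 0.2, 0), ({0}, set(), 0.3, 1)])
d2 = canonicalize([({0}, set(), 0.1, 0), (set(), {0}, 0.01, 1)])
print(f"{abc[0][2]:.2f} {acb[0][2]:.2f} {len(d1)} {len(d2)}")  # 0.32 0.32 2 2
```

On the three toy inputs from the reproduction this gives 0.32 in both column orders, keeps the two defect 1 columns separate, and retains the zero-syndrome observable flip.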

# Environment

cudaq-qec 0.6.0 (PyPI wheel cudaq-qec-cu13 0.6.0; sources at cudaqx 84d18ca), CUDA-Q 0.14.2 (cuda-quantum-cu13 wheel), Python 3.12, Linux x86-64 (WSL2), CPU only. Defects 1-3 reproduce with no circuit, noise model, or simulator (pure `DetectorErrorModel` manipulation); the workflow quantification used the stim target. Output deterministic across runs. File and line references are to cudaqx 84d18ca. Related but independent: #606 concerns the MSM enumeration of custom two-qubit channels; everything here uses the built-in `cudaq.Depolarization2` and reproduces regardless of #606.
