[RF] HS3 export: constraints serialized as product_dist factors with global observables as const parameters #22597

@kratsg

Context: we are implementing an independent HS3 consumer (pyhs3) and validating it against quickFit NLL values on an ATLAS diHiggs (bbγγ) workspace exported via RooFit's HS3 JSON export. This is feedback on the export pattern for constraint terms — not a blocker (we reproduce RooFit's behavior), but it required reverse-engineering RooFit internals rather than reading the file.

cc @cburgard @Phmonski

What the export produces (per channel, ×14 channels, ×148 constraints)

// distribution block
{ "name": "constr__THEO_BR_Hbb", "type": "gaussian_dist",
  "x": "THEO_BR_Hbb",            // nuisance parameter
  "mean": "RNDM__THEO_BR_Hbb",   // global observable
  "sigma": 1.0 }

// channel pdf
{ "name": "_model_Run2HM_1", "type": "product_dist",
  "factors": ["_modelSB_Run2HM_1", "constr__THEO_BR_Hbb", "..."] }

// parameter_points
{ "name": "RNDM__THEO_BR_Hbb", "value": 0.0, "const": true }

Consequences for a non-RooFit consumer

  1. Constraint-ness is not marked. Nothing in the file distinguishes a constraint factor from a shape factor except naming conventions (constr__, RNDM__) and graph structure (the factor does not depend on the dataset's variables). We infer it structurally, but the likelihood counting semantics this forces are currently unspecified in HS3: a factor independent of the data must enter the likelihood once per likelihood, as RooProdPdf does, not once per event and not once per channel. We have filed a matching HS3-spec issue, "Specify likelihood semantics of product_dist factors that do not depend on the paired dataset's variables (constraint counting)" (hep-statistics-serialization-standard/hep-statistics-serialization-standard#90). Flagging here because the export pattern is what triggers the ambiguity.

  2. Global observables become parameters. RNDM__* are observations of auxiliary measurements; serializing them as const: true parameter points erases the distinction RooFit's ModelConfig maintains (GlobalObservables). A consumer cannot reconstruct which constants are global observables (e.g. to randomize them for toys) except by the RNDM__ prefix convention. The misc.ROOT_internal.ModelConfigs block only records pdf/mc names, not the global-observable set.

  3. HS3 already has a cleaner encoding. Section 2.4 of the spec allows a constraint to be its own (distribution, datum) likelihood entry with the auxiliary measurement as inline numeric data:

    "likelihoods": [{
      "distributions": ["...", "constr__THEO_BR_Hbb"],
      "data":          ["...", 0.0]
    }]

    That representation is unambiguous (counting is automatic), keeps data as data, and removes the need for RNDM__* const parameters. Could the export move constraints there — or at least record the global-observables list in a standard location?
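For illustration, the structural inference from point 1 and the counting rule it forces can be sketched as below. This is a minimal sketch, not pyhs3's actual implementation: the observable name `m_yy`, the `placeholder_shape` type with its `scale` field, and all helper function names are hypothetical, and the input is a heavily reduced stand-in for one exported channel.

```python
def referenced_names(node):
    """Names a node refers to: every string (or list-of-string) field except name/type."""
    names = set()
    for key, value in node.items():
        if key in ("name", "type"):
            continue
        if isinstance(value, str):
            names.add(value)
        elif isinstance(value, list):
            names.update(v for v in value if isinstance(v, str))
    return names


def depends_on_observable(name, by_name, observables):
    """Walk the graph from `name`; True if any dataset variable is reachable."""
    if name in observables:
        return True
    node = by_name.get(name)
    if node is None:  # plain parameter or constant: a leaf of the graph
        return False
    return any(depends_on_observable(ref, by_name, observables)
               for ref in referenced_names(node))


def split_factors(product, distributions, observables):
    """Partition a product_dist's factors into (shape factors, constraint factors)."""
    by_name = {d["name"]: d for d in distributions}
    shapes, constraints = [], []
    for factor in product["factors"]:
        is_shape = depends_on_observable(factor, by_name, observables)
        (shapes if is_shape else constraints).append(factor)
    return shapes, constraints


def total_nll(event_logpdfs_by_channel, constraint_logpdfs):
    """Shape terms enter once per event; each constraint enters once overall.

    `constraint_logpdfs` is keyed by constraint name, so a constraint shared
    by several channels is not double counted.
    """
    events = sum(sum(lps) for lps in event_logpdfs_by_channel.values())
    return -(events + sum(constraint_logpdfs.values()))


# Hypothetical reduced input modelled on the export pattern above.
distributions = [
    {"name": "_modelSB_Run2HM_1", "type": "placeholder_shape",
     "x": "m_yy", "scale": "THEO_BR_Hbb"},
    {"name": "constr__THEO_BR_Hbb", "type": "gaussian_dist",
     "x": "THEO_BR_Hbb", "mean": "RNDM__THEO_BR_Hbb", "sigma": 1.0},
]
product = {"name": "_model_Run2HM_1", "type": "product_dist",
           "factors": ["_modelSB_Run2HM_1", "constr__THEO_BR_Hbb"]}

shapes, constraints = split_factors(product, distributions, observables={"m_yy"})
```

Nothing in this classification uses the constr__ or RNDM__ prefixes; it relies only on which factors can reach a dataset variable, which is exactly the information the file leaves implicit.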

Minor observation, likely from the original workspace construction rather than the export: the constraint Gaussians have the NP as x and the global observable as mean (pdf over the NP), whereas the auxiliary-measurement reading is a pdf over the global observable with the NP as mean. Numerically identical for a symmetric Gaussian, but the declared random variable matters for normalization/toy semantics in a strict reading.
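A small numerical check of the symmetry claim, as a sketch with made-up values (0.3 for the NP, 0.0 for the global observable). The Poisson comparison is an illustrative contrast only and is not taken from the workspace; it shows a case where swapping the random variable and the parameter does change the value.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Normal density evaluated at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def poisson_pmf(k, lam):
    """Poisson probability of observing integer k given rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


np_value, global_obs = 0.3, 0.0  # made-up values for illustration

# As exported: pdf over the NP, global observable as mean.
as_exported = gaussian_pdf(x=np_value, mean=global_obs, sigma=1.0)
# Auxiliary-measurement reading: pdf over the global observable, NP as mean.
aux_reading = gaussian_pdf(x=global_obs, mean=np_value, sigma=1.0)
gaussian_roles_interchangeable = math.isclose(as_exported, aux_reading)

# Contrast: for a Poisson term the two roles are not interchangeable.
poisson_roles_interchangeable = math.isclose(poisson_pmf(2, 3.0), poisson_pmf(3, 2.0))
```

The Gaussian values agree because the density depends only on (x - mean)², so the distinction is invisible in the NLL but still matters for which variable gets sampled in toys.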
