Pipeline SWIFT query selection appears to use exact marginal errors #35

@vvv214

Problem

The scalable pipeline SWIFT path appears to use exact-error-driven query selection.

In dpsynth/pipeline_transformations/swift.py, the pipeline computes exact candidate marginals, converts them into errors with marginals_computations.compute_errors(...), requests a budget named Swift Select Queries, and then passes the errors into swift.select_queries(...).

However, the selection scores do not appear to be noised before swift.select_queries(...); noise is added only later to the selected marginal measurements.

Why this matters

The selected clique tree / selected workload is itself a data-dependent output. If selection is driven by exact marginal errors, the noise added later to the selected measurements does nothing to protect the information leaked by which queries were selected.
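To illustrate the concern, here is a minimal toy sketch, not the dpsynth code. The helper select_top_k and the score values are hypothetical; it assumes only that selection is a deterministic function of exact scores. Two neighboring datasets whose exact errors differ slightly yield different selections, so the selection alone tells them apart.

```python
import numpy as np


def select_top_k(scores, k):
    """Return the indices of the k largest scores (deterministic, no noise)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    return sorted(order[:k].tolist())


# Hypothetical exact error scores for three candidate marginals,
# computed on two datasets that differ in a single record.
errors_d = [5.0, 4.0, 1.0]
errors_d_prime = [4.0, 5.0, 1.0]

selected_d = select_top_k(errors_d, k=1)              # [0]
selected_d_prime = select_top_k(errors_d_prime, k=1)  # [1]

# The selected index differs with certainty, regardless of any noise
# added afterwards to the selected marginal itself.
print(selected_d, selected_d_prime)
```

Because the choice is deterministic given the exact scores, no amount of measurement noise applied afterwards hides which candidate won.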

This is separate from the local discrete_mechanisms.swift path, which has its own score-noising logic. The issue here is the scalable pipeline transformation path.

Local evidence

Reviewed at commit 18c2c951bd2923f889f6e3b2b757e01aaae398ee.

Relevant lines in the current tree:

  • dpsynth/pipeline_transformations/swift.py: exact_marginals = marginals_computations.compute_exact_marginals(...)
  • dpsynth/pipeline_transformations/swift.py: errors = marginals_computations.compute_errors(...)
  • dpsynth/pipeline_transformations/swift.py: budget request named Swift Select Queries
  • dpsynth/pipeline_transformations/swift.py: return swift.select_queries(errors_dict, ...)
  • dpsynth/pipeline_transformations/swift.py: noise is added at the later Add noise to selected marginals stage
  • dpsynth/pipeline_transformations/marginals_computations.py: compute_errors(...) uses exact_vals from exact marginals

Possible fix

Account separately for selection and measurement. Add DP noise to the vector of SWIFT candidate error scores before clique-tree/query selection, and use the remaining measurement budget only for selected marginal measurement. Diagnostic output should avoid publishing exact errors.
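A rough sketch of the score-noising step, under stated assumptions: noise_error_scores is a hypothetical helper, not an existing dpsynth function, and errors is assumed to be a dict mapping candidate marginals to exact error scores. The noise scale sigma is taken as given; calibrating it to the sensitivity of the error scores and to the selection budget is the part that needs real accounting and is not shown here.

```python
import numpy as np


def noise_error_scores(errors, sigma, rng):
    """Return a copy of `errors` with independent Gaussian noise on each score.

    Hypothetical helper. `sigma` must be calibrated elsewhere to the score
    sensitivity and to the share of the budget reserved for selection.
    """
    keys = sorted(errors)  # fixed order so the noise draw is reproducible
    noise = rng.normal(0.0, sigma, size=len(keys))
    return {key: errors[key] + n for key, n in zip(keys, noise)}


# Toy exact errors keyed by candidate marginal (attribute tuples are made up).
exact_errors = {("age", "income"): 3.0, ("income", "zip"): 1.0}

rng = np.random.default_rng(0)
noisy_errors = noise_error_scores(exact_errors, sigma=2.0, rng=rng)

# Only the noisy scores would be passed to selection and to any diagnostics;
# the exact scores stay internal.
print(sorted(noisy_errors))
```

In the pipeline, the noisy dict would take the place of the exact errors passed to swift.select_queries(...), and the measurement stage would draw on a separate budget.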

Draft PR

I opened a draft fix here: #31
