Problem
The scalable pipeline SWIFT path appears to use exact-error-driven query selection.
In dpsynth/pipeline_transformations/swift.py, the pipeline computes exact candidate marginals, converts them into errors with marginals_computations.compute_errors(...), requests a budget named "Swift Select Queries", and then passes the errors into swift.select_queries(...).
However, the selection scores do not appear to be noised before swift.select_queries(...); noise is added only later to the selected marginal measurements.
Why this matters
The selected clique tree / selected workload is itself data-dependent output. If selection is driven by exact marginal errors, the later noisy measurement step does not protect the information leaked by which queries were selected.
This is separate from the local discrete_mechanisms.swift path, which has its own score-noising logic. The issue here is the scalable pipeline transformation path.
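To illustrate the leak described above, here is a toy sketch with a stand-in selector. The function exact_select, the query names, and the error values are all hypothetical; none of them come from dpsynth. It only shows that a deterministic choice driven by exact scores can differ between two neighboring datasets, and when it does, no finite privacy parameter bounds what the selection reveals.

```python
def exact_select(errors):
    """Pick the candidate with the largest error, with no noise added."""
    return max(errors, key=errors.get)


# Hypothetical exact error scores computed on two neighboring datasets
# (datasets that differ in a single record).
errors_d = {"(age, income)": 10.0, "(age, zip)": 9.5}
errors_d_prime = {"(age, income)": 9.5, "(age, zip)": 10.0}

selected_d = exact_select(errors_d)
selected_d_prime = exact_select(errors_d_prime)

# The two outputs differ with probability 1, so an observer who sees the
# selected query can tell the two datasets apart, regardless of how much
# noise is later added to the measured marginals.
print(selected_d, selected_d_prime)
```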
Local evidence
Reviewed at commit 18c2c951bd2923f889f6e3b2b757e01aaae398ee.
Relevant lines in the current tree:
dpsynth/pipeline_transformations/swift.py: exact_marginals = marginals_computations.compute_exact_marginals(...)
dpsynth/pipeline_transformations/swift.py: errors = marginals_computations.compute_errors(...)
dpsynth/pipeline_transformations/swift.py: budget request named "Swift Select Queries"
dpsynth/pipeline_transformations/swift.py: return swift.select_queries(errors_dict, ...)
dpsynth/pipeline_transformations/swift.py: noise is added only later, at the "Add noise to selected marginals" stage
dpsynth/pipeline_transformations/marginals_computations.py: compute_errors(...) uses exact_vals from exact marginals
Possible fix
Account for selection and measurement separately. Add DP noise to the vector of SWIFT candidate error scores before clique-tree/query selection, charge that noise to the selection budget, and spend the remaining budget only on measuring the selected marginals. Diagnostic output should avoid publishing exact errors.
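A minimal sketch of the shape this fix could take, under assumptions that are mine and not taken from the repository: the budget is tracked as a zCDP parameter rho, the score vector has a known L2 sensitivity, and Gaussian noise is used. The names split_budget and noise_scores are hypothetical, and the real pipeline's accounting, sensitivity analysis, and noise distribution may differ.

```python
import math
import random


def split_budget(total_rho, selection_fraction):
    """Split a zCDP budget into a selection part and a measurement part."""
    if not 0.0 < selection_fraction < 1.0:
        raise ValueError("selection_fraction must be strictly between 0 and 1")
    rho_select = total_rho * selection_fraction
    return rho_select, total_rho - rho_select


def noise_scores(errors, rho_select, l2_sensitivity, rng):
    """Add Gaussian noise to every candidate score before selection.

    Gaussian noise with standard deviation sigma on a vector with L2
    sensitivity D satisfies (D**2 / (2 * sigma**2))-zCDP, so this sigma
    spends exactly rho_select. The sensitivity value is an assumption
    the caller must justify for the real error scores.
    """
    sigma = l2_sensitivity / math.sqrt(2.0 * rho_select)
    return {query: err + rng.gauss(0.0, sigma) for query, err in errors.items()}


# Hypothetical exact scores; only the noisy copy should reach selection
# or any diagnostic output.
exact_errors = {"(age, income)": 10.0, "(age, zip)": 9.5, "(zip, income)": 3.0}
rho_select, rho_measure = split_budget(total_rho=1.0, selection_fraction=0.1)
noisy_errors = noise_scores(exact_errors, rho_select, l2_sensitivity=1.0,
                            rng=random.Random(0))
```

In this sketch noisy_errors would be what gets passed to the selection step, and rho_measure would be what is left for the later noise on the selected marginals. A fixed seed is used only to make the example repeatable; a real mechanism should not use a seeded, non-cryptographic generator.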
Draft PR
I opened a draft fix here: #31