issue #297 fixes — bundle template, compat, cli, demo, runtime by park-peter · Pull Request #298 · databrickslabs/dlt-meta

park-peter · 2026-05-19T22:27:36Z

Closes #297. Eight independent fixes; each section below is a separate commit.

What changed

1. DAB template — `variables.yml` validation warnings

Every variable in `templates/dab/template/{{.bundle_name}}/resources/variables.yml.tmpl` declared `type: string` or `type: bool`. The bundle CLI's variable schema only accepts `type: complex`, so every `databricks bundle validate` against a freshly-scaffolded bundle emitted 18 warnings of the form `Warning: invalid value "string" for enum field. Valid values are [complex]`. Dropped the `type:` line from all 18 declarations.

Also dropped the sync.exclude: block from databricks.yml.tmpl — the patterns .DS_Store, .vscode, .venv/ do not match any files in a freshly-scaffolded bundle, producing three additional "Pattern X does not match any files" warnings.

bundle-init --quickstart + bundle validate now returns Validation OK! with zero warnings.

2. `compat/dlt_meta` — runtime import deferral + pyspark/delta-spark bump

`dataflow_pipeline.py` and `pipeline_writers.py` do `from pyspark import pipelines as dp`, which is only available in pyspark >=4.1.0. `setup.py` `DEV_REQUIREMENTS` and the README pinned `pyspark==3.5.5`, which lacks `pyspark.pipelines`. The previous blanket `try: ... except ImportError: pass` in `compat/dlt_meta/__init__.py` then silently dropped every runtime re-export (`DataflowPipeline`, `PipelineReaders`, `AppendFlowWriter`, `DLTSinkWriter`, `BronzeDataflowSpec`, `SilverDataflowSpec`, `OnboardDataflowspec`), surfacing as `cannot import name 'DataflowPipeline' from 'dlt_meta'` with no hint at the pyspark version mismatch.

Bumped pyspark==3.5.5 → pyspark>=4.1.0 and delta-spark==3.0.0 → delta-spark>=4.0.0 (the old delta pin caps pyspark <3.6.0 and is incompatible with the bump). Split the compat re-exports: cli / install / config bind unconditionally; runtime symbols are wrapped in a narrow try/except ImportError that swaps in stubs raising a clear ImportError("requires pyspark>=4.1.0 ...") when the import fails. Expanded the sys.modules mock in tests/test_compat.py to cover pyspark, pyspark.sql.session, pyspark.sql.window, delta, and delta.tables so the file no longer requires a real install.
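The narrow try/except with raising stubs could look roughly like the sketch below. It is an illustration under assumptions, not the contents of `compat/dlt_meta/__init__.py`: the helper names `_make_stub` and `bind_runtime_symbols`, the exact message wording, and the choice to raise when the stub is instantiated are all hypothetical.

```python
import importlib

# Illustrative wording; the real message in the PR may differ.
_REQUIREMENT = "requires pyspark>=4.1.0"


def _make_stub(symbol: str, cause: ImportError):
    """Return a placeholder class that fails loudly as soon as it is instantiated."""

    class _Stub:
        def __init__(self, *args, **kwargs):
            raise ImportError(
                f"{symbol} {_REQUIREMENT} (original error: {cause})"
            ) from cause

    _Stub.__name__ = symbol
    return _Stub


def bind_runtime_symbols(module_name: str, symbols: list[str]) -> dict:
    """Import the symbols from module_name, or fall back to raising stubs."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Only ImportError is caught, so unrelated bugs in the module still surface.
        return {name: _make_stub(name, exc) for name in symbols}
    return {name: getattr(module, name) for name in symbols}


# Hypothetical module name chosen so the import is guaranteed to fail here.
exports = bind_runtime_symbols("no_such_runtime_module_for_demo", ["DataflowPipeline"])
DataflowPipeline = exports["DataflowPipeline"]
```

With this shape, `from dlt_meta import DataflowPipeline` would still succeed on an old pyspark, and the version requirement would be reported at the point where the symbol is first used.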

3. CLI — malformed `onboarding_file_path` job parameter

SDPMeta._get_onboarding_named_parameters built the named parameter as f"{cmd.uc_volume_path}/sdp_meta_conf/{cmd.onboarding_file_path}". By that point cmd.onboarding_file_path is the local absolute path (overwritten by update_ws_onboarding_paths to point at the rewritten onboarding.json), so the concatenation produced /Volumes/<cat>/<schema>/<vol>/sdp_meta_conf//Users/.../onboarding.json, which the onboarding job then failed to open. The launchers under demo/launch_*.py construct their own named_parameters dict and bypass this path, which is why no demo caught it. Use os.path.basename(cmd.onboarding_file_path) in both the UC and DBFS branches.
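A minimal sketch of the before and after, assuming POSIX-style paths. The function name `onboarding_param` and the sample paths are placeholders for illustration, not the actual `SDPMeta` code.

```python
import os


def onboarding_param(uc_volume_path: str, onboarding_file_path: str) -> str:
    """Build the job parameter from the volume path and the file name only."""
    file_name = os.path.basename(onboarding_file_path)
    return f"{uc_volume_path}/sdp_meta_conf/{file_name}"


volume = "/Volumes/cat/schema/vol"             # placeholder volume path
local = "/Users/someone/work/onboarding.json"  # placeholder local absolute path

# Old behavior: the absolute local path is appended verbatim.
broken = f"{volume}/sdp_meta_conf/{local}"
fixed = onboarding_param(volume, local)

print(broken)  # /Volumes/cat/schema/vol/sdp_meta_conf//Users/someone/work/onboarding.json
print(fixed)   # /Volumes/cat/schema/vol/sdp_meta_conf/onboarding.json
```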

4. Demo — legacy `import dlt` in snapshot runner

demo/SDP_META_INTERACTIVE_DEMO.py builds the snapshot runner notebook as a string. The codebase migrated from import dlt to from pyspark import pipelines as dp (issue #274), but the inline snapshot_runner_content string still carried import dlt. The symbol was never referenced in the runner body — just stale. Dropped.

5. Demo (DAB template runner) — placeholder seed strip

demo/launch_dab_template_demo.py calls _strip_example_onboarding_entry to remove the template's seeded data_flow_id: "100" row after bundle-init. The strip was gated on scenario.name in _DELTA_SCENARIO_NAMES only. For the kafka and eventhub scenarios, the seed carries <your-kafka-host>:9092 / <your-eventhub-namespace> placeholders that the launcher's STAGE 5 sanity checks reject, so every --scenario kafka / --scenario eventhub run failed at STAGE 5 with flow data_flow_id='100' field source_details.kafka.bootstrap.servers is still the placeholder. Added _STRIP_EXAMPLE_SCENARIO_NAMES = _DELTA_SCENARIO_NAMES | {"kafka", "eventhub"} and gated the strip on the broader set. cloudfiles / cloudfiles_combined keep the seed — their placeholder-free source_path_dev validates fine.

6. Demo (DAB template runner) — duplicate sanity-check printing

stage_validate printed _sdp_meta_sanity_checks errors itself and then called bundle_validate, which prints the same list again under a different header. Combined with item 5 above, every failing kafka/eventhub run dumped the same block twice. Dropped the launcher's local copy; bundle_validate owns the output.

7. Runtime — `read_silver` where-clause shadow

`DataflowPipeline.read_silver` had `for where_clause in where_clause:`, shadowing the outer list with the last clause string. Nothing downstream reads `where_clause` post-loop today, but the same logic was already implemented correctly in the private `__apply_where_clause` helper. Delegated `read_silver`'s where-clause handling to `__apply_where_clause`.
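The shadowing itself is easy to reproduce in plain Python. The sketch below is a stand-in, not the pipeline code: it filters a list of dicts with predicate functions in place of applying SQL where-clause strings to a DataFrame, and both function names are hypothetical.

```python
def filter_shadowed(rows, where_clause):
    """Mirrors the old pattern: the loop variable reuses the list's name."""
    for where_clause in where_clause:
        rows = [row for row in rows if where_clause(row)]
    # When the list was non-empty, where_clause is now its last element.
    return rows, where_clause


def filter_clean(rows, where_clauses):
    """Same filtering with a distinct loop variable, so the list stays intact."""
    for clause in where_clauses:
        rows = [row for row in rows if clause(row)]
    return rows, where_clauses


rows = [{"a": 1, "b": "x"}, {"a": -1, "b": "x"}, {"a": 2, "b": "y"}]
clauses = [lambda r: r["a"] > 0, lambda r: r["b"] == "x"]

shadowed_rows, leftover = filter_shadowed(rows, clauses)
clean_rows, kept = filter_clean(rows, clauses)

print(shadowed_rows == clean_rows)  # True: the filtered result is identical
print(leftover is clauses[-1])      # True: the name now points at the last clause
print(kept is clauses)              # True: the list is still available
```

This matches the description: the result is the same today, and the hazard only appears once something reads the list after the loop.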

8. Runtime — unused helper methods + tests

dataflow_pipeline.py defined _build_table_name, _get_source_table_info, _get_target_table_name, _create_dataframe_reader, _read_from_source, _apply_transformations — six methods whose only call sites were within the same orphan chain. read_silver / read_bronze / _get_target_table_info all kept their inline implementations. Deleted the six methods and the eight test_dataflow_pipeline.py tests that exercised them.

…exclude variables.yml.tmpl declared type: string / type: bool on every variable, but `complex` is the only valid value per the bundle CLI's enum schema, so every `databricks bundle validate` against a freshly-scaffolded bundle emitted 18 warnings. The sync.exclude block in databricks.yml.tmpl listed three patterns that no scaffolded bundle ships (.DS_Store, .vscode, .venv), producing three more 'Pattern X does not match any files' warnings. Result of `bundle-init --quickstart` + `bundle validate` is now "Validation OK!" with zero warnings, matching the behavior of the official default-python template.

park-peter · 2026-05-29T00:55:23Z

@ravi-databricks
cli.py: this is a bug fix. The onboarding_file_path job parameter was being built from the full local path instead of just the filename, so databricks labs sdp-meta onboard produced an unopenable /Volumes/.../sdp_meta_conf//Users/.../onboarding.json. The demos never caught it because the launchers build that parameter themselves. Switched to os.path.basename(...).

dataflow_pipeline.py: two things here. First, the inline where-clause loop in read_silver now delegates to the existing __apply_where_clause helper instead of carrying its own copy (the inline copy reused the list's name as its loop variable, shadowing the outer list). Same behavior, one implementation; this is just a clean-up. Second, I removed six private methods (_build_table_name, _get_source_table_info, _get_target_table_name, _create_dataframe_reader, _read_from_source, _apply_transformations) plus their tests. They had no caller in any production path; the only references were calls among themselves. The real read/write paths kept their own inline logic and never touched this cluster, so there is no behavioral change. Let me know if this is an implementation you were planning to keep building on, and I can take the removal out of the PR.

…pipelines dataflow_pipeline.py and pipeline_writers.py do `from pyspark import pipelines`, introduced in pyspark 4.1.0. setup.py DEV_REQUIREMENTS and the README pinned pyspark==3.5.5, which lacks pyspark.pipelines, so `from dlt_meta import DataflowPipeline` failed silently under the previous blanket `try: ... except ImportError: pass` in compat/dlt_meta/__init__.py and surfaced as the unhelpful `cannot import name 'DataflowPipeline' from 'dlt_meta'`. Split the compat re-exports: pyspark-free symbols (cli surface, install, config) bind unconditionally; symbols whose modules transitively import pyspark are wrapped in a narrow try/except that swaps in stubs raising a clear ImportError naming pyspark>=4.1.0 as the requirement. Bumped pyspark==3.5.5 to pyspark>=4.1.0 and delta-spark==3.0.0 to delta-spark>=4.0.0 in setup.py DEV_REQUIREMENTS and the README install line. delta-spark 3.x caps pyspark<3.6.0 so the old delta pin is incompatible with the pyspark bump. Extended the pyspark mock in tests/test_compat.py to cover pyspark, pyspark.sql.session, pyspark.sql.window, delta, and delta.tables so the test file no longer requires real pyspark/delta installs.
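One common way to make a test file independent of real pyspark/delta installs is to register `MagicMock` stand-ins in `sys.modules`. The sketch below shows that general technique with the module names listed above; the context manager `fake_spark_modules` is hypothetical, and the actual mechanism in tests/test_compat.py may differ.

```python
import sys
from contextlib import contextmanager
from unittest import mock

# Module names taken from the commit message above.
MOCKED_MODULES = [
    "pyspark",
    "pyspark.sql.session",
    "pyspark.sql.window",
    "delta",
    "delta.tables",
]


@contextmanager
def fake_spark_modules(names=MOCKED_MODULES):
    """Temporarily register MagicMock stand-ins so imports of these names succeed."""
    fakes = {name: mock.MagicMock(name=name) for name in names}
    # patch.dict restores sys.modules to its previous state on exit.
    with mock.patch.dict(sys.modules, fakes):
        yield fakes


with fake_spark_modules():
    # Resolves against the stand-in, so no real delta-spark install is needed.
    from delta.tables import DeltaTable

    print(isinstance(DeltaTable, mock.MagicMock))
```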

_get_onboarding_named_parameters concatenated the full local path cmd.onboarding_file_path into the UC volume / DBFS path, producing '/Volumes/<cat>/<schema>/<vol>/sdp_meta_conf//Users/.../onboarding.json'. The onboarding job then failed to open that path. The launcher demos (launch_*.py) constructed the named_parameters dict themselves and bypassed this code, which masked the bug. Use os.path.basename(cmd.onboarding_file_path) so the parameter is '<volume>/sdp_meta_conf/onboarding.json' in both the UC and DBFS branches. Updated the existing test_get_onboarding_named_parameters assertion to match.

The codebase migrated from `import dlt` to `from pyspark import pipelines as dp` (issue databrickslabs#274 / commit cfd66fa); the inline snapshot_runner_content string in SDP_META_INTERACTIVE_DEMO.py was missed. The `dlt` symbol was never used in the runner body.

… scenarios _strip_example_onboarding_entry only fired for the `delta` scenario. The seeded `data_flow_id: "100"` row carries unedited `<your-kafka-host>:9092` / `<your-eventhub-namespace>` placeholders for the kafka and eventhub scenarios, which STAGE 5's bundle-validate sanity checks reject; every kafka/eventhub launcher run failed at STAGE 5 as a result. Added _STRIP_EXAMPLE_SCENARIO_NAMES = _DELTA_SCENARIO_NAMES | {"kafka", "eventhub"} and gated the strip on that set. cloudfiles / cloudfiles_combined keep the seed (its placeholder-free source path validates fine).

stage_validate ran _sdp_meta_sanity_checks and printed every error itself before calling bundle_validate, which then printed the same errors again under a different header. Removed the launcher's local copy and let bundle_validate own the output. Also dropped the now-unused _sdp_meta_sanity_checks import.

read_silver had an inline where-clause loop `for where_clause in where_clause:` that shadowed the outer list with the last clause string. Today nothing reads it post-loop so no functional break, but the shadow is bug-prone and the private __apply_where_clause helper already implements the same logic correctly with a clause iterator. Delegated read_silver's where-clause handling to __apply_where_clause.

_build_table_name, _get_source_table_info, _get_target_table_name, _create_dataframe_reader, _read_from_source, _apply_transformations were added but never called from any production code path. Each one's only caller was another method in the same orphan chain, and read_silver / read_bronze / _get_target_table_info etc. continued to implement the same logic inline. Removed the methods and the eight tests that exercised them.

CLAassistant · 2026-05-29T08:52:32Z

All committers have signed the CLA.

ravi-databricks self-assigned this May 28, 2026

ravi-databricks changed the base branch from feature/sdp-meta to issue_297 May 28, 2026 23:56

ravi-databricks added this to the v0.0.11 milestone May 28, 2026

park-peter added 8 commits May 29, 2026 17:40

test: isolate bundle init auth for CLI render tests

e622628

park-peter force-pushed the issue_297 branch from c9d006d to e622628 Compare May 29, 2026 08:52

ravi-databricks merged commit 0a22537 into databrickslabs:issue_297 Jun 4, 2026

ravi-databricks merged 9 commits into databrickslabs:issue_297 from park-peter:issue_297
