bundle/dms: gate deployment history recording on a dedicated env var #5666
Draft
shreyas-goenka wants to merge 30 commits into
Add server-side deployment locking and state management via the Deployment Metadata Service (DMS), gated behind DATABRICKS_BUNDLE_MANAGED_STATE=true.
Key changes:
- DeploymentLock interface with factory (DMS or filesystem based on env)
- DMS lock: version-based locking with heartbeat, operation reporting
- State read/write via ListResources/CreateOperation with per-resource state
- withDeploymentLock helper extracts lock boilerplate from deploy/destroy
- Temporary DMS client (libs/tmpdms) mirroring future SDK-generated code
- Mock DMS server for acceptance tests
- 6 acceptance tests covering deploy, destroy, plan, summary, sequential deploys, and adding resources with remote state
Co-authored-by: Isaac
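The lock factory and the `withDeploymentLock` helper from the list above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the method set of `DeploymentLock`, the `fakeLock` type, and the `newDeploymentLock` and `runDeploy` functions are all hypothetical, and the env check assumes the literal value `true` shown in the commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
)

// DeploymentLock is an assumed shape for the interface named in the commit
// message; the real method set may differ.
type DeploymentLock interface {
	Acquire(ctx context.Context) error
	Release(ctx context.Context) error
}

// fakeLock is a hypothetical stand-in for both the DMS-backed and the
// filesystem-backed implementations.
type fakeLock struct {
	kind string
	held bool
}

func (l *fakeLock) Acquire(ctx context.Context) error {
	if l.held {
		return errors.New(l.kind + " lock already held")
	}
	l.held = true
	return nil
}

func (l *fakeLock) Release(ctx context.Context) error {
	l.held = false
	return nil
}

// newDeploymentLock picks an implementation from the opt-in env var.
// getenv is injected so the choice can be exercised without touching the
// process environment.
func newDeploymentLock(getenv func(string) string) *fakeLock {
	if getenv("DATABRICKS_BUNDLE_MANAGED_STATE") == "true" {
		return &fakeLock{kind: "dms"}
	}
	return &fakeLock{kind: "filesystem"}
}

// withDeploymentLock runs fn while holding the lock and always releases it,
// so deploy and destroy do not each repeat the acquire/release boilerplate.
func withDeploymentLock(ctx context.Context, l DeploymentLock, fn func(context.Context) error) error {
	if err := l.Acquire(ctx); err != nil {
		return fmt.Errorf("acquire deployment lock: %w", err)
	}
	err := fn(ctx)
	// Join the errors so a release failure and an fn failure are both reported.
	return errors.Join(err, l.Release(ctx))
}

// runDeploy wires the factory and the helper together for the demo.
func runDeploy(getenv func(string) string) (string, error) {
	lock := newDeploymentLock(getenv)
	err := withDeploymentLock(context.Background(), lock, func(ctx context.Context) error {
		fmt.Println("deploying while holding the", lock.kind, "lock")
		return nil
	})
	return lock.kind, err
}

func main() {
	kind, err := runDeploy(os.Getenv)
	fmt.Println("lock kind:", kind, "err:", err)
}
```

Keeping the release inside the helper means a failing deploy step cannot leave the lock held, which is the main reason to centralize this logic.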
…troy phases Co-authored-by: Isaac
LoadStateFromDMS is a state-loading function, not a lock function. Moving it to statemgmt where it belongs alongside other state management code. Co-authored-by: Isaac
When we just created the deployment, LastVersionID is necessarily empty so we can start at version "1" directly. Co-authored-by: Isaac
Co-authored-by: Isaac
Print requests inline in output.txt and clear remaining requests at the end of each script so out.requests.txt is not generated. Also update sequential-deploys test to add/remove resources across deploys, asserting create and delete operations are captured. Co-authored-by: Isaac
…erations
- Print DMS requests inline in output.txt via print_requests.py
- Update sequential-deploys to test create/delete across deploys
- Add protoLogs replacement to stabilize flaky telemetry timing
- Regenerate out.requests.txt golden files
Co-authored-by: Isaac
If CreateDeployment fails, the workspace should not contain a dangling deployment ID pointing to a non-existent server record. Co-authored-by: Isaac
The old lock.Acquire mutator checked for fs.ErrPermission and fs.ErrNotExist and reported possible permission denied errors. This was lost when refactoring to the DeploymentLock interface. Co-authored-by: Isaac
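The restored check can be sketched with the standard library alone. The helper names `isAccessError`, `wrapLockError`, and `samplePermissionError` are hypothetical, and the hint text is illustrative rather than the message the CLI prints.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// isAccessError reports whether err looks like a missing path or a
// permission problem, the two sentinel errors the old mutator checked.
func isAccessError(err error) bool {
	return errors.Is(err, fs.ErrPermission) || errors.Is(err, fs.ErrNotExist)
}

// wrapLockError adds a permission hint to access-style failures and passes
// every other error through unchanged. The wording is an assumption.
func wrapLockError(err error, path string) error {
	if err == nil {
		return nil
	}
	if isAccessError(err) {
		return fmt.Errorf("cannot acquire deployment lock at %s (possible permission denied): %w", path, err)
	}
	return err
}

// samplePermissionError builds the kind of wrapped error a filesystem call
// returns, to show that errors.Is sees through the *fs.PathError wrapper.
func samplePermissionError() error {
	return &fs.PathError{Op: "open", Path: "/bundle/state/deploy.lock", Err: fs.ErrPermission}
}

func main() {
	err := wrapLockError(samplePermissionError(), "/bundle/state")
	fmt.Println(err)
	// The original sentinel is still reachable after wrapping with %w.
	fmt.Println("still a permission error:", errors.Is(err, fs.ErrPermission))
}
```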
Add print_requests.py cleanup at the end of each script to clear remaining recorded requests, preventing out.requests.txt from being generated as a golden file. DMS requests are already printed inline in output.txt. Co-authored-by: Isaac
Open out.requests.txt with explicit utf-8 encoding to handle non-ASCII characters in request bodies. Co-authored-by: Isaac
Regenerated with Python 3.11 after fixing the UnicodeDecodeError. The output.txt files now contain the inline DMS request assertions without the Python traceback errors. Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Keep resources.json maintained alongside the DMS deployment so users have a backward path if they hit issues with the DMS-backed flow. Move DMS-specific bookkeeping (the deployment_id that ties the bundle to a server-side deployment record) into a sibling managed_service.json so the two concerns stay cleanly separated.
A single async sender goroutine drains a buffered channel of operation events; CRUD workers push onto the channel and continue. When the buffer fills (capacity matches the worker pool), workers block on the send and naturally back off — this is the only intended source of backpressure on the worker pool. Reporting is best-effort: a DMS API failure is logged and the sender keeps draining. The deploy is no longer aborted when the audit-log write fails. On a hard process crash, at most ~10 buffered events can be lost (channel capacity). Release() drains the reporter before completing the version so the audit trail is as complete as possible on a clean shutdown.
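The reporter described above follows a common Go pattern: a buffered channel, one draining goroutine, and a close-then-wait shutdown. The sketch below illustrates that pattern under stated assumptions. The `event` and `reporter` types, `Report`, `Drain`, and `errSendFailed` are hypothetical names, and the send function is a stub rather than a real DMS call.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// event is a hypothetical stand-in for one recorded resource operation.
type event struct {
	resourceKey string
	action      string
}

// errSendFailed simulates a DMS API failure in the demo.
var errSendFailed = errors.New("simulated DMS API failure")

type reporter struct {
	events chan event
	done   chan struct{}
	send   func(event) error
	failed int // written only by the sender goroutine, read after done is closed
}

// newReporter starts the single sender goroutine. capacity is the channel
// buffer size, which the commit message says matches the worker pool.
func newReporter(capacity int, send func(event) error) *reporter {
	r := &reporter{
		events: make(chan event, capacity),
		done:   make(chan struct{}),
		send:   send,
	}
	go r.run()
	return r
}

// run drains the channel until it is closed. A failed send is logged and
// counted, and draining continues: reporting is best-effort.
func (r *reporter) run() {
	defer close(r.done)
	for ev := range r.events {
		if err := r.send(ev); err != nil {
			r.failed++
			fmt.Printf("warn: could not report %s of %s: %v\n", ev.action, ev.resourceKey, err)
		}
	}
}

// Report enqueues an event. It blocks only when the buffer is full, which
// is the backpressure on workers described above.
func (r *reporter) Report(ev event) { r.events <- ev }

// Drain closes the channel, waits for the sender to flush every buffered
// event, and returns the number of failed sends.
func (r *reporter) Drain() int {
	close(r.events)
	<-r.done
	return r.failed
}

func main() {
	const workers = 10
	r := newReporter(workers, func(ev event) error {
		if ev.action == "delete" {
			return errSendFailed
		}
		return nil
	})

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			action := "create"
			if i%5 == 0 {
				action = "delete"
			}
			r.Report(event{resourceKey: fmt.Sprintf("resources.jobs.job_%d", i), action: action})
		}(i)
	}
	wg.Wait()
	fmt.Println("failed sends:", r.Drain())
}
```

Because `Drain` must only be called after all workers have finished reporting, the demo waits on the `WaitGroup` first; sending on a closed channel would panic.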
…5406)
## Why
DMS-backed bundle deployments (run with `DATABRICKS_BUNDLE_MANAGED_STATE=true DATABRICKS_BUNDLE_ENGINE=direct`) never set `display_name` when creating the deployment record, so the field is stored as `null`.
## What
Populate `DisplayName` from `bundle.Config.Bundle.Name` (i.e. the `bundle.name` from `databricks.yml`) when issuing `CreateDeployment`. This matches the human-readable label users already see in `databricks bundle validate`.
## Tests
Existing `acceptance/bundle/dms/*` tests record the `CreateDeployment` request body via `print_requests.py`; their `output.txt` files regenerate to assert the new `display_name` field.
This pull request and its description were written by Isaac.
The deployment metadata service now accepts git provenance on a version (origin_url, branch, commit) per databricks-eng/universe#2009991. Record it on CreateVersion using the same values the CLI writes to metadata.json.
# Conflicts:
# bundle/deploy/lock/acquire.go
# bundle/statemgmt/state_push.go
# cmd/bundle/utils/process.go
# libs/testserver/fake_workspace.go
# libs/testserver/server.go
… for determinism
Main's direct engine applies resources concurrently, so the order of recorded CreateOperation requests varied between runs. Add --sort to print_requests.py in the multi-resource DMS tests to make the recorded output deterministic.
Merging main changed several APIs the DMS code predates:
- WorkspaceClient now takes a ctx (workspace_filesystem.go).
- StateDB keeps a separate resource-key->ID index (stateIDs) that is authoritative during writes; Data.State is only reconstructed when the WAL is merged. LoadStateFromDMS wrote Data.State directly, leaving the index empty, so deletes failed with "missing in state". It now builds the database and calls OpenWithData, which populates the index.
- The inline operation reporter read the freshly-created resource ID and state from Data.State (stale during a deploy). It now reads the ID from GetResourceID and the state from the value just applied, so operations carry the real resource_id and state and the server round-trips them.
The SDK's JobDeployment/PipelineDeployment now carry deployment_id and version_id (used to look up deployment metadata in the DMS). Stamp them onto each job and pipeline so every resource records the deployment and the version that produced it. The IDs are only known after the deployment lock is acquired, so a new deploy-phase mutator (AnnotateDeploymentVersion) sets them, running after the lock and before the plan. The version is plumbed onto the bundle alongside the deployment ID. version_id changes on every deploy, so an ignore_local_changes rule keeps it from triggering an update on its own; a real update still sends the current version_id via the full-config Reset/EditPipeline. (Also adjusts isAborted to errors.AsType for the Go 1.26 linter pulled in by the merge.)
…ion_id
Operations now carry the resource_id and full state (including the deployment block with deployment_id/version_id), and the out.test.toml dump format changed on main. sequential-deploys now shows the version_id rule working: deploy 2 bumps the version but the unchanged test_job records no operation.
## Changes
Set `display_name` on the DMS deployment version, using the bundle name, the same value already recorded on the deployment. The `Version` proto has a `display_name` field, but the `CreateVersion` request never populated it, so every version came back with a null `display_name` even though the deployment had one. This stamps it for parity.
## Why
`display_name` is set on the deployment (from the bundle name) but was missing on each version, leaving version records without a human-readable label. Filling it in keeps deployment and version metadata consistent.
## Tests
Updated the `bundle/dms` acceptance outputs and confirmed they pass.
This pull request and its description were written by Isaac, an AI coding agent.
…ta-service' into dms-gitinfo
## Changes
Record the bundle target deployment mode on each DMS version. Adds a
`deployment_mode` field (and the `DEPLOYMENT_MODE_DEVELOPMENT` /
`DEPLOYMENT_MODE_PRODUCTION` enum) to `tmpdms.Version`, and sets it in
the `CreateVersion` request from `bundle.mode`.
Not set on the deployment: `Deployment.deployment_mode` is derived
server-side from the most recent version's mode (output-only), so the
CLI only sets it on the version. A target with no `mode` maps to an
empty value, which is omitted (the server treats it as unspecified) — we
don't fabricate a default.
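The mapping described above is small enough to sketch in full. The enum strings come from the description; the Go type and function names (`DeploymentMode`, `deploymentModeFor`, `encodeVersion`) and the reduced `Version` struct are assumptions for illustration and are not copied from `tmpdms`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeploymentMode mirrors the enum values named in the description.
type DeploymentMode string

const (
	DeploymentModeUnspecified DeploymentMode = ""
	DeploymentModeDevelopment DeploymentMode = "DEPLOYMENT_MODE_DEVELOPMENT"
	DeploymentModeProduction  DeploymentMode = "DEPLOYMENT_MODE_PRODUCTION"
)

// Version is a reduced, hypothetical stand-in for the request payload.
// omitempty drops the field when the mode is unset, so no default is sent.
type Version struct {
	DeploymentMode DeploymentMode `json:"deployment_mode,omitempty"`
}

// deploymentModeFor maps the bundle target mode to the DMS enum. Anything
// other than the two known modes maps to the empty (unspecified) value.
func deploymentModeFor(bundleMode string) DeploymentMode {
	switch bundleMode {
	case "development":
		return DeploymentModeDevelopment
	case "production":
		return DeploymentModeProduction
	default:
		return DeploymentModeUnspecified
	}
}

// encodeVersion returns the JSON body that would be sent for a given mode.
func encodeVersion(bundleMode string) string {
	body, err := json.Marshal(Version{DeploymentMode: deploymentModeFor(bundleMode)})
	if err != nil {
		return ""
	}
	return string(body)
}

func main() {
	fmt.Println(encodeVersion("development")) // {"deployment_mode":"DEPLOYMENT_MODE_DEVELOPMENT"}
	fmt.Println(encodeVersion("production"))  // {"deployment_mode":"DEPLOYMENT_MODE_PRODUCTION"}
	fmt.Println(encodeVersion(""))            // {}
}
```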
## Why
The SDK's `bundle.Version` already carries `deployment_mode` ("captured
at the time of this version"), but the CLI never populated it, so every
version recorded a null mode. This stamps it so each version records
whether it was a development or production deployment.
## Tests
Added a unit test for the mode mapping (development / production /
unset). The `bundle/dms` acceptance outputs are unchanged because those
targets don't set a mode. Verified live against a workspace: a `mode:
development` target now records `deployment_mode:
DEPLOYMENT_MODE_DEVELOPMENT` on the created version.
This pull request and its description were written by Isaac, an AI
coding agent.
Stamp the deployment's workspace location onto the DMS Version, mirroring the values the CLI already writes to metadata.json (see bundle/deploy/metadata/compute.go): workspace root_path/file_path, the sync root as file_path for source-linked deployments, and the workspace git folder path. This makes the deployment metadata service an equivalent source of truth for a bundle's workspace location, alongside the existing git_info/display_name/target_name fields. Co-authored-by: Isaac
Rename the experimental DMS opt-in env var from DATABRICKS_BUNDLE_MANAGED_STATE to DATABRICKS_BUNDLE_RECORD_DEPLOYMENT_HISTORY so it matches the feature's name (recording deployment history). The env helper, all three call sites, and the DMS acceptance tests are updated; behavior is unchanged. Co-authored-by: Isaac
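A gate like this usually comes down to one environment lookup. The sketch below shows one plausible shape; the function name `recordDeploymentHistory`, the injected `lookup` parameter, and the use of `strconv.ParseBool` are assumptions, since the source only shows the variable being set to `true`.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// recordDeploymentHistoryEnv is the renamed opt-in variable.
const recordDeploymentHistoryEnv = "DATABRICKS_BUNDLE_RECORD_DEPLOYMENT_HISTORY"

// recordDeploymentHistory reports whether the feature is opted in. The
// lookup function has the same signature as os.LookupEnv and is injected so
// the gate can be exercised without mutating the process environment.
// Treating any ParseBool-true value as enabled is an assumption.
func recordDeploymentHistory(lookup func(string) (string, bool)) bool {
	value, ok := lookup(recordDeploymentHistoryEnv)
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(value)
	return err == nil && enabled
}

func main() {
	if recordDeploymentHistory(os.LookupEnv) {
		fmt.Println("deployment history recording enabled (DMS path)")
	} else {
		fmt.Println("deployment history recording disabled (filesystem path)")
	}
}
```

Routing every call site through a single helper is what lets a rename like this stay behavior-neutral: only the constant changes.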
Integration test report
Commit: f186af0
22 interesting tests: 13 SKIP, 7 RECOVERED, 2 flaky
Top 27 slowest tests (at least 2 minutes):
## Why
The experimental "record deployment history" feature (DMS) was gated on `DATABRICKS_BUNDLE_MANAGED_STATE`, a name that no longer matches what the feature does. This renames the opt-in env var to `DATABRICKS_BUNDLE_RECORD_DEPLOYMENT_HISTORY` so the gate reads as the feature it enables.
## What
- `env.ManagedState` → `env.RecordDeploymentHistory` and the variable it reads (`bundle/env/record_deployment_history.go`, renamed from `deployment_metadata.go`).
- Call sites: `bundle/deploy/lock/lock.go`, `bundle/statemgmt/state_pull.go`, `bundle/statemgmt/state_push.go`.
- DMS acceptance tests: `EnvMatrix` updated to the new name; regenerated `out.test.toml`.

Behavior is unchanged; only the env var name differs. The `output.txt` files are untouched because the code path is identical.
## Note
Based on `main`, but the branch is stacked on `shreyas-goenka/deployment-metadata-service`, so the diff currently includes the full DMS stack. It shrinks to just the env-var rename once the DMS branch merges to `main`. The rename is the last commit.
This pull request and its description were written by Isaac.