Integrate deployment metadata service for locking and state#4856

Open
shreyas-goenka wants to merge 30 commits into main from shreyas-goenka/deployment-metadata-service

@shreyas-goenka shreyas-goenka commented Mar 26, 2026

Contributor

Summary

Integrates the Deployment Metadata Service (DMS) as an alternative backend for deployment locking and resource state management. Gated behind DATABRICKS_BUNDLE_MANAGED_STATE=true.

When enabled:

  • Locking: Uses server-side versioned locks (with heartbeats) instead of workspace filesystem lock files
  • State: Reads/writes resource state via the DMS API (ListResources / CreateOperation) instead of local state files
  • Operations: Reports each resource operation (create, update, delete) inline to the server with resource state
  • Git provenance: Records git_info (origin_url, branch, commit) on the deployment version — the same values the CLI writes to metadata.json. Server support added in databricks-eng/universe#2009991.

Key implementation details

  • DeploymentLock interface (lock.go) with two implementations: workspaceFilesystemLock (existing behavior) and metadataServiceLock (DMS)
  • resolveDeploymentID reads deployment ID from workspace resources.json, or generates a new UUID for fresh deployments (written only after CreateDeployment succeeds)
  • LoadStateFromDMS populates the state DB from ListResources instead of reading local files
  • PushResourcesState is a no-op with DMS (state is persisted per-operation to the server)
  • --plan flag and bind/unbind are not supported with DMS
  • Heartbeat goroutine keeps the lock alive during long deployments
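The lock abstraction and heartbeat described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in lock.go: the method set on `DeploymentLock`, the `heartbeatLock` type, the `renew` callback, and the interval are all assumptions made for the example.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// DeploymentLock is an assumed shape for the interface named in the PR;
// the real method set in lock.go may differ.
type DeploymentLock interface {
	Acquire(ctx context.Context) error
	Release(ctx context.Context) error
}

// heartbeatLock is a hypothetical stand-in for a server-side lock that must
// be renewed periodically. In the PR the renewal would be a DMS call; here
// it is an injected callback.
type heartbeatLock struct {
	interval time.Duration
	renew    func(ctx context.Context) error

	cancel context.CancelFunc
	done   sync.WaitGroup
}

// Acquire starts a goroutine that renews the lock on every tick until
// Release cancels it.
func (l *heartbeatLock) Acquire(ctx context.Context) error {
	hbCtx, cancel := context.WithCancel(ctx)
	l.cancel = cancel
	l.done.Add(1)
	go func() {
		defer l.done.Done()
		ticker := time.NewTicker(l.interval)
		defer ticker.Stop()
		for {
			select {
			case <-hbCtx.Done():
				return
			case <-ticker.C:
				// A failed renewal is ignored in this sketch.
				_ = l.renew(hbCtx)
			}
		}
	}()
	return nil
}

// Release stops the heartbeat and waits for the goroutine to exit.
func (l *heartbeatLock) Release(ctx context.Context) error {
	l.cancel()
	l.done.Wait()
	return nil
}

// heartbeatsDuringShortDeploy holds the lock for 100ms with a 10ms interval
// and returns how many renewals were sent.
func heartbeatsDuringShortDeploy() int64 {
	var beats atomic.Int64
	var lock DeploymentLock = &heartbeatLock{
		interval: 10 * time.Millisecond,
		renew:    func(context.Context) error { beats.Add(1); return nil },
	}
	ctx := context.Background()
	if err := lock.Acquire(ctx); err != nil {
		panic(err)
	}
	time.Sleep(100 * time.Millisecond) // simulated long-running deploy
	_ = lock.Release(ctx)
	return beats.Load()
}

func main() {
	fmt.Println("heartbeats sent:", heartbeatsDuringShortDeploy())
}
```

Keeping the heartbeat behind the same interface as the filesystem lock means callers only see Acquire and Release, regardless of which backend is active.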

Test plan

  • Acceptance tests under acceptance/bundle/dms/ covering: deploy with resource creation, sequential deploys with create/delete, plan + summary, deploy errors, and lock release errors
  • Unit test for planActionToOperationAction mapping
  • E2E testing against staging workspace (32/32 passing)
  • E2E on e2-dogfood: deployed a git-backed bundle and confirmed the version's git_info round-trips through the DMS service:
"git_info": {
    "branch": "my-test-branch",
    "commit": "3bae783bc0dc303bc37a2cdfd0b2bebeeaf11e65",
    "origin_url": "https://github.com/databricks/cli-gitinfo-e2e-test.git"
}
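For readers who want to check the round-trip themselves, the payload above maps onto a small Go struct. Only the JSON keys come from the PR; the `GitInfo` type name and the `roundTrip` helper are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GitInfo mirrors the three keys shown in the git_info payload.
// The Go type itself is hypothetical.
type GitInfo struct {
	Branch    string `json:"branch"`
	Commit    string `json:"commit"`
	OriginURL string `json:"origin_url"`
}

// roundTrip encodes and decodes the value, imitating a write to and a
// read from the service.
func roundTrip(in GitInfo) (GitInfo, error) {
	raw, err := json.Marshal(in)
	if err != nil {
		return GitInfo{}, err
	}
	var out GitInfo
	err = json.Unmarshal(raw, &out)
	return out, err
}

func main() {
	in := GitInfo{
		Branch:    "my-test-branch",
		Commit:    "3bae783bc0dc303bc37a2cdfd0b2bebeeaf11e65",
		OriginURL: "https://github.com/databricks/cli-gitinfo-e2e-test.git",
	}
	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	fmt.Println("round-trip preserved:", out == in)
}
```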

Update: provenance + main merge

  • Merged latest main (SDK v0.141.0, Go 1.26 toolchain). Reconciled the lock-package refactor and fixed the DMS state round-trip against main's WAL-based StateDB: LoadStateFromDMS now uses OpenWithData (populates the resource-key→ID index), and the inline operation reporter reads the resource ID via GetResourceID and the state from the just-applied value, so operations carry the real resource_id and state.
  • Record git_info (origin_url, branch, commit) on the deployment version — same provenance as metadata.json (server support: databricks-eng/universe#2009991).
  • Record deployment_id/version_id on each job and pipeline's deployment block. A deploy-phase mutator (AnnotateDeploymentVersion) stamps them after the lock is acquired. version_id changes every deploy, so an ignore_local_changes rule keeps it from triggering an update on its own; real updates still send the current value via the full-config Reset/Edit.

Verified end-to-end on e2-dogfood (git_info + deployment_id/version_id round-trip) and in acceptance/bundle/dms (sequential-deploys shows an unchanged job is skipped when only its version_id bumps).
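The effect of the ignore rule for version_id can be illustrated with a toy comparison. This is a sketch under stated assumptions: the `Job` and `DeploymentBlock` types and the `needsUpdate` function are hypothetical and do not reflect how the CLI's planner is actually structured.

```go
package main

import "fmt"

// DeploymentBlock and Job are simplified, hypothetical stand-ins for the
// SDK types that carry deployment_id and version_id.
type DeploymentBlock struct {
	DeploymentID string
	VersionID    string
}

type Job struct {
	Name       string
	Deployment DeploymentBlock
}

// needsUpdate compares remote and local config while masking version_id,
// so a version bump alone never triggers an update.
func needsUpdate(remote, local Job) bool {
	remote.Deployment.VersionID = ""
	local.Deployment.VersionID = ""
	return remote != local
}

func main() {
	remote := Job{Name: "test_job", Deployment: DeploymentBlock{"dep-1", "1"}}

	bumped := remote
	bumped.Deployment.VersionID = "2"
	fmt.Println("version bump only:", needsUpdate(remote, bumped))

	renamed := bumped
	renamed.Name = "test_job_v2"
	fmt.Println("real change:", needsUpdate(remote, renamed))
}
```

When a real change is detected, the update would still send the unmasked local value, which matches the description of the full-config Reset/Edit carrying the current version_id.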

eng-dev-ecosystem-bot commented Mar 26, 2026

Collaborator

Commit: 78efc31

Run: 27137506033

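| Env | 🟨 KNOWN | 🔄 flaky | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
|---|---|---|---|---|---|---|---|
| 🟨 aws linux | 7 | | | 15 | 261 | 928 | 7:51 |
| 🔄 aws windows | | 4 | 3 | 15 | 263 | 926 | 13:34 |
| 💚 aws-ucws linux | | | 7 | 15 | 357 | 842 | 7:19 |
| 💚 aws-ucws windows | | | 7 | 15 | 359 | 840 | 12:15 |
| 💚 azure linux | | | 1 | 17 | 264 | 926 | 7:04 |
| 💚 azure windows | | | 1 | 17 | 266 | 924 | 11:55 |
| 💚 azure-ucws linux | | | 1 | 17 | 362 | 838 | 8:11 |
| 💚 azure-ucws windows | | | 1 | 17 | 364 | 836 | 12:24 |
| 💚 gcp linux | | | 1 | 17 | 260 | 929 | 7:56 |
| 💚 gcp windows | | | 1 | 17 | 262 | 927 | 10:55 |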
22 interesting tests: 15 SKIP, 7 KNOWN
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 🟨​K 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 🟨​K 💚​R 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🔄​f 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🔄​f 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 🟨​K 💚​R 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🔄​f 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🔄​f 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/grants/select 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
Top 28 slowest tests (at least 2 minutes):
duration env testname
7:16 azure windows TestAccept
6:15 azure-ucws windows TestAccept
5:55 aws-ucws windows TestAccept
5:06 gcp windows TestAccept
5:05 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:54 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:32 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:27 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:22 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:06 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:06 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:04 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:54 azure linux TestAccept
2:52 aws-ucws linux TestAccept
2:52 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:52 azure-ucws linux TestAccept
2:51 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:50 gcp linux TestAccept
2:43 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:39 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:37 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:33 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:32 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:31 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:30 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:29 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:27 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:27 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

Comment thread bundle/direct/bundle_apply.go Outdated

// Report skip actions to the metadata service. On initial registration,
// these are recorded as INITIAL_REGISTER operations.
if action == deployplan.Skip && b.OperationReporter != nil {

Contributor Author

Move the initial registration up.

@@ -0,0 +1,6 @@
Local = true
Cloud = false

Contributor Author

The service needs to roll out to prod before we enable this on cloud.

@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/deployment-metadata-service branch 11 times, most recently from 4bbbe9c to 7b26260 Compare April 14, 2026 21:15

assert.True(t, ok)
assert.Equal(t, tmpdms.VersionTypeDestroy, vt)

_, ok = goalToVersionType(GoalBind)

Contributor Author

Support can be added as a follow-up.
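The test excerpt in this thread suggests that `goalToVersionType` returns a second boolean that is false for unsupported goals such as bind. A minimal sketch of that shape is below; `GoalDeploy`, `GoalDestroy`, the string values of every constant, and the local `VersionType` type are assumptions, since only `GoalBind` and `tmpdms.VersionTypeDestroy` appear in the excerpt.

```go
package main

import "fmt"

// Goal and VersionType are hypothetical local types; the PR uses a
// VersionType from the tmpdms package.
type Goal string

type VersionType string

const (
	GoalDeploy  Goal = "deploy"  // assumed name and value
	GoalDestroy Goal = "destroy" // assumed name and value
	GoalBind    Goal = "bind"    // name from the excerpt, value assumed
)

const (
	VersionTypeDeploy  VersionType = "VERSION_TYPE_DEPLOY"  // assumed
	VersionTypeDestroy VersionType = "VERSION_TYPE_DESTROY" // value assumed
)

// goalToVersionType maps a goal to a version type. The boolean reports
// whether the goal is supported, so callers can reject bind and similar
// goals before creating a version.
func goalToVersionType(g Goal) (VersionType, bool) {
	switch g {
	case GoalDeploy:
		return VersionTypeDeploy, true
	case GoalDestroy:
		return VersionTypeDestroy, true
	default:
		return "", false
	}
}

func main() {
	for _, g := range []Goal{GoalDeploy, GoalDestroy, GoalBind} {
		vt, ok := goalToVersionType(g)
		fmt.Printf("goal=%s versionType=%q supported=%t\n", g, vt, ok)
	}
}
```

Adding bind support later would then be a matter of adding a case rather than changing the signature.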

@shreyas-goenka shreyas-goenka marked this pull request as ready for review April 15, 2026 00:08
@shreyas-goenka shreyas-goenka requested review from andrewnester and pietern and removed request for andrewnester and pietern April 15, 2026 00:09
pavloKozlov and others added 7 commits June 2, 2026 11:33
…5406)

## Why

DMS-backed bundle deployments (run with
`DATABRICKS_BUNDLE_MANAGED_STATE=true DATABRICKS_BUNDLE_ENGINE=direct`)
never set `display_name` when creating the deployment record, so the
field is stored as `null`.

## What

Populate `DisplayName` from `bundle.Config.Bundle.Name` (i.e. the
`bundle.name` from `databricks.yml`) when issuing `CreateDeployment`.
This matches the human-readable label users already see in `databricks
bundle validate`.

## Tests

Existing `acceptance/bundle/dms/*` tests record the `CreateDeployment`
request body via `print_requests.py`; their `output.txt` files
regenerate to assert the new `display_name` field.

This pull request and its description were written by Isaac.

The deployment metadata service now accepts git provenance on a version
(origin_url, branch, commit) per databricks-eng/universe#2009991. Record
it on CreateVersion using the same values the CLI writes to metadata.json.

# Conflicts:
#	bundle/deploy/lock/acquire.go
#	bundle/statemgmt/state_push.go
#	cmd/bundle/utils/process.go
#	libs/testserver/fake_workspace.go
#	libs/testserver/server.go

… for determinism

Main's direct engine applies resources concurrently, so the order of recorded
CreateOperation requests varied between runs. Add --sort to print_requests.py
in the multi-resource DMS tests to make the recorded output deterministic.
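The nondeterminism described here, and why sorting fixes it, can be reproduced in a few lines. This sketch is not print_requests.py; `recordConcurrently` and `sortedRequests` are hypothetical helpers, and the resource keys are made up.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// recordConcurrently imitates a test server that records one request per
// resource while resources are applied in parallel. The resulting order
// depends on goroutine scheduling.
func recordConcurrently(keys []string) []string {
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		recorded = make([]string, 0, len(keys))
	)
	for _, k := range keys {
		wg.Add(1)
		go func(k string) {
			defer wg.Done()
			mu.Lock()
			recorded = append(recorded, k)
			mu.Unlock()
		}(k)
	}
	wg.Wait()
	return recorded
}

// sortedRequests returns a sorted copy, giving stable output regardless
// of the order in which requests arrived.
func sortedRequests(recorded []string) []string {
	out := append([]string(nil), recorded...)
	sort.Strings(out)
	return out
}

func main() {
	recorded := recordConcurrently([]string{"jobs.b", "pipelines.a", "jobs.a"})
	fmt.Println("arrival order (varies):", recorded)
	fmt.Println("sorted (stable):       ", sortedRequests(recorded))
}
```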
Merging main changed several APIs the DMS code predates:
- WorkspaceClient now takes a ctx (workspace_filesystem.go).
- StateDB keeps a separate resource-key->ID index (stateIDs) that is
  authoritative during writes; Data.State is only reconstructed when the WAL
  is merged. LoadStateFromDMS wrote Data.State directly, leaving the index
  empty, so deletes failed with "missing in state". It now builds the
  database and calls OpenWithData, which populates the index.
- The inline operation reporter read the freshly-created resource ID and
  state from Data.State (stale during a deploy). It now reads the ID from
  GetResourceID and the state from the value just applied, so operations
  carry the real resource_id and state and the server round-trips them.
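The index problem can be shown with a toy model. The names `OpenWithData`, `GetResourceID`, `stateIDs`, and `Data.State` come from the commit message, but every signature and type here is an assumption; the real StateDB is WAL-based and considerably more involved.

```go
package main

import (
	"errors"
	"fmt"
)

// Entry and Database are simplified, hypothetical stand-ins.
type Entry struct{ ID string }

type Database struct{ State map[string]Entry }

// StateDB keeps a resource-key -> ID index that is authoritative for
// writes, separately from Data.State.
type StateDB struct {
	Data     Database
	stateIDs map[string]string
}

// OpenWithData installs the data and populates the index from it.
func (db *StateDB) OpenWithData(data Database) {
	db.Data = data
	db.stateIDs = make(map[string]string, len(data.State))
	for key, entry := range data.State {
		db.stateIDs[key] = entry.ID
	}
}

// GetResourceID reads from the index, not from Data.State.
func (db *StateDB) GetResourceID(key string) (string, bool) {
	id, ok := db.stateIDs[key]
	return id, ok
}

var errMissing = errors.New("missing in state")

// Delete consults only the index, so an empty index makes it fail even
// when Data.State contains the key.
func (db *StateDB) Delete(key string) error {
	if _, ok := db.stateIDs[key]; !ok {
		return fmt.Errorf("%s: %w", key, errMissing)
	}
	delete(db.stateIDs, key)
	return nil
}

func remoteState() Database {
	return Database{State: map[string]Entry{"resources.jobs.test_job": {ID: "123"}}}
}

// deleteAfterDirectWrite reproduces the bug: Data.State is set directly.
func deleteAfterDirectWrite() error {
	db := &StateDB{}
	db.Data = remoteState()
	return db.Delete("resources.jobs.test_job")
}

// deleteAfterOpenWithData reproduces the fix.
func deleteAfterOpenWithData() error {
	db := &StateDB{}
	db.OpenWithData(remoteState())
	return db.Delete("resources.jobs.test_job")
}

func main() {
	fmt.Println("direct write: ", deleteAfterDirectWrite())
	fmt.Println("OpenWithData: ", deleteAfterOpenWithData())
}
```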
The SDK's JobDeployment/PipelineDeployment now carry deployment_id and
version_id (used to look up deployment metadata in the DMS). Stamp them onto
each job and pipeline so every resource records the deployment and the version
that produced it.

The IDs are only known after the deployment lock is acquired, so a new
deploy-phase mutator (AnnotateDeploymentVersion) sets them, running after the
lock and before the plan. The version is plumbed onto the bundle alongside the
deployment ID.

version_id changes on every deploy, so an ignore_local_changes rule keeps it
from triggering an update on its own; a real update still sends the current
version_id via the full-config Reset/EditPipeline. (Also adjusts isAborted to
errors.AsType for the Go 1.26 linter pulled in by the merge.)

…ion_id

Operations now carry the resource_id and full state (including the deployment
block with deployment_id/version_id), and the out.test.toml dump format changed
on main. sequential-deploys now shows the version_id rule working: deploy 2
bumps the version but the unchanged test_job records no operation.

## Changes
Set `display_name` on the DMS deployment version, using the bundle name
— the same value already recorded on the deployment.

The `Version` proto has a `display_name` field, but the `CreateVersion`
request never populated it, so every version came back with a null
`display_name` even though the deployment had one. This stamps it for
parity.

## Why
`display_name` is set on the deployment (from the bundle name) but was
missing on each version, leaving version records without a
human-readable label. Filling it in keeps deployment and version
metadata consistent.

## Tests
Updated the `bundle/dms` acceptance outputs and confirmed they pass.

This pull request and its description were written by Isaac, an AI
coding agent.

## Changes
Record the bundle target deployment mode on each DMS version. Adds a
`deployment_mode` field (and the `DEPLOYMENT_MODE_DEVELOPMENT` /
`DEPLOYMENT_MODE_PRODUCTION` enum) to `tmpdms.Version`, and sets it in
the `CreateVersion` request from `bundle.mode`.

Not set on the deployment: `Deployment.deployment_mode` is derived
server-side from the most recent version's mode (output-only), so the
CLI only sets it on the version. A target with no `mode` maps to an
empty value, which is omitted (the server treats it as unspecified) — we
don't fabricate a default.
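A rough sketch of the mapping described above, including the omission of an unset mode from the request body. The enum names match the ones quoted in this commit message; the `DeploymentMode` and `Version` Go types, the `modeToDeploymentMode` and `encodeVersion` helpers, and the use of `omitempty` are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeploymentMode is a hypothetical local type for the enum.
type DeploymentMode string

const (
	DeploymentModeDevelopment DeploymentMode = "DEPLOYMENT_MODE_DEVELOPMENT"
	DeploymentModeProduction  DeploymentMode = "DEPLOYMENT_MODE_PRODUCTION"
)

// Version is a simplified stand-in for the CreateVersion payload.
type Version struct {
	DisplayName    string         `json:"display_name,omitempty"`
	DeploymentMode DeploymentMode `json:"deployment_mode,omitempty"`
}

// modeToDeploymentMode maps bundle.mode to the enum. An unset or unknown
// mode maps to the empty value rather than a fabricated default.
func modeToDeploymentMode(mode string) DeploymentMode {
	switch mode {
	case "development":
		return DeploymentModeDevelopment
	case "production":
		return DeploymentModeProduction
	default:
		return ""
	}
}

// encodeVersion builds the JSON body for a version with the given name
// and bundle mode.
func encodeVersion(name, mode string) string {
	raw, err := json.Marshal(Version{
		DisplayName:    name,
		DeploymentMode: modeToDeploymentMode(mode),
	})
	if err != nil {
		panic(err)
	}
	return string(raw)
}

func main() {
	fmt.Println(encodeVersion("my-bundle", "development"))
	fmt.Println(encodeVersion("my-bundle", ""))
}
```

With `omitempty`, a target without a mode produces a body with no `deployment_mode` key at all, which is one way to leave the field unspecified on the server.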

## Why
The SDK's `bundle.Version` already carries `deployment_mode` ("captured
at the time of this version"), but the CLI never populated it, so every
version recorded a null mode. This stamps it so each version records
whether it was a development or production deployment.

## Tests
Added a unit test for the mode mapping (development / production /
unset). The `bundle/dms` acceptance outputs are unchanged because those
targets don't set a mode. Verified live against a workspace: a `mode:
development` target now records `deployment_mode:
DEPLOYMENT_MODE_DEVELOPMENT` on the created version.

This pull request and its description were written by Isaac, an AI
coding agent.

Stamp the deployment's workspace location onto the DMS Version, mirroring
the values the CLI already writes to metadata.json (see
bundle/deploy/metadata/compute.go): workspace root_path/file_path, the
sync root as file_path for source-linked deployments, and the workspace
git folder path. This makes the deployment metadata service an equivalent
source of truth for a bundle's workspace location, alongside the existing
git_info/display_name/target_name fields.

Co-authored-by: Isaac

## Changes
`bundle summary` now surfaces the bundle's deployment metadata service
identifiers — `deployment_id` and the current `version_id` — when the
deployment metadata service is in use (managed state + direct engine).
They appear as a `Deployment:` section in text output and as a top-level
`deployment` object in `--output json`. Non-DMS summaries are unchanged.

`version_id` isn't known to a read-only command, so summary fetches the
latest `last_version_id` from the deployment record via `GetDeployment`.
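The lookup described above might look something like the following. This is a sketch only: the `deploymentGetter` interface, the `Deployment` and `SummaryDeployment` types, the fake client, and the exact key names inside the `deployment` object are assumptions, not the CLI's actual types or output schema.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
)

// Deployment is a hypothetical, trimmed-down deployment record.
type Deployment struct {
	DeploymentID  string
	LastVersionID string
}

// deploymentGetter is an assumed interface over the GetDeployment call.
type deploymentGetter interface {
	GetDeployment(ctx context.Context, deploymentID string) (Deployment, error)
}

// SummaryDeployment is the assumed shape of the `deployment` object in
// the JSON summary.
type SummaryDeployment struct {
	DeploymentID string `json:"deployment_id"`
	VersionID    string `json:"version_id"`
}

// summaryDeployment resolves the current version for a read-only command
// by reading last_version_id from the deployment record.
func summaryDeployment(ctx context.Context, c deploymentGetter, id string) (SummaryDeployment, error) {
	d, err := c.GetDeployment(ctx, id)
	if err != nil {
		return SummaryDeployment{}, fmt.Errorf("get deployment %s: %w", id, err)
	}
	return SummaryDeployment{DeploymentID: id, VersionID: d.LastVersionID}, nil
}

// fakeClient is an in-memory stand-in for the service.
type fakeClient map[string]Deployment

func (f fakeClient) GetDeployment(_ context.Context, id string) (Deployment, error) {
	d, ok := f[id]
	if !ok {
		return Deployment{}, fmt.Errorf("deployment %s not found", id)
	}
	return d, nil
}

// demoSummaryJSON renders the summary for a made-up deployment.
func demoSummaryJSON() string {
	client := fakeClient{"dep-1": {DeploymentID: "dep-1", LastVersionID: "7"}}
	s, err := summaryDeployment(context.Background(), client, "dep-1")
	if err != nil {
		panic(err)
	}
	raw, err := json.Marshal(map[string]SummaryDeployment{"deployment": s})
	if err != nil {
		panic(err)
	}
	return string(raw)
}

func main() {
	fmt.Println(demoSummaryJSON())
}
```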

## Why
The in-workspace DABs UI needs the bundle-level `deployment_id` +
`version_id` to link to the deployment metadata page. Until now these
were only resolved internally (`b.DeploymentID`) or stamped per-resource
for the jobs UI — no command exposed the bundle's own identifiers. This
stacks on the DMS branch (#4856).

## Tests
Extended the `bundle/dms/plan-and-summary` acceptance test to cover both
the text `Deployment:` block and the `bundle summary -o json`
`deployment` object.

6 participants