bundle: record deployment history in DMS after approval #5386

Open
shreyas-goenka wants to merge 1 commit into shreyas-goenka/dms-lineage-in-wal from shreyas-goenka/bundle-dms-implementation
@shreyas-goenka shreyas-goenka commented May 31, 2026

Contributor

Records each approved deploy/destroy as a version with the Deployment Metadata Service (DMS), gated by experimental.record_deployment_history and the direct engine.

Stacked on #5667 (the lineage/state-layer foundation). This PR is the consumer.

Key behavior: the version is created only after the plan is approved — a cancelled or declined deploy/destroy records nothing, so there are no empty/abandoned versions for operations that never ran.

  • libs/dms: Recorder with CreateVersion / CompleteVersion. The deployment ID is the state lineage (GetOrInitLineage), so a bundle deployment maps one-to-one to a DMS deployment record. CreateVersion calls GetDeployment first, calls CreateDeployment only when the record is missing, then creates the next version; a heartbeat keeps the version's lease alive; CompleteVersion records success/failure and, for destroy, deletes the deployment on success. Independent of bundle/deploy/lock.
  • phases: newDeploymentRecorder builds it from the bundle (nil unless the flag is set and the engine is direct); deploy/destroy create the version inside the approved branch (after UpgradeToWrite, so the lineage is already in the WAL) and complete it in the deferred lock release.
  • libs/testserver: in-memory DMS handlers under /api/2.0/bundle/....
  • acceptance/bundle/dms: deploy/redeploy/destroy record versions while holding the file lock; redeploy after deleting .databricks recovers the lineage from remote state; enabling the flag after a plain deploy creates a new deployment; a declined destroy records no version and does not delete the deployment.
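
The flow in the list above can be sketched roughly as follows. This is a minimal in-memory model, not the code in this PR: `memService`, `recorder`, `run`, and `ErrNotFound` are hypothetical names, the heartbeat is left out, and the real Recorder talks to the SDK bundle client instead of a map.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel for a missing deployment record.
var ErrNotFound = errors.New("deployment not found")

// memService is an in-memory fake of the metadata service, keyed by
// deployment ID (the state lineage). It is not the real SDK client.
type memService struct {
	lastVersion map[string]int
}

func newMemService() *memService {
	return &memService{lastVersion: map[string]int{}}
}

func (m *memService) GetDeployment(id string) (int, error) {
	v, ok := m.lastVersion[id]
	if !ok {
		return 0, ErrNotFound
	}
	return v, nil
}

func (m *memService) CreateDeployment(id string)     { m.lastVersion[id] = 0 }
func (m *memService) CreateVersion(id string, v int) { m.lastVersion[id] = v }
func (m *memService) DeleteDeployment(id string)     { delete(m.lastVersion, id) }

// recorder tracks the version created for one deploy or destroy.
type recorder struct {
	svc     *memService
	lineage string
	destroy bool
	version int // 0 until CreateVersion succeeds
}

// CreateVersion looks up the deployment, creates it only when missing,
// and then records the next version.
func (r *recorder) CreateVersion() error {
	last, err := r.svc.GetDeployment(r.lineage)
	if errors.Is(err, ErrNotFound) {
		r.svc.CreateDeployment(r.lineage)
		last = 0
	} else if err != nil {
		return err
	}
	r.version = last + 1
	r.svc.CreateVersion(r.lineage, r.version)
	return nil
}

// CompleteVersion is a no-op when no version was created; a successful
// destroy also deletes the deployment record.
func (r *recorder) CompleteVersion(success bool) {
	if r.version == 0 {
		return
	}
	if r.destroy && success {
		r.svc.DeleteDeployment(r.lineage)
	}
}

// run models one deploy/destroy: the version is created only on approval.
func run(svc *memService, lineage string, destroy, approved bool) error {
	r := &recorder{svc: svc, lineage: lineage, destroy: destroy}
	defer r.CompleteVersion(true) // stands in for the deferred lock release
	if !approved {
		return nil // declined plan: nothing is recorded
	}
	return r.CreateVersion()
}

func main() {
	svc := newMemService()
	_ = run(svc, "lineage-1", false, true) // deploy
	_ = run(svc, "lineage-1", false, true) // redeploy
	_ = run(svc, "lineage-1", true, false) // declined destroy
	v, err := svc.GetDeployment("lineage-1")
	fmt.Println(v, err) // 2 <nil>
	_ = run(svc, "lineage-1", true, true) // approved destroy
	_, err = svc.GetDeployment("lineage-1")
	fmt.Println(err) // deployment not found
}
```

The declined destroy leaves the last version at 2 and keeps the deployment record, which mirrors the acceptance case described in the last bullet.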

Follow-ups: use the state serial as the version ID and add a previous_version_id staleness check (needs the SDK regen from universe #2061768).

@shreyas-goenka shreyas-goenka changed the title from "bundle/deploy/lock: add DMS-backed DeploymentManager implementation" to "bundle/deploy/lock: add DMS-backed DeploymentManager implementation using SDK bundle client" May 31, 2026

eng-dev-ecosystem-bot commented May 31, 2026

Collaborator

Integration test report

Commit: adfb856

Run: 27834001471

| Env | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
| --- | --- | --- | --- | --- | --- |
| 💚 aws linux | 7 | 13 | 265 | 1014 | 6:48 |
| 💚 aws windows | 7 | 13 | 267 | 1012 | 7:53 |
| 💚 aws-ucws linux | 7 | 13 | 361 | 928 | 7:08 |
| 💚 aws-ucws windows | 7 | 13 | 363 | 926 | 9:37 |
| 💚 azure linux | 1 | 15 | 268 | 1012 | 5:45 |
| 💚 azure windows | 1 | 15 | 270 | 1010 | 8:12 |
| 💚 azure-ucws linux | 1 | 15 | 366 | 924 | 8:09 |
| 💚 azure-ucws windows | 1 | 15 | 368 | 922 | 9:02 |
| 💚 gcp linux | 1 | 15 | 264 | 1015 | 7:04 |
| 💚 gcp windows | 1 | 15 | 266 | 1013 | 9:09 |

20 interesting tests: 13 SKIP, 7 RECOVERED

| Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 💚 TestAccept | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R |
| 🙈 TestAccept/bundle/invariant/no_drift | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/permissions | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions | 💚 R | 💚 R | 💚 R | 💚 R | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 💚 R | 💚 R | 💚 R | 💚 R | | | | | | |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 💚 R | 💚 R | 💚 R | 💚 R | | | | | | |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions | 💚 R | 💚 R | 💚 R | 💚 R | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 💚 R | 💚 R | 💚 R | 💚 R | | | | | | |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 💚 R | 💚 R | 💚 R | 💚 R | | | | | | |
| 🙈 TestAccept/bundle/resources/postgres_branches/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/recreate | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/replace_existing | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/update_protected | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/without_branch_id | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_projects/update_display_name | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/synced_database_tables/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/recreate/embedding_dimension | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/ssh/connection | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |

Top 26 slowest tests (at least 2 minutes):

| duration | env | testname |
| --- | --- | --- |
| 4:36 | gcp windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 4:32 | gcp windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 4:22 | gcp linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:56 | gcp linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:38 | azure-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:33 | azure windows | TestAccept |
| 3:29 | azure linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:29 | azure-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:23 | aws-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:22 | azure-ucws windows | TestAccept |
| 3:19 | aws windows | TestAccept |
| 3:15 | gcp windows | TestAccept |
| 3:14 | aws-ucws windows | TestAccept |
| 3:13 | aws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:07 | azure-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:54 | aws-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:50 | azure windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:49 | aws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:46 | aws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:45 | azure linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:44 | aws linux | TestSecretsPutSecretStringValue |
| 2:38 | aws-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:31 | aws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:25 | aws-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:25 | azure windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:22 | azure-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |

@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/bundle-dms-implementation branch from 98dc0c7 to 4a4382f Compare June 1, 2026 12:33
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/bundle-dms-implementation branch from 4a4382f to de9adfd Compare June 1, 2026 12:35
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/bundle-dms-implementation branch from de9adfd to bb16fcd Compare June 1, 2026 15:11
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/bundle-dms-implementation branch from bb16fcd to bdc7ba2 Compare June 1, 2026 15:16
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/bundle-dms-implementation branch from bdc7ba2 to 1e8ba45 Compare June 1, 2026 15:23
@shreyas-goenka shreyas-goenka changed the title from "bundle/deploy/lock: add DMS-backed DeploymentManager implementation using SDK bundle client" to "bundle/deploy/lock: record deployment history in DMS behind experimental.record_deployment_history" Jun 1, 2026
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/bundle-lock-abstraction branch from ff910cd to ee860e8 Compare June 1, 2026 15:31
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/bundle-dms-implementation branch from 1e8ba45 to 76bf017 Compare June 1, 2026 15:31
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/bundle-dms-implementation branch from 063dad9 to 607bdd0 Compare June 1, 2026 16:54
@shreyas-goenka shreyas-goenka requested a review from denik June 1, 2026 17:02

// The server validates that versionID equals last_version_id + 1 and returns
// ABORTED otherwise (e.g. a concurrent deploy already created this version).
version, versionErr := svc.CreateVersion(ctx, sdkbundle.CreateVersionRequest{

@shreyas-goenka shreyas-goenka Jun 1, 2026

Contributor Author

This will not work well when the plan is serialized and potentially outdated, because we do not use the state serial here.

This will be fixed in a follow-up.

@shreyas-goenka

Contributor Author

We can remove the traditional file-based lock in a follow-up. It is not necessary for now, while this is in preview.

Comment thread bundle/phases/deploy.go
Comment thread bundle/deploy/lock/deployment_metadata_service.go Outdated
return fmt.Errorf("failed to parse version ID %q: %w", versionID, err)
}
r.versionNum = versionNum
r.stopHeartbeat = startHeartbeat(ctx, r.svc, r.deploymentID, versionID)

Contributor

We call CreateVersion twice, in deploy and in destroy, and seem to start the heartbeat twice. Should we have only one heartbeat instance?

Contributor Author

Can you clarify? Those are independent code paths, and both need a heartbeat, right?

Contributor

Ah right, my bad, they are indeed separate processes.
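
For readers following this thread, a per-operation heartbeat of the kind discussed here could look roughly like the sketch below. It is an illustration under assumptions, not the `startHeartbeat` in this PR: `runHeartbeat`, its signature, the callback, and the 10 ms interval are all hypothetical, and the real function takes the service client, deployment ID, and version ID.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// runHeartbeat calls beat every interval until the returned stop function is
// called or ctx is cancelled. stop blocks until the goroutine has exited, so
// no heartbeat can be sent after the version is completed.
func runHeartbeat(ctx context.Context, interval time.Duration, beat func(context.Context) error) (stop func()) {
	ctx, cancel := context.WithCancel(ctx)
	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				_ = beat(ctx) // a real client would log or surface failures
			}
		}
	}()
	return func() {
		cancel()
		<-done
	}
}

// demo runs a heartbeat for a short, fixed window and reports the beat count.
func demo() int64 {
	var beats atomic.Int64
	stop := runHeartbeat(context.Background(), 10*time.Millisecond, func(context.Context) error {
		beats.Add(1)
		return nil
	})
	time.Sleep(55 * time.Millisecond) // stands in for the deploy or destroy work
	stop()
	return beats.Load()
}

func main() {
	fmt.Println("heartbeats sent:", demo())
}
```

Because each deploy or destroy is its own CLI invocation, each one would own exactly one such goroutine and stop it when the version is completed.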

@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/bundle-dms-implementation branch from 607bdd0 to 98f4444 Compare June 15, 2026 13:40
Comment thread bundle/phases/deploy.go Outdated
Comment thread bundle/phases/deploy.go Outdated
bundle.ApplyContext(ctx, b, lock.Release(lock.GoalDeploy))
}()
if err := recorder.CreateVersion(ctx); err != nil {

Contributor

When we remove the file lock later on, how do we ensure there is no race condition when creating multiple versions? We don't seem to do any synchronisation or locking here now; is that part of a follow-up?

Contributor Author

CreateVersion implicitly has locking semantics. Only one client can have a "live" (in-progress) version at a time.

Contributor Author

The server-side version counter is the synchronization: CreateVersion only succeeds when the requested version is last_version_id + 1, otherwise it returns ABORTED (409). So even without the file lock, two concurrent deploys racing to create the next version would have one win and the other get ABORTED. Surfacing that as a clean user-facing error (retry/serial handling) is the follow-up I noted.
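
The check described above can be modelled in a few lines. This is a sketch of the idea only: `versionCounter`, `race`, and `ErrAborted` are hypothetical names, and the mutex stands in for whatever atomicity the real service provides.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrAborted stands in for the ABORTED (409) response described above.
var ErrAborted = errors.New("aborted: requested version is not last_version_id + 1")

// versionCounter is a hypothetical in-memory model of the server-side check.
type versionCounter struct {
	mu   sync.Mutex
	last int
}

// CreateVersion accepts only last+1; any other request is rejected.
func (c *versionCounter) CreateVersion(requested int) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if requested != c.last+1 {
		return ErrAborted
	}
	c.last = requested
	return nil
}

// race starts n clients that all planned against the same last version and
// then try to create the next one. It returns how many won and how many
// were aborted.
func race(c *versionCounter, n int) (won, aborted int) {
	next := c.last + 1 // every client saw the same snapshot
	results := make(chan error, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			results <- c.CreateVersion(next)
		}()
	}
	wg.Wait()
	close(results)
	for err := range results {
		if errors.Is(err, ErrAborted) {
			aborted++
		} else {
			won++
		}
	}
	return won, aborted
}

func main() {
	c := &versionCounter{}
	won, aborted := race(c, 2)
	fmt.Println(won, aborted, c.last) // 1 1 1
}
```

Whichever client reaches the counter first wins; the loser sees the abort and would need to re-plan against the new last version, which is the retry handling deferred to the follow-up.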

Contributor

Got it, makes sense. Could you add an acceptance test for it, though, to make sure this behaviour is recorded?

@andrewnester andrewnester self-requested a review June 16, 2026 12:36
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/bundle-dms-implementation branch from 98f4444 to 87c99e8 Compare June 17, 2026 00:27
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/bundle-dms-implementation branch from 87c99e8 to 9a7feb1 Compare June 17, 2026 00:28
if db.Data.Lineage == "" {
db.Data.Lineage = uuid.New().String()
}
return db.Data.Lineage

Contributor

This is initialized in Open() below.

Contributor Author

For direct, yes. In the case of DMS, this is initialized at plan time here instead. I did not want to touch the direct deployment code paths.
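
To illustrate why initializing the lineage in more than one place is harmless, here is a small standalone sketch of the get-or-init pattern from the snippet under review. The `state` type and `newID` helper are hypothetical, and `crypto/rand` is swapped in for `uuid.New().String()` only to keep the example dependency-free.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// state is a hypothetical stand-in for the deployment state database.
type state struct {
	Lineage string
}

// newID returns a random 128-bit hex string. The snippet under review uses
// uuid.New().String(); crypto/rand is used here to avoid a dependency.
func newID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// GetOrInitLineage assigns a lineage on first use and returns the stored
// value on every later call.
func (s *state) GetOrInitLineage() string {
	if s.Lineage == "" {
		s.Lineage = newID()
	}
	return s.Lineage
}

func main() {
	s := &state{}
	first := s.GetOrInitLineage()  // e.g. at plan time
	second := s.GetOrInitLineage() // e.g. later, when the state is opened
	fmt.Println(first == second, len(first)) // true 32

	restored := &state{Lineage: "from-remote-state"}
	fmt.Println(restored.GetOrInitLineage()) // from-remote-state
}
```

Whichever caller runs first assigns the value, and a lineage recovered from remote state is never overwritten.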

@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/bundle-dms-implementation branch from 9a7feb1 to 35f5a64 Compare June 19, 2026 07:43
Records each approved deploy/destroy as a version with the Deployment Metadata
Service (DMS), gated by experimental.record_deployment_history and the direct
engine. The version is created only after the plan is approved — a cancelled or
declined deploy/destroy records nothing, so there are no empty/abandoned
versions for operations that never ran.

- libs/dms: Recorder with CreateVersion / CompleteVersion. The deployment ID is
  the state lineage (from GetOrInitLineage), so a bundle deployment maps
  one-to-one to a DMS deployment record. GetDeployment first, CreateDeployment
  only when missing, then create the next version; heartbeat keeps the version's
  lease alive; CompleteVersion records success/failure and, for destroy, deletes
  the deployment record on success. Independent of bundle/lock.
- phases: newDeploymentRecorder builds the recorder from the bundle (nil unless
  the flag is set and the engine is direct); deploy/destroy create the version
  inside the approved branch (after UpgradeToWrite, so the lineage is already in
  the WAL) and complete it in the deferred lock release.
- libs/testserver: in-memory DMS handlers under /api/2.0/bundle/...
- acceptance/bundle/dms: deploy/redeploy/destroy record versions and hold the
  file lock; redeploy after deleting .databricks recovers the lineage from
  remote state; enabling the flag after a plain deploy creates a new deployment;
  a declined destroy records no version and does not delete the deployment.

Co-authored-by: Shreyas Goenka <shreyas.goenka@databricks.com>
4 participants