bundle: record deployment history in DMS after approval#5386
bundle: record deployment history in DMS after approval#5386shreyas-goenka wants to merge 1 commit into
Conversation
Integration test reportCommit: adfb856
20 interesting tests: 13 SKIP, 7 RECOVERED
Top 26 slowest tests (at least 2 minutes):
|
98dc0c7 to
4a4382f
Compare
4a4382f to
de9adfd
Compare
de9adfd to
bb16fcd
Compare
bb16fcd to
bdc7ba2
Compare
bdc7ba2 to
1e8ba45
Compare
ff910cd to
ee860e8
Compare
1e8ba45 to
76bf017
Compare
063dad9 to
607bdd0
Compare
|
|
||
| // The server validates that versionID equals last_version_id + 1 and returns | ||
| // ABORTED otherwise (e.g. a concurrent deploy already created this version). | ||
| version, versionErr := svc.CreateVersion(ctx, sdkbundle.CreateVersionRequest{ |
There was a problem hiding this comment.
this will not work well when the plan is serialized and potentially outdated (because we do not use serial here)
Will be fixed in a followup.
|
We can remove the traditional file based lock in a followup. Not necessary for now / preview. |
| return fmt.Errorf("failed to parse version ID %q: %w", versionID, err) | ||
| } | ||
| r.versionNum = versionNum | ||
| r.stopHeartbeat = startHeartbeat(ctx, r.svc, r.deploymentID, versionID) |
There was a problem hiding this comment.
We call CreateVersion twice: in deploy and destroy and seem to start heatbeat twice, shall we have only 1 instance of heartbeat?
There was a problem hiding this comment.
Can you clarify? Those are independent code paths and both need a heartbeat right?
There was a problem hiding this comment.
Ah right, my bad, indeed a separate processes
607bdd0 to
98f4444
Compare
| bundle.ApplyContext(ctx, b, lock.Release(lock.GoalDeploy)) | ||
| }() | ||
| if err := recorder.CreateVersion(ctx); err != nil { |
There was a problem hiding this comment.
When we remove file lock later on, how do we ensure that there's no race condition creating multiple versions? We don't seem to do any synchronisation / locking here now, is it part of follow up?
There was a problem hiding this comment.
The CreateVersion implicitly has locking semantics. Only one client can have a "live" version that is in progress at a time.
There was a problem hiding this comment.
The server-side version counter is the synchronization: CreateVersion only succeeds when the requested version is last_version_id + 1, otherwise it returns ABORTED (409). So even without the file lock, two concurrent deploys racing to create the next version would have one win and the other get ABORTED. Surfacing that as a clean user-facing error (retry/serial handling) is the follow-up I noted.
There was a problem hiding this comment.
Got it, make sense. Could you add an acceptance test for it though to make sure this behaviour is recorded?
98f4444 to
87c99e8
Compare
87c99e8 to
9a7feb1
Compare
| if db.Data.Lineage == "" { | ||
| db.Data.Lineage = uuid.New().String() | ||
| } | ||
| return db.Data.Lineage |
There was a problem hiding this comment.
This is initialized in Open() below.
There was a problem hiding this comment.
For direct yes. In case of DMS this is initialized at pla n time here instead. I did not want to touch direct deployment code paths.
9a7feb1 to
35f5a64
Compare
Records each approved deploy/destroy as a version with the Deployment Metadata Service (DMS), gated by experimental.record_deployment_history and the direct engine. The version is created only after the plan is approved — a cancelled or declined deploy/destroy records nothing, so there are no empty/abandoned versions for operations that never ran. - libs/dms: Recorder with CreateVersion / CompleteVersion. The deployment ID is the state lineage (from GetOrInitLineage), so a bundle deployment maps one-to-one to a DMS deployment record. GetDeployment first, CreateDeployment only when missing, then create the next version; heartbeat keeps the version's lease alive; CompleteVersion records success/failure and, for destroy, deletes the deployment record on success. Independent of bundle/lock. - phases: newDeploymentRecorder builds the recorder from the bundle (nil unless the flag is set and the engine is direct); deploy/destroy create the version inside the approved branch (after UpgradeToWrite, so the lineage is already in the WAL) and complete it in the deferred lock release. - libs/testserver: in-memory DMS handlers under /api/2.0/bundle/... - acceptance/bundle/dms: deploy/redeploy/destroy record versions and hold the file lock; redeploy after deleting .databricks recovers the lineage from remote state; enabling the flag after a plain deploy creates a new deployment; a declined destroy records no version and does not delete the deployment. Co-authored-by: Shreyas Goenka <shreyas.goenka@databricks.com>
Records each approved deploy/destroy as a version with the Deployment Metadata Service (DMS), gated by
experimental.record_deployment_historyand the direct engine.Stacked on #5667 (the lineage/state-layer foundation). This PR is the consumer.
Key behavior: the version is created only after the plan is approved — a cancelled or declined deploy/destroy records nothing, so there are no empty/abandoned versions for operations that never ran.
libs/dms:RecorderwithCreateVersion/CompleteVersion. The deployment ID is the state lineage (GetOrInitLineage), so a bundle deployment maps one-to-one to a DMS deployment record.GetDeploymentfirst,CreateDeploymentonly when missing, then the next version; heartbeat keeps the lease alive;CompleteVersionrecords success/failure and, for destroy, deletes the deployment on success. Independent ofbundle/deploy/lock.phases:newDeploymentRecorderbuilds it from the bundle (nil unless the flag is set and the engine is direct); deploy/destroy create the version inside the approved branch (afterUpgradeToWrite, so the lineage is already in the WAL) and complete it in the deferred lock release.libs/testserver: in-memory DMS handlers under/api/2.0/bundle/....acceptance/bundle/dms: deploy/redeploy/destroy record versions while holding the file lock; redeploy after deleting.databricksrecovers the lineage from remote state; enabling the flag after a plain deploy creates a new deployment; a declined destroy records no version and does not delete the deployment.Follow-ups: inherit the state serial as the version id +
previous_version_idstaleness check (needs the SDK regen from universe #2061768).