Clean up direct-engine state entries for remotely deleted resources removed from config#5496
Clean up direct-engine state entries for remotely deleted resources removed from config#5496simonfaltum wants to merge 5 commits into
Conversation
… from config When a resource was deleted remotely and also removed from the bundle config, CalculatePlan dropped the Delete entry from the plan, so the stale state entry was never removed and reappeared on every deploy. Keep the Delete entry instead; Delete() already tolerates resources that are gone and removes the entry from state. Also add the missing location guard in metadata.Compute for pipelines and dashboards (jobs got it in #3710), which made the same scenario hard-fail the deploy. Co-authored-by: Isaac
Approval status: pending
|
Integration test reportCommit: 232ca17
27 interesting tests: 15 SKIP, 7 KNOWN, 5 flaky
Top 50 slowest tests (at least 2 minutes):
|
Co-authored-by: Isaac
Co-authored-by: Isaac
|
The direct engine now keeps the delete entry for a remotely deleted resource so its stale state entry gets cleaned up during destroy, which makes the destroy output of acceptance/bundle/resources/dashboards/delete-trashed-out-of-band legitimately diverge per engine. Split that section into out.destroy.terraform.txt and out.destroy.direct.txt per the repo convention, same as the existing jobs/remote_delete/destroy test. |
| // root | ||
| l := b.Config.GetLocation("resources.pipelines." + name) | ||
| if l.File == "" { | ||
| // b.Config.Resources.Pipelines may include a pipeline that only exists in state but not in config |
There was a problem hiding this comment.
I don't understand this comment, we iterate over a config there, so pipeline in question must exist in config.
There was a problem hiding this comment.
Good catch, the wording was confusing. At this point b.Config.Resources is not only user config: deployCore runs statemgmt.Load right before metadata.Compute, and StateToBundle injects resources that exist only in state (just id and modified_status, no file location). The terraform variant of the removed_from_config test hits this, since terraform keeps the gone resources in its state. Same guard the jobs loop got in #3710. Reworded the comments in a01eb5d to explain this.
| // This is not an error during deletion, so don't return false here | ||
| } | ||
| // If the resource is already gone remotely, keep the Delete entry: Delete() | ||
| // tolerates missing resources and applying the entry removes it from the state. |
There was a problem hiding this comment.
I'd like to avoid unnecessary call though, for 2 reasons: 1) performance and 2) to reduce chance of failing there because of unexpected error.
Perhaps we should pass this info in plan entry to deploy. E.g. set gone: true on the entry (we cannot use remote == nil since it will also be nil in case of read error and in that case we do want to call delete).
There was a problem hiding this comment.
Related:
A side effect visible in test output: bundle destroy with the direct engine now lists a remotely deleted resource under "The following resources will be deleted", since its state entry is now explicitly cleaned instead of silently dropped.
This is not great because the message is not correct anymore (we know delete is going to be nop-op). If we track it like I suggest above, we can filter out such plan entries from the warning.
There was a problem hiding this comment.
Done in a01eb5d. Delete entries now get gone: true when the planning read returns 404, and apply then only removes the entry from state without calling delete. Read errors still leave gone unset, so delete is attempted as before. The removed_from_config test now also asserts that no delete API calls are made for the gone resources.
There was a problem hiding this comment.
Fixed in a01eb5d. Deploy and destroy approvals now filter out gone entries, so they are not listed as deletions, do not trigger approval prompts, and do not trip prevent_destroy. With that the destroy output no longer diverges per engine, so I reverted the out.destroy split for delete-trashed-out-of-band.
Per review feedback: planning now sets gone on a Delete entry when the remote read returns 404. Applying such an entry removes it from state without calling the delete API, and deploy/destroy approval prompts no longer list it as a deletion or ask for approval, since nothing is actually deleted. With that the destroy output for remotely deleted resources matches the terraform engine again, so the per-engine out.destroy split is reverted for delete-trashed-out-of-band and the removed_from_config test now also asserts that no delete API calls are made for gone resources. Also reword the state-only resource comments in metadata.Compute to explain the statemgmt.Load injection that makes the guards necessary. Co-authored-by: Isaac
|
Addressed the review feedback in a01eb5d:
Validation: |
Why
Found during a full-repo review of the CLI. With the direct engine, a resource that was deleted remotely and then removed from the bundle config could never leave the deployment state. The plan dropped its Delete entry, so the state cleanup never ran and the stale entry came back on every deploy. For pipelines and dashboards the same scenario was worse: the deploy hard-failed in metadata computation with "failed to compute relative path", and since the state never got cleaned, every following deploy failed the same way.
Changes
Before, deploying after a resource was deleted remotely and removed from config either left a stale state entry forever (jobs) or failed the deploy outright (pipelines, dashboards); now the deploy plans a delete for the stale entry, cleans it from state, and completes.
bundle/direct/bundle_plan.go: when planning a Delete and the remote read reports the resource is gone, keep the Delete entry instead of removing it from the plan.Delete()already tolerates missing resources and removes the entry from state when applied.bundle/deployplan/plan.go: removeRemoveEntry, whose only caller was the code path above.bundle/deploy/metadata/compute.go: add the missingl.File == ""guard to the pipelines and dashboards loops, matching the guard the jobs loop received in direct: handle remotely deleted resources #3710. State-only resources have no config location, andfilepath.Relerrors on an empty path.A side effect visible in test output:
bundle destroywith the direct engine now lists a remotely deleted resource under "The following resources will be deleted", since its state entry is now explicitly cleaned instead of silently dropped.Test plan
acceptance/bundle/resources/jobs/remote_delete/removed_from_configcovering deploy, remote delete of a job and a pipeline, config removal, deploy twice, and summary, for both engines. With the direct engine the first deploy cleans the stale entries and the second deploy and summary are clean. The terraform variant also exercises the new metadata guard, since terraform keeps the gone resources in its state.TestComputeMetadataMutatorStateOnlyResourcesverifying state-only jobs, pipelines, and dashboards are skipped without error.bundle/direct,bundle/deployplan, andbundle/deploy/metadata.jobs/remote_delete,volumes/remote-delete,volumes/remote-change-name,vector_search_indexes/drift,permissions,deployment/bind,deployment/unbind,bundle/destroy,bundle/generate,bundle/invariant,clusters,quality_monitors,registered_models, synced tables). Onlyjobs/remote_delete/destroychanged output (direct destroy now lists the gone job) and was regenerated with-update../task fmt-q,./task lint-q,./task checks.This pull request and its description were written by Isaac.