Immutable folder support in DABs #5254
Integration test report
Commit: 9a6c898
23 interesting tests: 13 SKIP, 7 KNOWN, 3 BUG
Approval status: pending
shreyas-goenka left a comment
Minor comments - other than the bit where we use metadata.json
| uploadLibraries(ctx, b, libs) | ||
| if b.Config.Bundle.Immutable { | ||
| // Upload all source files and built artifacts as a single immutable snapshot. | ||
| // The API assigns a content-addressed path, so workspace.snapshot_path (and |
Should we add a check explicitly disallowing workspace.snapshot_path references? We don't want users to accidentally rely on it, and we should ensure that it stays internal only.
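A minimal sketch of such a check, assuming the configuration has already been flattened into a map from key path to string value. The function name `findSnapshotPathRefs` and the flattened-map input are hypothetical and are not part of the PR; a real check would more likely walk the bundle's dynamic config tree.

```go
package main

import (
	"sort"
	"strings"
)

// snapshotPathRef is the interpolation string users should not rely on.
const snapshotPathRef = "${workspace.snapshot_path}"

// findSnapshotPathRefs is a hypothetical helper. It returns the sorted key
// paths whose string values reference workspace.snapshot_path.
func findSnapshotPathRefs(values map[string]string) []string {
	var hits []string
	for key, value := range values {
		if strings.Contains(value, snapshotPathRef) {
			hits = append(hits, key)
		}
	}
	// Map iteration order is random, so sort for stable diagnostics.
	sort.Strings(hits)
	return hits
}
```

A caller could turn each returned key path into an error diagnostic during validation, so the reference is rejected before any deployment work starts.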
| @@ -15,6 +15,9 @@ type Bundle struct { | |||
| type Workspace struct { | |||
| FilePath string `json:"file_path"` | |||
| // SnapshotPath is the workspace path of the immutable snapshot uploaded | |||
| // during deployment. Only populated for bundles with bundle.immutable = true. | |||
| SnapshotPath string `json:"snapshot_path,omitempty"` | |||
Will the UI use the immutable snapshot path? In that case we'll need to add it to DMS as well.
| cmdio.LogString(ctx, "Uploading immutable bundle snapshot...") | ||
| zipContent, err := BundleZip(ctx, b) |
Should we have a test that, for a given set of files, the bundle zip remains identical across operating systems? (A simple unit test should do it.) Windows and Linux often use different newline characters.
Yes, they likely will be different but what is the issue in this case?
It's a performance optimization to avoid unnecessary updates if the snapshot did not change. The backend stores the SHA of the ZIP. It should be a rare edge case, so we can skip it.
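For reference, a sketch of what a byte-stable archive could look like, so that the SHA of the ZIP only changes when names or contents change. `deterministicZip` and `zipSHA256` are hypothetical helpers and not the PR's `BundleZip`; the fixed timestamp is an arbitrary assumption.

```go
package main

import (
	"archive/zip"
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
	"time"
)

// fixedModTime is an arbitrary constant so the archive bytes do not depend
// on when or where the files were checked out.
var fixedModTime = time.Date(2000, 1, 1, 0, 0, 0, 0, time.UTC)

// deterministicZip is a hypothetical helper. It builds a zip whose bytes
// depend only on the normalized entry names and the file contents.
func deterministicZip(files map[string][]byte) ([]byte, error) {
	// Normalize separators first so Windows-style names sort and hash the
	// same way as POSIX-style names.
	norm := make(map[string][]byte, len(files))
	names := make([]string, 0, len(files))
	for name, content := range files {
		n := strings.ReplaceAll(name, `\`, "/")
		if _, dup := norm[n]; dup {
			return nil, fmt.Errorf("duplicate entry after normalization: %s", n)
		}
		norm[n] = content
		names = append(names, n)
	}
	sort.Strings(names) // map iteration order is random; fix the entry order

	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, name := range names {
		w, err := zw.CreateHeader(&zip.FileHeader{
			Name:     name,
			Method:   zip.Deflate,
			Modified: fixedModTime,
		})
		if err != nil {
			return nil, err
		}
		if _, err := w.Write(norm[name]); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// zipSHA256 returns the hex-encoded SHA-256 of the deterministic archive.
func zipSHA256(files map[string][]byte) (string, error) {
	data, err := deterministicZip(files)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), nil
}
```

This sketch does not rewrite file contents, so a checkout with CRLF line endings would still hash differently from one with LF endings, which is the rare case discussed above.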
| } | ||
| func (m *load) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnostics { | ||
| f, err := filer.NewWorkspaceFilesClient(b.WorkspaceClient(ctx), b.Config.Workspace.StatePath) |
metadata.json was only meant to be consumed by non-DABs resources like jobs or the UI. We have deployment.json and resources.json. Maybe we should store the snapshot path there?
This is a new dependency: previously the CLI did not read metadata.json, and we are trying to get rid of it with DMS.
| // Perform resolution only if the path starts with one of the specified prefixes. | ||
| if slices.ContainsFunc(prefixes, path.HasPrefix) { | ||
| if slices.Contains(m.excludePaths, path.String()) { |
Very nitpicky: can we make this more robust? This matches substrings, so "abc" would match both "abc" and "abcd". Normally in the codebase, paths refer to exact paths, not patterns or substrings.
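To illustrate the distinction between substring-style and segment-aware matching, here is a small sketch on plain strings. `hasSegmentPrefix` is a hypothetical helper and the separator argument is an assumption; the PR itself operates on the CLI's own path type rather than raw strings.

```go
package main

import "strings"

// hasSegmentPrefix is a hypothetical helper. It reports whether p equals
// prefix or lies underneath it, comparing whole segments separated by sep
// instead of raw characters.
func hasSegmentPrefix(p, prefix, sep string) bool {
	prefix = strings.TrimSuffix(prefix, sep)
	if p == prefix {
		return true
	}
	// Appending the separator prevents "abc" from matching "abcd".
	return strings.HasPrefix(p, prefix+sep)
}
```

With "." as the separator, "abc" and "abc.def" match the prefix "abc", while "abcd" does not.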
| Deployment complete! | ||
| >>> [CLI] jobs get [NUMID] | ||
| "/Workspace/Users/[UUID]/.snapshots/test-bundle-immutable-[UNIQUE_NAME]/11f80ca6d8923bf75b57e475d4ca9ba4bb1d6d48c58aace8d3f2a1289b51c6e0/src/files/src/main.py" |
I guess this test covers that the SHA remains identical for windows and linux?
Why would it matter though? Practically they might be different, especially wheels built
pietern left a comment
I see the existing tests pass against cloud, but recommend including a testserver implementation already. Makes it easier to iterate.
| if b.Config.Bundle.Deployment.ImmutableFolder { | ||
| return nil | ||
| } | ||
Is it possible we always delay this until the deploy phase? Keeps things simpler.
| } | ||
| return 0 | ||
| }) | ||
| return files, nil |
The logic that collects the list of files looks very similar to what we do in libs/sync. Any chance we can let that pkg take care of building the list and we refer to it here? This set of files is also configured as b.Files for telemetry. If we chase the path that sets it we might be able to reuse it?
| } | ||
| return &SnapshotInfo{Path: resp.Snapshot.Path}, nil | ||
| } |
This doesn't seem related to the other filers or the filer interface.
It seems logical to colocate this client with the code that takes a couple of []file.FileSet and performs all the zipping and uploading.
| } | ||
| // The real API uses the workspace user UUID (not email) in the snapshot path, | ||
| // matching service-principal identities used in cloud acceptance tests. |
Optional: should we also use UUIDs here for better fidelity? The benefits are minor if we have cloud coverage. The difference is that users always have CAN_MANAGE on their home directory, but that's likely not true for /Users/userId.
| // Apply implements bundle.Mutator. | ||
| func (*workspaceRootPermissions) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnostics { | ||
| // If the bundle is immutable, we don't need to apply any permissions to the workspace root. | ||
| if b.Config.Bundle.Deployment.ImmutableFolder { |
Is this safe to do? State management and resource CRUD (workspace.state_path and workspace.resource_path) both need to have ACLs applied for shared deployments to work.
| Whether to fail on active runs. If this is set to true a deployment that is running can be interrupted. | ||
| "immutable_folder": | ||
| "description": |- | ||
| Whether to upload bundle files and artifacts as a single immutable snapshot. When true, all files are packaged into a zip and uploaded via the snapshot API, and workspace.file_path and workspace.artifact_path are set to the returned content-addressed path. The validate and plan commands make no mutative API calls when this is enabled. |
Snapshots API is internal. We should not refer to this in the docs.
| func (m *translateResourcePaths) Name() string { return "snapshot.TranslateResourcePaths" } | ||
| func (m *translateResourcePaths) Apply(_ context.Context, b *bundle.Bundle) diag.Diagnostics { | ||
| localPrefix := b.SyncRootPath + "/" |
Maybe strings.TrimSuffix("/") as well to account for trailing /? Or is that not possible?
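A sketch of the normalization being suggested, assuming the sync root may or may not end with a slash. `relativeToRoot` is a hypothetical helper, not a function from the PR.

```go
package main

import "strings"

// relativeToRoot is a hypothetical helper. It returns p relative to root and
// true when p lies under root, regardless of a trailing "/" on root.
func relativeToRoot(root, p string) (string, bool) {
	// Trim first, then append exactly one separator, so "/repo" and
	// "/repo/" produce the same prefix.
	prefix := strings.TrimSuffix(root, "/") + "/"
	return strings.CutPrefix(p, prefix)
}
```

Both `relativeToRoot("/repo", "/repo/src/main.py")` and `relativeToRoot("/repo/", "/repo/src/main.py")` yield `src/main.py`, and a sibling such as `/repository/x` is rejected.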
| func (s *loadState) Name() string { return "snapshot.LoadState" } | ||
| func (s *saveState) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnostics { | ||
| if b.Config.Workspace.SnapshotPath == "" { |
Do we not need to store this remotely? The code reads like it only does local storage but we need it in remote as well.
This sounds like something we can add to resources.json
| // SaveState writes the snapshot path to the local deployment state directory | ||
| // so it can be recovered during destroy without reading metadata.json. | ||
| func SaveState() bundle.Mutator { |
Consider integrating this with deployment WAL? If we can record the snapshot upload event we can avoid reuploading it on a subsequent deployment since it already exists.
We can't really, can we? The deployment WAL is calculated before the deployment executes, as part of plan, but we build and upload later in the phase.
I assume we write to the WAL during deployments, so surely we should be able to do that during or after file upload? Otherwise, how do we capture whether the plan was partially applied?
I could be misunderstanding the WAL - I'm not familiar with it (will look into it) - I just assumed it records actions as we do them.
| >>> [CLI] jobs get [NUMID] | ||
| "/Workspace/Users/[UUID]/.snapshots/test-bundle-immutable-no-artifacts-[UNIQUE_NAME]/[SNAPSHOT_HASH]/src/files/src/notebook" | ||
| >>> [CLI] bundle destroy --auto-approve |
Can we assert that destroy deletes the snapshot, even when .databricks is removed?
| // Updates (dynamic): resources.* (strings) (resolves variable references to their actual values) | ||
| // Resolves variable references in 'resources' using bundle, workspace, and variables prefixes | ||
| mutator.ResolveVariableReferencesOnlyResources(), | ||
| resourceResolver, |
| @@ -0,0 +1,16 @@ | |||
| Local = false | |||
Can we run this locally as well, as long as we fix the snapshot API implementation in the test server?
Changes
Added support for deploying bundles to immutable folders in the workspace
Enabled by using
Why
Tests
Added acceptance tests