bundle: record auto-migration compatibility telemetry on deploy#5439
Conversation
60fba4e to
cc47397
Compare
e7e9e83 to
7c783aa
Compare
Integration test reportCommit: 23833b7
25 interesting tests: 15 SKIP, 7 KNOWN, 3 flaky
Top 24 slowest tests (at least 2 minutes):
|
cc47397 to
b2c7271
Compare
7c783aa to
5e587f0
Compare
b2c7271 to
c2aa303
Compare
5e587f0 to
2cd3ef3
Compare
c2aa303 to
b1d732b
Compare
2cd3ef3 to
07abec1
Compare
b1d732b to
9e2f26b
Compare
07abec1 to
26c3380
Compare
9e2f26b to
00c5b7b
Compare
26c3380 to
30172b5
Compare
00c5b7b to
addcb6a
Compare
30172b5 to
3646391
Compare
abe3dfb to
3c6a454
Compare
3c6a454 to
681e089
Compare
| has_serverless_compute true | ||
| local.cache.attempt true | ||
| local.cache.miss true | ||
| permissions_section_is_set false |
There was a problem hiding this comment.
Analysis needs to take this into account when looking at state_path_acl_exceeds_permissions.
Can we choose to not emit it when permissions are not set?
681e089 to
e6521fb
Compare
|
We will also track auto migration from terraform to direct, can we prefix these values with dms_ to disambiguate? I'd also use shorter names with the same prefix for grouping / easier search in telemetry dashboard. For example, instead of use: dms_compat_auto |
|
Renamed per @denik's suggestion: the three verdict keys are now |
| if stateFolderPerms == nil { | ||
| if strings.HasPrefix(statePath, "/Workspace/Users/") { | ||
| // The home owner has CAN_MANAGE and the bundle declares no permissions. | ||
| return metrics.DMSCompatNot |
There was a problem hiding this comment.
Why not?
A user deploying to their home directory and not specifying permissions should be compatible.
There was a problem hiding this comment.
Good point. I think you are right.
Consider the case:
- Alice is the owner of the bundle. state_path is under her home.
- Bob is a workspace admin.
- The bundle has no permissions block defined.
- Bob does the bundle deploy (not alice) where the auto migraiton happens.
If we do the simple thing, Bob will have CAN_MANAGE on the deployment but not Alice. She will be locked out. That's why I modelled this as not compatible. But
If we add some logic to respect workspace.state_path and automatically grant those users CAN_MANAGE then automatic migration would work fine. I did not think about this approach. That would also work for /Shared or any other permission model as long as there is NO top level permisisons block defined.
So the case work is:
- No
permissionsblock -> Always compatible. We implicitly apply the state_path ACLs on the deployment. CAN_EDIT on state_path gets CAN_EDIT on the deploymnt, and CAN_MANAGE on state_path gets CAN_MANAGE on the deployment. - If there's a
permissionsblock: Then that has to be exactly matching. Otherwise some users would get locked out post migration since we'll appy the permissions block permissions to DMS.
There was a problem hiding this comment.
Note: an empty permissions: block is treated same as no permissions block. This matches with how direct works today.
|
How does this take groups into account? |
| // No permissions section: the migration mirrors the state folder's ACLs onto the | ||
| // deployment (CAN_EDIT -> CAN_EDIT, CAN_MANAGE -> CAN_MANAGE), preserving | ||
| // everyone's access wherever the state lives. | ||
| if len(b.Config.Permissions) == 0 { |
There was a problem hiding this comment.
Direct deployment does not touch permissions if the block is omitted or an empty permissoins: block is specified. We follow the same semantics here.
| Deployment complete! | ||
|
|
||
| >>> print_telemetry_bool_values | ||
| dms_compat_not true |
There was a problem hiding this comment.
Still keeping these as two fields and not a boolean because this keeps the option open to add a maybe enum later if we decide to change the migration policy.
There was a problem hiding this comment.
The comment says not a boolean but this field is a boolean.
There was a problem hiding this comment.
Can you clarify which comment. We are modelling this as an enum. The only reason there are in the bool map is to avoid a service rollout and get this data early since its important to get.
Groups are statitcally checked. If you define permissions block, the serialized permissions need to be a strict superset. Otherwise there can always be users who are locked out from deploying the bundle. |
| Deployment complete! | ||
|
|
||
| >>> print_telemetry_bool_values | ||
| dms_compat_not true |
There was a problem hiding this comment.
The comment says not a boolean but this field is a boolean.
| }, | ||
| { | ||
| name: "manager not declared", | ||
| folder: []resources.Permission{manage(user)}, |
There was a problem hiding this comment.
But isn't the current user implied if the deploy goes to their home dir?
There was a problem hiding this comment.
It's a bit magicky? What permissions do we apply to the deployment entity on future "bundle deploys". Options are:
- Strictly the permissions block - in that case this is not auto migratable, since another user if they deploy the bundle can lock out the original user.
- Permissions block + the user from state_path. Yes, if we do this option then we can support auto migration.
Would be interesting to see how prevalent this case is. Let me add a third dedicated case for this.
| } | ||
|
|
||
| // A permissions section is set. stateFolderPerms is nil only when the state folder | ||
| // is not one of the synced bundle folders (e.g. it lives on a Volume), so its ACL |
There was a problem hiding this comment.
It can't live on a volume though. I think this is unreachable.
The return value is the same but the comment should reflect this.
Records whether each deploy is compatible with an automatic migration of the deployment state to a dedicated state storage service. Deploying a bundle requires write access (CAN_EDIT or higher) to the state folder; after migration that is governed by the permissions on the deployment object instead. When the bundle has no permissions section, the migration can mirror the state folder's ACLs onto the deployment (CAN_EDIT -> CAN_EDIT, CAN_MANAGE -> CAN_MANAGE), preserving everyone's access wherever the state lives. When a permissions section is set, the migration applies exactly those permissions, so anyone with write access to the state folder who is not declared with CAN_MANAGE would lose the ability to deploy. Exactly one verdict is recorded per deploy, evaluated by reusing the SetPermissions response from the existing workspace folder permission sync (no extra API call): - dms_compat_auto: no permissions section (folder ACLs are mirrored), or every principal with write access to the state folder is declared. - dms_compat_not: a permissions section is set and the state folder has write access that is not declared. For /Workspace/Shared this is known statically: all workspace users can write, so group_name: users CAN_MANAGE must be declared. The fake test server now models directory ACL inheritance (home-folder owners and ancestor folder grants appear as inherited entries), so the acceptance test exercises the full matrix via bundle paths, declared permissions, and real parent folder grants. Co-authored-by: Shreyas Goenka <shreyas.goenka@databricks.com>
Integration test reportCommit: fd02f96
490 interesting tests: 395 MISS, 59 FAIL, 21 RECOVERED, 7 KNOWN, 4 PANIC, 2 SKIP, 2 flaky
Top 50 slowest tests (at least 2 minutes):
|
Telemetry only. Records whether each deploy is compatible with an automatic migration of the deployment state to a dedicated state storage service. Deploying a bundle requires write access (CAN_EDIT or higher) to the state folder; after migration that is governed by the deployment object's permissions instead.
When there's no permissions section the migration mirrors the state folder's ACLs onto the deployment, so access is preserved wherever the state lives. When a permissions section is set, the migration applies exactly those permissions, so undeclared writers on the state folder would lose the ability to deploy.
Exactly one verdict per deploy (
bool_values), evaluated by reusing the existingSetPermissionsresponse (no extra API call):dms_compat_auto— no permissions section, or all write access on the state folder is declared.dms_compat_only_self_undeclared— a permissions section is set and the only undeclared writer is the deploying user. The deployment object grants the deployer CAN_MANAGE initially, so this is migratable if we choose to preserve that grant on future deploys. Recorded separately to measure how common it is.dms_compat_not— undeclared write access from a principal other than the deploying user (/Workspace/Sharedis treated as writable by all users, requiringusers: CAN_MANAGE).Plus
state_path_is_shared.Tested by an acceptance test over the case matrix (no permissions / declared / undeclared, home and shared state folders; the fake server models home-folder ownership as inherited CAN_MANAGE) and a table-driven unit test for
UndeclaredWriters.