{Sys,Unix}.rename: on Windows, allow renaming onto read-only file#14602
{Sys,Unix}.rename: on Windows, allow renaming onto read-only file#14602nojb wants to merge 12 commits intoocaml:trunkfrom
Conversation
gasche
left a comment
There was a problem hiding this comment.
This looks fine, and/but it would look even nicer with a helper function.
inline bool check_file_attributes(int attribs, int attrib_mask, bool expected) {
return ((attribs != INVALID_FILE_ATTRIBUTES) &&
(!!(attribs & attrib_mask) == expected));
}Example usage:
if ( check_file_attributes(old_attribs, FILE_ATTRIBUTE_DIRECTORY, true)
&& check_file_attributes(new_attribs, FILE_ATTRIBUTE_DIRECTORY, false))
Thanks for the suggestion. Amended. |
|
Could you add a Changes entry? This is a user-visible bug fix. |
I will do that. I would still like to receive confirmation from a Windows expert, since it is easy to introduce bugs when using these APIs. |
@MisterDA: thanks for the confirmation!
@gasche: I put the Changes entry as a breaking change under "Standard Library": it felt a bit overkill to call it a "bug fix" since there is no specification of what |
|
I would call it a "bug fix" because it fixes a bug with Dune that is caused by the stdlib behavior on Windows. |
…deduplicating Dune cache In the Dune cache, when a temp artifact already exists in the cache, Dune attempts deduplication by deleting the temp artifact, linking it to the file already in the cache, and then moving the temp artifact to the build directory. The last step fails on Windows because Dune sets a read-only attribute on the destination file, causing Unix.rename to fail with EACCES in this case. Since the operation fails, Dune does not consider the file cached. Combined with the fact that this seems to happen quite often [1] on Windows, this effectively nullifies the cache speedup. This PR creates a helper function that removes the read-only attribute on the destination file before trying to overwrite it (similar to the existing `win32_unlink`) and use it in the deduplication logic of the cache, fixing the issue. Note: This issue does not exist on Unix. This discrepancy is also being addressed directly on OCaml's side: ocaml/ocaml#14602. The test dedup.t already exhibits the issue on Windows. The `dune_cmd stat hardlinks _build/default/target` command returns 1 instead of the expected 3. This PR bumps the result to 2. To make the test completely pass, another issue around `Io.portable_hardlink` on Windows still needs to be addressed [1]. [1] It seems that the fact that on windows dune always copies files instead of trying to create a hardlink is linked to the fact that this deduplication happen more often on windows.
…deduplicating Dune cache In the Dune cache, when a temp artifact already exists in the cache, Dune attempts deduplication by deleting the temp artifact, linking it to the file already in the cache, and then moving the temp artifact to the build directory. The last step fails on Windows because Dune sets a read-only attribute on the destination file, causing Unix.rename to fail with EACCES in this case. Since the operation fails, Dune does not consider the file cached. Combined with the fact that this seems to happen quite often [1] on Windows, this effectively nullifies the cache speedup. This PR creates a helper function that removes the read-only attribute on the destination file before trying to overwrite it (similar to the existing `win32_unlink`) and use it in the deduplication logic of the cache, fixing the issue. Note: This issue does not exist on Unix. This discrepancy is also being addressed directly on OCaml's side: ocaml/ocaml#14602. The test dedup.t already exhibits the issue on Windows. The `dune_cmd stat hardlinks _build/default/target` command returns 1 instead of the expected 3. This PR bumps the result to 2. To make the test completely pass, another issue around `Io.portable_hardlink` on Windows still needs to be addressed [1]. [1] It seems that the fact that on windows dune always copies files instead of trying to create a hardlink is linked to the fact that this deduplication happen more often on windows.
…deduplicating Dune cache In the Dune cache, when a temp artifact already exists in the cache, Dune attempts deduplication by deleting the temp artifact, linking it to the file already in the cache, and then moving the temp artifact to the build directory. The last step fails on Windows because Dune sets a read-only attribute on the destination file, causing Unix.rename to fail with EACCES in this case. Since the operation fails, Dune does not consider the file cached. Combined with the fact that this seems to happen quite often [1] on Windows, this effectively nullifies the cache speedup. This PR creates a helper function that removes the read-only attribute on the destination file before trying to overwrite it (similar to the existing `win32_unlink`) and use it in the deduplication logic of the cache, fixing the issue. Note: This issue does not exist on Unix. This discrepancy is also being addressed directly on OCaml's side: ocaml/ocaml#14602. The test dedup.t already exhibits the issue on Windows. The `dune_cmd stat hardlinks _build/default/target` command returns 1 instead of the expected 3. This PR bumps the result to 2. To make the test completely pass, another issue around `Io.portable_hardlink` on Windows still needs to be addressed [1]. [1] It seems that the fact that on windows dune always copies files instead of trying to create a hardlink is linked to the fact that this deduplication happen more often on windows. Signed-off-by: Roven Lamar Gabriel <nevor@nevor.net>
…deduplicating Dune cache In the Dune cache, when a temp artifact already exists in the cache, Dune attempts deduplication by deleting the temp artifact, linking it to the file already in the cache, and then moving the temp artifact to the build directory. The last step fails on Windows because Dune sets a read-only attribute on the destination file, causing Unix.rename to fail with EACCES in this case. Since the operation fails, Dune does not consider the file cached. Combined with the fact that this seems to happen quite often [1] on Windows, this effectively nullifies the cache speedup. This PR creates a helper function that removes the read-only attribute on the destination file before trying to overwrite it (similar to the existing `win32_unlink`) and use it in the deduplication logic of the cache, fixing the issue. Note: This issue does not exist on Unix. This discrepancy is also being addressed directly on OCaml's side: ocaml/ocaml#14602. The test dedup.t already exhibits the issue on Windows. The `dune_cmd stat hardlinks _build/default/target` command returns 1 instead of the expected 3. This PR bumps the result to 2. To make the test completely pass, another issue around `Io.portable_hardlink` on Windows still needs to be addressed [1]. [1] It seems that the fact that on windows dune always copies files instead of trying to create a hardlink is linked to the fact that this deduplication happen more often on windows. Signed-off-by: Roven Lamar Gabriel <nevor@nevor.net>
For future reference: @Nevor submitted a separate fix for Dune independent of the change here: ocaml/dune#13713. |
…deduplicating Dune cache (#13713) In the Dune cache, when a temp artifact already exists in the cache, Dune attempts deduplication by deleting the temp artifact, linking it to the file already in the cache, and then moving the temp artifact to the build directory. The last step fails on Windows because Dune sets a read-only attribute on the destination file, causing `Unix.rename` to fail with `EACCES` in this case. Since the operation fails, Dune does not consider the file cached. Combined with the fact that this seems to happen quite often [1] on Windows, this effectively nullifies the cache speedup. This PR creates a helper function that removes the read-only attribute on the destination file before trying to overwrite it (similar to the existing `win32_unlink`) and use it in the deduplication logic of the cache, fixing the issue. _Note: This issue does not exist on Unix. This discrepancy is also being addressed directly on OCaml's side: ocaml/ocaml#14602 The test dedup.t already exhibits the issue on Windows. The `dune_cmd stat hardlinks _build/default/target` command returns 1 instead of the expected 3. This PR bumps the result to 2. To make the test completely pass, another issue around `Io.portable_hardlink` on Windows still needs to be addressed [1]. [1] It seems that the fact that on windows dune always copies files instead of trying to create a hardlink is linked to the fact that this deduplication happen more often on windows. See #13714. --------- Signed-off-by: Roven Lamar Gabriel <nevor@nevor.net>
| remove read-only bit before trying to rename */ | ||
| if (check_attr(new_attribs, FILE_ATTRIBUTE_READONLY, TRUE)) { | ||
| SetFileAttributes(newpath, new_attribs & ~FILE_ATTRIBUTE_READONLY); | ||
| } |
There was a problem hiding this comment.
One point where I am not sure: if MoveFileEx (which does the actual rename) fails below for any reason, we will leave behind us an observable side-effect of having removed the read-only bit.
One approach would be to reset the read-only bit if we removed it and the rename failed. Also, we could only try removing the read-only bit if an initial rename fails with a permission error.
There was a problem hiding this comment.
I would be in favor of resetting the read-only bit if move fails. My gut feeling is that catching permission errors is more fragile (I don't know how Windows security protections work, but I wouldn't be surprised if permission errors were observable in various ways by the user.)
There was a problem hiding this comment.
I did something in this direction in cd930de
There was a problem hiding this comment.
I think it's better to make the normal path as atomic as possible - why not try MoveFileEx and if GetLastError = 5 (invalid access - I can't remember the constant) then check the read-only attribute and try MoveFileEx again? FWIW, if that second call fails, I'd not bother re-setting the attribute
There was a problem hiding this comment.
Actually, sorry, that’s daft, given that we have to read the attributes anyway… I agree with what’s here already (but I still don't like the check_attr, FWIW)
There was a problem hiding this comment.
Actually, sorry, that’s daft, given that we have to read the attributes anyway…
Actually in the code before this PR, we only read the attributes of the destination if the source exists and is a directory, so in a sense we ARE making the normal path a bit less atomic the way it is written now, since we read the attributes of the destination unconditionally.
I'm ready to move the logic after an initial attempt with MoveFileEx if you still think that is a good idea. Do you confirm?
I still don't like the check_attr
Would a different name such as file_exists_and_has_attr work better?
There was a problem hiding this comment.
Oh, I'd missed that subtlety - in which case, yes, I would go with the initial attempt to MoveFileEx and then look at the attributes (it does have the benefit that were Microsoft ever to change the semantics of the flag - which isn't totally impossible - then it would just naturally work).
Would a different name such as
file_exists_and_has_attrwork better?
I guess (if we must have an auxiliary 🫣) - it's probably just me, the call site (ignoring the name completely) _(new_attribs, FILE_ATTRIBUTE_READONLY, TRUE) just doesn't look any clearer than the original expression. However, I'm most certainly not blocking anything over it!
There was a problem hiding this comment.
Oh, I'd missed that subtlety - in which case, yes, I would go with the initial attempt to
MoveFileExand then look at the attributes (it does have the benefit that were Microsoft ever to change the semantics of the flag - which isn't totally impossible - then it would just naturally work).Would a different name such as
file_exists_and_has_attrwork better?I guess (if we must have an auxiliary 🫣) - it's probably just me, the call site (ignoring the name completely)
_(new_attribs, FILE_ATTRIBUTE_READONLY, TRUE)just doesn't look any clearer than the original expression. However, I'm most certainly not blocking anything over it!
I'm willing to defer to what your experience tells you :)
I moved the logic after the initial MoveFileEx in 2eaecb6 (NB: I do not try to put back the read-only bit if the follow-on MoveFileEx fails). And I reverted the introduction of the helper function (sorry @gasche!) in ccbd94a.
…deduplicating Dune cache (ocaml#13713) In the Dune cache, when a temp artifact already exists in the cache, Dune attempts deduplication by deleting the temp artifact, linking it to the file already in the cache, and then moving the temp artifact to the build directory. The last step fails on Windows because Dune sets a read-only attribute on the destination file, causing `Unix.rename` to fail with `EACCES` in this case. Since the operation fails, Dune does not consider the file cached. Combined with the fact that this seems to happen quite often [1] on Windows, this effectively nullifies the cache speedup. This PR creates a helper function that removes the read-only attribute on the destination file before trying to overwrite it (similar to the existing `win32_unlink`) and use it in the deduplication logic of the cache, fixing the issue. _Note: This issue does not exist on Unix. This discrepancy is also being addressed directly on OCaml's side: ocaml/ocaml#14602 The test dedup.t already exhibits the issue on Windows. The `dune_cmd stat hardlinks _build/default/target` command returns 1 instead of the expected 3. This PR bumps the result to 2. To make the test completely pass, another issue around `Io.portable_hardlink` on Windows still needs to be addressed [1]. [1] It seems that the fact that on windows dune always copies files instead of trying to create a hardlink is linked to the fact that this deduplication happen more often on windows. See ocaml#13714. --------- Signed-off-by: Roven Lamar Gabriel <nevor@nevor.net>
runtime/win32.c
Outdated
| Caml_inline BOOL check_attr(DWORD attrs, DWORD mask, BOOL expected) | ||
| { | ||
| return | ||
| (attrs != INVALID_FILE_ATTRIBUTES) && | ||
| (((attrs & mask) ? TRUE : FALSE) == expected); | ||
| } | ||
|
|
There was a problem hiding this comment.
Can't we use stdbool.h here (and be less shouty 😁)? That said, somewhat pedantically, I'm not convinced by this refactoring. A key part of the conditional where it's used below is that both the source and destination entries exist, which is idiomatically (and idiosyncratically) there in the comparison with != INVALID_FILE_ATTRIBUTES but is not to me encapsulated in the name check_attr?
There was a problem hiding this comment.
Can't we use stdbool.h here (and be less shouty 😁)?
We can. The rest of the file uses BOOL and friends, which is why I used them here as well. I offer to switch all of them in a follow-up PR, sounds good?
There was a problem hiding this comment.
Actually, I went ahead and did the specific change in 5ecfe59.
There was a problem hiding this comment.
I think most/all of the others are actual Win32 API ones (with some painful memories that BOOL and BOOLEAN aren't always the same thing!) - but in this case it's just an actual C boolean expression, so I thought we could be modern (@MisterDA's positive influence!)
| remove read-only bit before trying to rename */ | ||
| if (check_attr(new_attribs, FILE_ATTRIBUTE_READONLY, TRUE)) { | ||
| SetFileAttributes(newpath, new_attribs & ~FILE_ATTRIBUTE_READONLY); | ||
| } |
There was a problem hiding this comment.
I think it's better to make the normal path as atomic as possible - why not try MoveFileEx and if GetLastError = 5 (invalid access - I can't remember the constant) then check the read-only attribute and try MoveFileEx again? FWIW, if that second call fails, I'd not bother re-setting the attribute
|
Nice find! The past Windows issues were found by this model-based test, originally written by @Lucccyo:
|
| if ((new_attribs != INVALID_FILE_ATTRIBUTES) && | ||
| (new_attribs & FILE_ATTRIBUTE_READONLY) != 0) { | ||
| /* Remove read-only bit before trying to rename */ | ||
| SetFileAttributes(newpath, new_attribs & ~FILE_ATTRIBUTE_READONLY); |
There was a problem hiding this comment.
I think it's OK to ignore the result of SetFileAttributes here and be sure that GetLastError relates to the MoveFileEx call, but I think that's worth a comment.
It also occurs to me... that directories can also have the read-only attribute set 🤦I've not had enough coffee to work out the implications of the error path w.r.t. the corner-cases for directories and that attribute, but we should probably convince ourselves that the "semantics" are still OK there too (it'd be so great if that DOS attribute would just die.......!)
There was a problem hiding this comment.
Well you can check for new_attribs & FILE_ATTRIBUTE_DIRECTORY and do not attempt to do anything in this case.
According to POSIX, there is no issue renaming a file onto another file which is read-only. However, doing that on Windows currently fails.
This PR is an attempt to make the semantics on Windows closer to POSIX: if the destination file exists and is read-only, we remove the read-only bit before doing the rename to allow it to succeed. A test is included.
Of course, by doing this we make "rename" less atomic due to the extra operations, so I would be happy to hear if the experts think this is a good idea.
For context, it was my colleague @Nevor who discovered this issue while investigating why Dune cache "store" operations in our Windows CI machines were failing (Answer: Dune makes entries read-only and then tries to rename files over them to update them).
cc @jmid @dra27 @MisterDA