Skip to content

{Sys,Unix}.rename: on Windows, allow renaming onto read-only file#14602

Open
nojb wants to merge 12 commits intoocaml:trunkfrom
nojb:rename_read_only
Open

{Sys,Unix}.rename: on Windows, allow renaming onto read-only file#14602
nojb wants to merge 12 commits intoocaml:trunkfrom
nojb:rename_read_only

Conversation

@nojb
Copy link
Contributor

@nojb nojb commented Feb 27, 2026

According to POSIX, there is no issue renaming a file onto another file which is read-only. However, doing that on Windows currently fails.

This PR is an attempt to make the semantics on Windows closer to POSIX: if the destination file exists and is read-only, we remove the read-only bit before doing the rename to allow it to succeed. A test is included.

Of course, by doing this we make "rename" less atomic due to the extra operations, so I would be happy to hear if the experts think this is a good idea.

For context, it was my colleague @Nevor who discovered this issue while investigating why Dune cache "store" operations in our Windows CI machines were failing (Answer: Dune makes entries read-only and then tries to rename files over them to update them).

cc @jmid @dra27 @MisterDA

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine, and/but it would look even nicer with a helper function.

inline bool check_file_attributes(int attribs, int attrib_mask, bool expected) {
  return ((attribs != INVALID_FILE_ATTRIBUTES) &&
          (!!(attribs & attrib_mask) == expected));
}

Example usage:

if (   check_file_attributes(old_attribs, FILE_ATTRIBUTE_DIRECTORY, true)
    && check_file_attributes(new_attribs, FILE_ATTRIBUTE_DIRECTORY, false))

@nojb
Copy link
Contributor Author

nojb commented Feb 28, 2026

it would look even nicer with a helper function.

Thanks for the suggestion. Amended.

@gasche
Copy link
Member

gasche commented Feb 28, 2026

Could you add a Changes entry? This is a user-visible bug fix.

@nojb
Copy link
Contributor Author

nojb commented Mar 2, 2026

Could you add a Changes entry? This is a user-visible bug fix.

I will do that. I would still like to receive confirmation from a Windows expert, since it is easy to introduce bugs when using these APIs.

@nojb nojb removed the merge-me label Mar 2, 2026
Copy link
Contributor

@MisterDA MisterDA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks!

@nojb
Copy link
Contributor Author

nojb commented Mar 2, 2026

lgtm, thanks!

@MisterDA: thanks for the confirmation!

Could you add a Changes entry? This is a user-visible bug fix.

I will do that.

@gasche: I put the Changes entry as a breaking change under "Standard Library": it felt a bit overkill to call it a "bug fix" since there is no specification of what Sys.rename/Unix.rename should do under Windows. Do you agree?

@gasche
Copy link
Member

gasche commented Mar 2, 2026

I would call it a "bug fix" because it fixes a bug with Dune that is caused by the stdlib behavior on Windows.

Nevor added a commit to Nevor/dune that referenced this pull request Mar 2, 2026
…deduplicating Dune cache

In the Dune cache, when a temp artifact already exists in the cache, Dune attempts deduplication by deleting the temp artifact, linking it to the file already in the cache, and then moving the temp artifact to the build directory.

The last step fails on Windows because Dune sets a read-only attribute on the destination file, causing Unix.rename to fail with EACCES in this case. Since the operation fails, Dune does not consider the file cached. Combined with the fact that this seems to happen quite often [1] on Windows, this effectively nullifies the cache speedup.

This PR creates a helper function that removes the read-only attribute on the destination file before trying to overwrite it (similar to the existing `win32_unlink`) and use it in the deduplication logic of the cache, fixing the issue.

Note: This issue does not exist on Unix. This discrepancy is also being addressed directly on OCaml's side: ocaml/ocaml#14602.

The test dedup.t already exhibits the issue on Windows. The `dune_cmd stat hardlinks _build/default/target` command returns 1 instead of the expected 3. This PR bumps the result to 2. To make the test completely pass, another issue around `Io.portable_hardlink` on Windows still needs to be addressed [1].

[1] It seems that the fact that on windows dune always copies files instead of trying to create a hardlink is linked to the fact that this deduplication happen more often on windows.
Nevor added a commit to Nevor/dune that referenced this pull request Mar 2, 2026
…deduplicating Dune cache

In the Dune cache, when a temp artifact already exists in the cache, Dune attempts deduplication by deleting the temp artifact, linking it to the file already in the cache, and then moving the temp artifact to the build directory.

The last step fails on Windows because Dune sets a read-only attribute on the destination file, causing Unix.rename to fail with EACCES in this case. Since the operation fails, Dune does not consider the file cached. Combined with the fact that this seems to happen quite often [1] on Windows, this effectively nullifies the cache speedup.

This PR creates a helper function that removes the read-only attribute on the destination file before trying to overwrite it (similar to the existing `win32_unlink`) and use it in the deduplication logic of the cache, fixing the issue.

Note: This issue does not exist on Unix. This discrepancy is also being addressed directly on OCaml's side: ocaml/ocaml#14602.

The test dedup.t already exhibits the issue on Windows. The `dune_cmd stat hardlinks _build/default/target` command returns 1 instead of the expected 3. This PR bumps the result to 2. To make the test completely pass, another issue around `Io.portable_hardlink` on Windows still needs to be addressed [1].

[1] It seems that the fact that on windows dune always copies files instead of trying to create a hardlink is linked to the fact that this deduplication happen more often on windows.
Nevor added a commit to Nevor/dune that referenced this pull request Mar 2, 2026
…deduplicating Dune cache

In the Dune cache, when a temp artifact already exists in the cache, Dune attempts deduplication by deleting the temp artifact, linking it to the file already in the cache, and then moving the temp artifact to the build directory.

The last step fails on Windows because Dune sets a read-only attribute on the destination file, causing Unix.rename to fail with EACCES in this case. Since the operation fails, Dune does not consider the file cached. Combined with the fact that this seems to happen quite often [1] on Windows, this effectively nullifies the cache speedup.

This PR creates a helper function that removes the read-only attribute on the destination file before trying to overwrite it (similar to the existing `win32_unlink`) and use it in the deduplication logic of the cache, fixing the issue.

Note: This issue does not exist on Unix. This discrepancy is also being addressed directly on OCaml's side: ocaml/ocaml#14602.

The test dedup.t already exhibits the issue on Windows. The `dune_cmd stat hardlinks _build/default/target` command returns 1 instead of the expected 3. This PR bumps the result to 2. To make the test completely pass, another issue around `Io.portable_hardlink` on Windows still needs to be addressed [1].

[1] It seems that the fact that on windows dune always copies files instead of trying to create a hardlink is linked to the fact that this deduplication happen more often on windows.

Signed-off-by: Roven Lamar Gabriel <nevor@nevor.net>
Nevor added a commit to Nevor/dune that referenced this pull request Mar 2, 2026
…deduplicating Dune cache

In the Dune cache, when a temp artifact already exists in the cache, Dune attempts deduplication by deleting the temp artifact, linking it to the file already in the cache, and then moving the temp artifact to the build directory.

The last step fails on Windows because Dune sets a read-only attribute on the destination file, causing Unix.rename to fail with EACCES in this case. Since the operation fails, Dune does not consider the file cached. Combined with the fact that this seems to happen quite often [1] on Windows, this effectively nullifies the cache speedup.

This PR creates a helper function that removes the read-only attribute on the destination file before trying to overwrite it (similar to the existing `win32_unlink`) and use it in the deduplication logic of the cache, fixing the issue.

Note: This issue does not exist on Unix. This discrepancy is also being addressed directly on OCaml's side: ocaml/ocaml#14602.

The test dedup.t already exhibits the issue on Windows. The `dune_cmd stat hardlinks _build/default/target` command returns 1 instead of the expected 3. This PR bumps the result to 2. To make the test completely pass, another issue around `Io.portable_hardlink` on Windows still needs to be addressed [1].

[1] It seems that the fact that on windows dune always copies files instead of trying to create a hardlink is linked to the fact that this deduplication happen more often on windows.

Signed-off-by: Roven Lamar Gabriel <nevor@nevor.net>
@nojb
Copy link
Contributor Author

nojb commented Mar 2, 2026

I would call it a "bug fix" because it fixes a bug with Dune that is caused by the stdlib behavior on Windows.

For future reference: @Nevor submitted a separate fix for Dune independent of the change here: ocaml/dune#13713.

nojb pushed a commit to ocaml/dune that referenced this pull request Mar 2, 2026
…deduplicating Dune cache (#13713)

In the Dune cache, when a temp artifact already exists in the cache,
Dune attempts deduplication by deleting the temp artifact, linking it to
the file already in the cache, and then moving the temp artifact to the
build directory.

The last step fails on Windows because Dune sets a read-only attribute
on the destination file, causing `Unix.rename` to fail with `EACCES` in
this case. Since the operation fails, Dune does not consider the file
cached. Combined with the fact that this seems to happen quite often [1]
on Windows, this effectively nullifies the cache speedup.

This PR creates a helper function that removes the read-only attribute
on the destination file before trying to overwrite it (similar to the
existing `win32_unlink`) and use it in the deduplication logic of the
cache, fixing the issue.

_Note: This issue does not exist on Unix. This discrepancy is also being
addressed directly on OCaml's side:
ocaml/ocaml#14602

The test dedup.t already exhibits the issue on Windows. The `dune_cmd
stat hardlinks _build/default/target` command returns 1 instead of the
expected 3. This PR bumps the result to 2. To make the test completely
pass, another issue around `Io.portable_hardlink` on Windows still needs
to be addressed [1].

[1] It seems that the fact that on windows dune always copies files
instead of trying to create a hardlink is linked to the fact that this
deduplication happen more often on windows. See #13714.

---------

Signed-off-by: Roven Lamar Gabriel <nevor@nevor.net>
remove read-only bit before trying to rename */
if (check_attr(new_attribs, FILE_ATTRIBUTE_READONLY, TRUE)) {
SetFileAttributes(newpath, new_attribs & ~FILE_ATTRIBUTE_READONLY);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One point where I am not sure: if MoveFileEx (which does the actual rename) fails below for any reason, we will leave behind us an observable side-effect of having removed the read-only bit.

One approach would be to reset the read-only bit if we removed it and the rename failed. Also, we could only try removing the read-only bit if an initial rename fails with a permission error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be in favor of resetting the read-only bit if move fails. My gut feeling is that catching permission errors is more fragile (I don't know how Windows security protections work, but I wouldn't be surprised if permission errors were observable in various ways by the user.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did something in this direction in cd930de

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to make the normal path as atomic as possible - why not try MoveFileEx and if GetLastError = 5 (invalid access - I can't remember the constant) then check the read-only attribute and try MoveFileEx again? FWIW, if that second call fails, I'd not bother re-setting the attribute

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, sorry, that’s daft, given that we have to read the attributes anyway… I agree with what’s here already (but I still don't like the check_attr, FWIW)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, sorry, that’s daft, given that we have to read the attributes anyway…

Actually in the code before this PR, we only read the attributes of the destination if the source exists and is a directory, so in a sense we ARE making the normal path a bit less atomic the way it is written now, since we read the attributes of the destination unconditionally.

I'm ready to move the logic after an initial attempt with MoveFileEx if you still think that is a good idea. Do you confirm?

I still don't like the check_attr

Would a different name such as file_exists_and_has_attr work better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'd missed that subtlety - in which case, yes, I would go with the initial attempt to MoveFileEx and then look at the attributes (it does have the benefit that were Microsoft ever to change the semantics of the flag - which isn't totally impossible - then it would just naturally work).

Would a different name such as file_exists_and_has_attr work better?

I guess (if we must have an auxiliary 🫣) - it's probably just me, the call site (ignoring the name completely) _(new_attribs, FILE_ATTRIBUTE_READONLY, TRUE) just doesn't look any clearer than the original expression. However, I'm most certainly not blocking anything over it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'd missed that subtlety - in which case, yes, I would go with the initial attempt to MoveFileEx and then look at the attributes (it does have the benefit that were Microsoft ever to change the semantics of the flag - which isn't totally impossible - then it would just naturally work).

Would a different name such as file_exists_and_has_attr work better?

I guess (if we must have an auxiliary 🫣) - it's probably just me, the call site (ignoring the name completely) _(new_attribs, FILE_ATTRIBUTE_READONLY, TRUE) just doesn't look any clearer than the original expression. However, I'm most certainly not blocking anything over it!

I'm willing to defer to what your experience tells you :)

I moved the logic after the initial MoveFileEx in 2eaecb6 (NB: I do not try to put back the read-only bit if the follow-on MoveFileEx fails). And I reverted the introduction of the helper function (sorry @gasche!) in ccbd94a.

Nevor added a commit to Nevor/dune that referenced this pull request Mar 3, 2026
…deduplicating Dune cache (ocaml#13713)

In the Dune cache, when a temp artifact already exists in the cache,
Dune attempts deduplication by deleting the temp artifact, linking it to
the file already in the cache, and then moving the temp artifact to the
build directory.

The last step fails on Windows because Dune sets a read-only attribute
on the destination file, causing `Unix.rename` to fail with `EACCES` in
this case. Since the operation fails, Dune does not consider the file
cached. Combined with the fact that this seems to happen quite often [1]
on Windows, this effectively nullifies the cache speedup.

This PR creates a helper function that removes the read-only attribute
on the destination file before trying to overwrite it (similar to the
existing `win32_unlink`) and use it in the deduplication logic of the
cache, fixing the issue.

_Note: This issue does not exist on Unix. This discrepancy is also being
addressed directly on OCaml's side:
ocaml/ocaml#14602

The test dedup.t already exhibits the issue on Windows. The `dune_cmd
stat hardlinks _build/default/target` command returns 1 instead of the
expected 3. This PR bumps the result to 2. To make the test completely
pass, another issue around `Io.portable_hardlink` on Windows still needs
to be addressed [1].

[1] It seems that the fact that on windows dune always copies files
instead of trying to create a hardlink is linked to the fact that this
deduplication happen more often on windows. See ocaml#13714.

---------

Signed-off-by: Roven Lamar Gabriel <nevor@nevor.net>
runtime/win32.c Outdated
Comment on lines +777 to +783
Caml_inline BOOL check_attr(DWORD attrs, DWORD mask, BOOL expected)
{
return
(attrs != INVALID_FILE_ATTRIBUTES) &&
(((attrs & mask) ? TRUE : FALSE) == expected);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use stdbool.h here (and be less shouty 😁)? That said, somewhat pedantically, I'm not convinced by this refactoring. A key part of the conditional where it's used below is that both the source and destination entries exist, which is idiomatically (and idiosyncratically) there in the comparison with != INVALID_FILE_ATTRIBUTES but is not to me encapsulated in the name check_attr?

Copy link
Contributor Author

@nojb nojb Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use stdbool.h here (and be less shouty 😁)?

We can. The rest of the file uses BOOL and friends, which is why I used them here as well. I offer to switch all of them in a follow-up PR, sounds good?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I went ahead and did the specific change in 5ecfe59.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most/all of the others are actual Win32 API ones (with some painful memories that BOOL and BOOLEAN aren't always the same thing!) - but in this case it's just an actual C boolean expression, so I thought we could be modern (@MisterDA's positive influence!)

remove read-only bit before trying to rename */
if (check_attr(new_attribs, FILE_ATTRIBUTE_READONLY, TRUE)) {
SetFileAttributes(newpath, new_attribs & ~FILE_ATTRIBUTE_READONLY);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to make the normal path as atomic as possible - why not try MoveFileEx and if GetLastError = 5 (invalid access - I can't remember the constant) then check the read-only attribute and try MoveFileEx again? FWIW, if that second call fails, I'd not bother re-setting the attribute

@jmid
Copy link
Member

jmid commented Mar 3, 2026

Nice find! The past Windows issues were found by this model-based test, originally written by @Lucccyo:
https://github.com/ocaml-multicore/multicoretests/blob/af024837cf974d6a4bd60bed656d71817a0d0c48/src/sys/stm_tests.ml#L208-L222

  1. In a switch with dune and qcheck-core it can potentially be run as an extra confirmation
  2. It could be a fun exercise extending the randomization to the permissions bits in mkfile and Mkdir too... 🤓

if ((new_attribs != INVALID_FILE_ATTRIBUTES) &&
(new_attribs & FILE_ATTRIBUTE_READONLY) != 0) {
/* Remove read-only bit before trying to rename */
SetFileAttributes(newpath, new_attribs & ~FILE_ATTRIBUTE_READONLY);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK to ignore the result of SetFileAttributes here and be sure that GetLastError relates to the MoveFileEx call, but I think that's worth a comment.

It also occurs to me... that directories can also have the read-only attribute set 🤦I've not had enough coffee to work out the implications of the error path w.r.t. the corner-cases for directories and that attribute, but we should probably convince ourselves that the "semantics" are still OK there too (it'd be so great if that DOS attribute would just die.......!)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well you can check for new_attribs & FILE_ATTRIBUTE_DIRECTORY and do not attempt to do anything in this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants