Create UTF-8 version of `OsStr`/`OsString` #147932
r? @ibraheemdev
rustbot has assigned @ibraheemdev.
Looks pretty good to me, just a few small suggestions.
r? tgross35
@rustbot author
Force-pushed from 087647c to 7e2b76e
This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed; this note is just to help reviewers.
Unless WASI and/or Redox OS platform maintainers speak up here, I am creating separate PRs for those platforms for more focused attention.
Thanks! @bors r+ rollup
Rollup of 7 pull requests

Successful merges:

- #141445 (Add `FromIterator` impls for `ascii::Char`s to `String`s)
- #142339 (Add NonNull pattern types)
- #147768 (Code refactoring on hir report_no_match_method_error)
- #147788 (const Cell methods)
- #147932 (Create UTF-8 version of `OsStr`/`OsString`)
- #147933 (os_str: Make platform docs more consistent)
- #147948 (PassWrapper: Access GlobalValueSummaryInfo::SummaryList via getter for LLVM 22+)

r? `@ghost`
`@rustbot` modify labels: rollup
Create UTF-8 version of `OsStr`/`OsString`

Implement a UTF-8 version of `OsStr`/`OsString`, in addition to the existing bytes and WTF-8 platform-dependent encodings.

This is applicable for several platforms, but I've currently only implemented it for Motor OS:

- WASI uses Unicode paths, but currently reexports the Unix bytes-assuming `OsStrExt`/`OsStringExt` traits.
  - [wasi:filesystem](https://wa.dev/wasi:filesystem) APIs:
    > Paths are passed as interface-type `strings`, meaning they must consist of a sequence of Unicode Scalar Values (USVs). Some filesystems may contain paths which are not accessible by this API.
  - In WebAssembly/wasi-filesystem#17 (comment), it was decided that applications can use any Unicode transformation format, so we're free to use UTF-8 (and probably already do). This was chosen over specifically UTF-8 or an ad hoc encoding which preserves paths not representable in UTF-8.
    > The current API uses strings for filesystem paths, which contains sequences of Unicode scalar values (USVs), which applications can work with using strings encoded in UTF-8, UTF-16, or other Unicode encodings.
    >
    > This does mean that the API is unable to open files which do not have well-formed Unicode encodings, which may want separate APIs for handling such paths or may want something like the arf-strings proposal, but if we need that we should file a new issue for it.
- As of Redox OS [0.7.0](https://www.redox-os.org/news/release-0.7.0/), "All paths are now required to be UTF-8, and the kernel enforces this". This appears to have been implemented in commit [d331f72f](https://gitlab.redox-os.org/redox-os/kernel/-/commit/d331f72f2a51fa577072f24bc2587829fd87368b) (Use UTF-8 for all paths, 2021-02-14). Redox does not have `OsStrExt`/`OsStringExt`.
- Motor OS guarantees that its OS strings are UTF-8 in its [current `OsStrExt`/`OsStringExt` traits](https://github.com/moturus/rust/blob/a828ffcf5f04be5cdd91b5fad608102eabc17ec7/library/std/src/os/motor/ffi.rs), but they're still internally bytes like Unix.

This is an alternate approach to rust-lang#147797, which reuses the existing bytes `OsString` and relies on the safety properties of `from_encoded_bytes_unchecked`. Compared to that, this also gains efficiency from propagating the UTF-8 invariant to the whole type, as it never needs to test for UTF-8 validity.

Note that Motor OS currently does not build until rust-lang#147930 merges.

cc `@tgross35` (for earlier review)
cc `@alexcrichton`, `@rylev`, `@loganek` (for WASI)
cc `@lasiotus` (for Motor OS)
cc `@jackpot51` (for Redox OS)
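
To make the efficiency argument above concrete, here is a minimal sketch of the two representations. It is not the code from this PR: `BytesOsString` and `Utf8OsString` and their methods are hypothetical names invented for illustration, and the real `std` internals are organised differently. The sketch only assumes that a bytes-backed type has to validate on each conversion to `&str`, while a type that carries the UTF-8 invariant validates once at construction.

```rust
use std::str;

/// Hypothetical bytes-backed OS string (Unix-like model).
/// Any byte sequence is accepted, so the type carries no encoding invariant.
struct BytesOsString {
    inner: Vec<u8>,
}

impl BytesOsString {
    fn from_bytes(bytes: Vec<u8>) -> Self {
        Self { inner: bytes }
    }

    /// Has to check UTF-8 validity on every call, and can fail.
    fn to_str(&self) -> Option<&str> {
        str::from_utf8(&self.inner).ok()
    }
}

/// Hypothetical UTF-8-backed OS string.
/// The invariant is established once, when the value is built.
struct Utf8OsString {
    inner: String,
}

impl Utf8OsString {
    /// Validates once; invalid input is handed back to the caller.
    fn from_bytes(bytes: Vec<u8>) -> Result<Self, Vec<u8>> {
        String::from_utf8(bytes)
            .map(|inner| Self { inner })
            .map_err(|e| e.into_bytes())
    }

    /// Infallible and free of validation: the type already guarantees UTF-8.
    fn as_str(&self) -> &str {
        &self.inner
    }
}

fn main() {
    // 0xff can never appear in well-formed UTF-8.
    let raw = BytesOsString::from_bytes(vec![0x66, 0x6f, 0xff]);
    assert!(raw.to_str().is_none());

    let name = Utf8OsString::from_bytes(b"foo".to_vec()).unwrap();
    assert_eq!(name.as_str(), "foo");

    // Ill-formed input is rejected at the boundary instead of at each use.
    assert!(Utf8OsString::from_bytes(vec![0xff]).is_err());
}
```

The design point is where the check lives: with the invariant on the type, later conversions to `&str` need neither a scan nor an error path.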