core::str::{chars_uppercase,chars_lowercase} iterators by markokr · Pull Request #98490 · rust-lang/rust

markokr · 2022-06-25T15:46:22Z

They are based on a new internal API, UnicodeConverter + UnicodeIterator,
that supports context sensitivity and char expansion.

API change proposal: rust-lang/libs-team#58
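
To make "char expansion" and "context sensitivity" concrete, here is a small sketch that uses only existing stable standard-library calls (`char::to_uppercase` and `str::to_lowercase`), not the iterators proposed in this PR. The helper names `upper_of` and `lower_of` are made up for illustration.

```rust
/// Hypothetical helper: uppercase a single char, collecting any expansion.
fn upper_of(c: char) -> String {
    // `char::to_uppercase` returns an iterator because one char can map to several.
    c.to_uppercase().collect()
}

/// Hypothetical helper: lowercase a whole string, so surrounding chars can be considered.
fn lower_of(s: &str) -> String {
    s.to_lowercase()
}

fn main() {
    // Char expansion: U+00DF (sharp s) uppercases to the two chars "SS".
    assert_eq!(upper_of('\u{00DF}'), "SS");

    // Context sensitivity: in U+03A3 U+0391 U+03A3 (capital sigma, alpha, sigma)
    // the word-final sigma lowercases to U+03C2, the leading one to U+03C3.
    assert_eq!(lower_of("\u{03A3}\u{0391}\u{03A3}"), "\u{03C3}\u{03B1}\u{03C2}");
}
```

An iterator over `char`s that wants the same results has to buffer expanded output and look at neighbouring characters, which is presumably what the internal API above is for.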

rust-highfive · 2022-06-25T15:46:25Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

rust-highfive · 2022-06-25T15:46:26Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @thomcc (or someone else) soon.

Please see the contribution instructions for more information.

thomcc · 2022-06-25T16:12:01Z

This is a new API, so please file an API change proposal (see the links in the bot post). That said, I'm very much in favor of this, and a quick skim (will need more thorough review once it gets API sign off) looks fine.

@rustbot label +T-libs-api -T-libs

markokr · 2022-06-25T16:49:09Z

Thanks for the quick feedback!

Created API proposal here: rust-lang/libs-team#58

thomcc

Hmm, the implementation here is more generic than it needs to be, and more complex as a result. You should not do this via a trait, IMO. The way you've decoupled it here also requires more bounds checks, which shouldn't be needed, and avoiding them in this interface would require otherwise-unnecessary unsafe code (to be clear: I'm not asking you to add unsafe to this).

I think chars_lowercase should be a bit more like what you'd get by manually performing s.chars().flat_map(|c| c.to_lowercase()). Uppercase is the complex one, but should be handled directly, closer to the implementation in liballoc.

Let me know if you need an example of what I mean.
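
For reference, a minimal sketch of the `flat_map` formulation mentioned above, written against stable `std` only. It is not code from this PR, and the function name `chars_lowercase_naive` is hypothetical.

```rust
/// Hypothetical helper, not the PR's API: lowercase `s` one char at a time.
fn chars_lowercase_naive(s: &str) -> impl Iterator<Item = char> + '_ {
    // Each char is mapped on its own; `to_lowercase` may yield more than one char.
    s.chars().flat_map(|c| c.to_lowercase())
}

fn main() {
    let plain: String = chars_lowercase_naive("HeLLo").collect();
    assert_eq!(plain, "hello");

    // U+0130 (capital I with dot above) expands to 'i' followed by U+0307.
    let dotted: String = chars_lowercase_naive("\u{0130}").collect();
    assert_eq!(dotted, "i\u{0307}");
}
```

Because each char is mapped in isolation, this form handles expansion but cannot see neighbouring characters, so any context-dependent mapping would need extra logic on top.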

thomcc · 2022-08-08T10:07:56Z

library/core/src/str/iter.rs

+///
+/// Default implementation is pass-through, no conversion is done,
+/// with `is_simple = is_ascii`.
+trait UnicodeConverter {


I think this shouldn't use a trait. It ends up complicating things quite a bit and makes the implementation overly generic.
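
One possible shape for a trait-free version, as a rough sketch only: a concrete iterator struct that wraps `Chars` and holds the pending expansion of the current char. The type name `CharsUppercaseSketch` and its fields are assumptions for illustration; this is neither the PR's code nor the liballoc implementation, and it runs against stable `std`.

```rust
use std::char::ToUppercase;
use std::str::Chars;

/// Hypothetical concrete iterator: no converter trait, just one struct.
struct CharsUppercaseSketch<'a> {
    // Source chars.
    chars: Chars<'a>,
    // Not-yet-yielded chars from the current char's uppercase expansion.
    pending: Option<ToUppercase>,
}

impl<'a> CharsUppercaseSketch<'a> {
    fn new(s: &'a str) -> Self {
        Self { chars: s.chars(), pending: None }
    }
}

impl<'a> Iterator for CharsUppercaseSketch<'a> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        loop {
            // Drain the current expansion first.
            if let Some(c) = self.pending.as_mut().and_then(|p| p.next()) {
                return Some(c);
            }
            // Otherwise pull the next source char; `?` ends iteration at end of input.
            let c = self.chars.next()?;
            self.pending = Some(c.to_uppercase());
        }
    }
}

fn main() {
    // U+00DF expands to "SS" in the middle of the word.
    let upper: String = CharsUppercaseSketch::new("stra\u{00DF}e").collect();
    assert_eq!(upper, "STRASSE");
}
```

This sketch covers forward iteration only; a double-ended version would need a second pending slot for the back of the string.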

thomcc · 2022-08-08T10:08:20Z

library/core/src/str/iter.rs

+    // data source
+    iter: CharIndices<'a>,
+    // buffer for .next()
+    fwd: [Option<char>; 2],


nit: forward/backward instead of fwd/bwd.

thomcc · 2022-08-08T10:19:08Z

library/alloc/src/lib.rs

 #![feature(unicode_internals)]
 #![feature(unsize)]
 #![feature(std_internals)]
+#![feature(unicode_converter)]


This feature name is too generic, perhaps something more like #![feature(str_chars_casemapped)]?

bors · 2022-08-17T00:19:11Z

☔ The latest upstream changes (presumably #100644) made this pull request unmergeable. Please resolve the merge conflicts.

JohnCSimon · 2022-10-02T21:14:41Z

Ping from triage:
@markokr what is the status of this PR?

JohnCSimon · 2023-01-01T18:46:22Z

@markokr
Ping from triage: I'm closing this due to inactivity. Please reopen when you are ready to continue with this.
Note: if you do, please reopen the PR BEFORE you push to it, or else you won't be able to reopen it. This is a quirk of GitHub.
Thanks for your contribution.

@rustbot label: +S-inactive

rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jun 25, 2022

rust-highfive assigned thomcc Jun 25, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 25, 2022

markokr force-pushed the unicode-converter branch from 08b407c to 5e2185a Compare June 25, 2022 16:09

rustbot added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jun 25, 2022

markokr force-pushed the unicode-converter branch from 5e2185a to 5a32cba Compare June 25, 2022 16:28

markokr mentioned this pull request Jun 25, 2022

Add new iterators: core::str::{chars_uppercase,chars_lowercase} rust-lang/libs-team#58

Closed

markokr force-pushed the unicode-converter branch from 5a32cba to 38404ad Compare June 25, 2022 17:07

core::str::{chars_uppercase,chars_lowercase} iterators

f5d701f

They are based on new UnicodeConverter + UnicodeIterator internal API that supports context-sensitivity and char expansion.

markokr force-pushed the unicode-converter branch from e1fbdd0 to f5d701f Compare June 26, 2022 20:06

djudd mentioned this pull request Jul 15, 2022

Correct handling of context-sensitive unicode case folding djudd/human-name#11

Open

thomcc added S-waiting-on-ACP Status: PR has an ACP and is waiting for the ACP to complete. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 22, 2022

thomcc requested changes Aug 8, 2022

bors added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Aug 17, 2022

JohnCSimon closed this Jan 1, 2023

rustbot added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Jan 1, 2023
