Fix set semantics for Common and Inherited#21
Closed
dcsommer wants to merge 1 commit intounicode-rs:masterfrom
Closed
Fix set semantics for Common and Inherited#21dcsommer wants to merge 1 commit intounicode-rs:masterfrom
dcsommer wants to merge 1 commit intounicode-rs:masterfrom
Conversation
Member
|
Yeah, I think that that behavior is a bug. |
Contributor
Author
|
Thanks. If you think the test looks right, I'll go ahead and make it pass and update the PR. |
…ited/Common After finding the iterator after a union with Common only yielded a single element, I overhauled the representation and semantics of `ScriptExtension`. This is a breaking change for most APIs. Summary of improvements to `ScriptExtension`: * Improved representation to be able to track multiple scripts as well as Inherited/Common * "Inherited" and "Common" no longer intersect with everything and have no subset/superset relationship between them. * `for_str` is a union, not intersection, of all chars * Added `is_subset_or_equal()` for easier comparison of unions and intersections * Changed `Debug` impl to a vanilla derive to allow comparing hex bits * Fixed `Display` impl to properly show each script, separated by pluses * New test for iterator
Contributor
Author
|
@Manishearth it ended up being a breaking change for several APIs, but I think the behavior is much easier to reason about now and more useful overall. LMK what you think when you have a moment. |
Manishearth
requested changes
Aug 13, 2025
Member
Manishearth
left a comment
There was a problem hiding this comment.
No, this is incorrect: Common and Inherited are not allowed to coexist with other scx values
That behavior is deliberate
https://www.unicode.org/reports/tr24/#Scx_Bad_Example_Table
I misunderstood the point of the test, sorry.
Contributor
Author
|
Got it, will close this PR then. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description: On reviewing the source code, it appears that the iterator implementation for
ScriptExtensiononly ever yields a single element. I did not see this documented, and assuming this is a bug, I have written a test case to clarify the expected subset, union, and iterator behavior. If maintainers believe this behavior as shown in the proposed test is correct, I can go ahead and implement the fixes in the rest of the library.