Add support for ordinal formatting (and update CLDR)#52
Add support for ordinal formatting (and update CLDR)#52jeffijoe merged 18 commits intojeffijoe:masterfrom
Conversation
src/Jeffijoe.MessageFormat.MetadataGenerator/Plural/Parsing/PluralRuleSet.cs
Outdated
Show resolved
Hide resolved
...ijoe.MessageFormat.MetadataGenerator/Plural/SourceGeneration/PluralRulesMetadataGenerator.cs
Show resolved
Hide resolved
src/Jeffijoe.MessageFormat/Formatting/Formatters/PluralFormatter.cs
Outdated
Show resolved
Hide resolved
src/Jeffijoe.MessageFormat.MetadataGenerator/Plural/Parsing/AST/Condition.cs
Outdated
Show resolved
Hide resolved
src/Jeffijoe.MessageFormat.Tests/MetadataGenerator/GeneratedPluralRulesTests.cs
Outdated
Show resolved
Hide resolved
src/Jeffijoe.MessageFormat/Formatting/Formatters/PluralContext.cs
Outdated
Show resolved
Hide resolved
src/Jeffijoe.MessageFormat/Formatting/Formatters/PluralFormatter.cs
Outdated
Show resolved
Hide resolved
src/Jeffijoe.MessageFormat/Formatting/Formatters/PluralRuleKey.cs
Outdated
Show resolved
Hide resolved
|
@jeffijoe I'm happy to tackle #53 (and update the README) in the same change. I figured as-is, this change could be a semver minor version bump for a backcompat new feature, but if you're happy to call it a major version change I'll go ahead and make the 'zero' change as well (which would just leave the open question about the codegen change). I can also do it in a separate PR - lmk if you have a preference. I can also update the codegen to be backwards compatible by keeping the _LANG helpers and adding a new overload of |
|
The codegen is not part of the public API, so if everything works without the extra generated code, then we should leave it out. Happy to make it a major bump now - I'd want to also tackle #41 in the same release since that is also a breaking change. Basically all that is for is making the default culture |
|
I was cleaning up Going to double check master with some new tests to confirm I didn't inadvertently change any behavior around fallback, and also look into how straightforward it is to canonicalize an input locale to a Unicode language ID and use the LDML matching algorithm to pick a rule. |
|
Okay, I discovered a few other things that I'm interested in trying to contribute but that may be outside the scope of this PR through a reading of https://www.unicode.org/reports/tr35/tr35.html#Locale_Inheritance
There is nuance in how to do search chaining for non-canonical names, however, given that almost all the plural data uses "base" language tags, I'm not sure there's value in going that far. Notably, implementing support for I think my plan for this change is to -
I think step 2 (the Dictionary change) should happen regardless as it's directly related to the feature in this PR, but the rest of the fallback stuff could wait til a separate change. I'll probably start on it in parallel and let me know if you have a preference on logistics (or if this should be tracked with a new issue). |
|
Don't want to bog you down with logistics, but if you feel like it would be simpler for both of us to merge this PR first, we can do that, let me know when it's ready and I'll merge it. Appreciate the help! |
|
Okay, this ended up a little silly but I landed a compromise for locale inheritance in a new We try the original locale as-is (now case insensitive!), then we try stripping all subtags (so en-US-Foo will try "en"), then we finally try "root". This means -
This should work very well for almost all pluralization cases except for "pt-PT" with additional subtags (which would fall back to "pt", which is equivalent to "pt-BR"). We can get away with this because almost all of the plural CLDR locales are base languages. I did end up substantially refactoring my previous codegen to key on locale first, and made Cardinal/Ordinal strongly typed instead of passing "cardinal" and "ordinal" strings everywhere. |
jeffijoe
left a comment
There was a problem hiding this comment.
Since we're already doing a major for the next release, we can rename Pluralizers to CardinalPluralizers for consistency.
src/Jeffijoe.MessageFormat/Formatting/Formatters/PluralFormatter.cs
Outdated
Show resolved
Hide resolved
src/Jeffijoe.MessageFormat.Tests/MetadataGenerator/GeneratedPluralRulesTests.cs
Outdated
Show resolved
Hide resolved
|
Merged, thank you so much! Was there anything else you wanted to address before I cut the next major release? |
Nothing blocking on my end! Please feel free to tag me if anything strange surfaces with the changes. Longer term I'm interested in a C#/.NET solution for MessageFormat 2 which got ratified this year - it'd be a bit of a project but the package might actually be set up well for a dual parsing pipeline in the future leveraging all the same generated CLDR bindings. The lib I believe you originally used as a reference is now one of the reference implementations of the new syntax 😁 |
|
Yes, it's been a long time, I didn't realize there was a new spec. 😅 I'll probably take a stab at #41 before I cut a new release, but I can't promise when that will be. EDIT: Took a quick look at MF2, that is definitely a big undertaking! But I like the syntax! EDIT2: this is nuts |
|
This has now been released! Thanks again! |
Fixes #51
This change adds support for the
selectordinalfunction/formatter. Though I don't believeselectordinalis documented in the original ICU libraries, it seems to have ecosystem support and is a first class function in MF2, so it seems a good fit.Thankfully the CLDR
ordinals.xmlhas the same schema asplurals.xmlso the parsing logic could be reused, though I had to update the data structures and generated code to support multiple plural "types".ordinals.xmlfrom CLDR v48.1 (and updateplurals.xmlsimultaneously)PluralContextto serve as touchstonesa. Collections of
PluralRuleare replaced with a new data structurePluralRuleSetthat serves as an index of locales + plural typeb. There are now two different TryGetByLocale functions in the generated code, one for Cardinal (current behavior) and one for Ordinal (new/added behavior)