Add support for ordinal formatting (and update CLDR) by sfuqua · Pull Request #52 · jeffijoe/messageformat.net

sfuqua · 2026-02-16T05:52:10Z

Fixes #51

This change adds support for the selectordinal function/formatter. Though I don't believe selectordinal is documented in the original ICU libraries, it seems to have ecosystem support and is a first-class function in MF2, so it seems like a good fit.

Thankfully the CLDR ordinals.xml has the same schema as plurals.xml so the parsing logic could be reused, though I had to update the data structures and generated code to support multiple plural "types".

1. Add ordinals.xml from CLDR v48.1 (and update plurals.xml simultaneously)
2. Add a README to the data folder describing the pedigree of the XML files
3. Add documentation to some of the codegen types and PluralContext to serve as touchstones
4. Update parsing infra to support two XML files and two plural "types" throughout the stack:
   a. Collections of PluralRule are replaced with a new data structure, PluralRuleSet, that serves as an index of locales + plural type
   b. There are now two different TryGetByLocale functions in the generated code, one for Cardinal (current behavior) and one for Ordinal (new/added behavior)
5. Per discussion in this thread, remove the default English pluralizers, which fixes #53 ("Built-in English pluralizer is inconsistent w/CLDR")
6. Update existing tests to explicitly exercise cardinal formatting
7. Add some new ordinal test cases
8. Add rudimentary support for LDML "inheritance" rules, so that e.g. "en-CA" will match CLDR "en", and unknown or unmapped locales will always fall back to "root"; remove the library's default mapping to "en" as a result (because "root" exists)
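
For readers unfamiliar with ordinal categories, here is a minimal sketch of what the English rules in CLDR's ordinals.xml resolve to for whole numbers. `EnglishOrdinalSketch` and its methods are hypothetical names used for illustration only; this is not the library's generated code, and it assumes non-negative integers.

```csharp
using System;
using System.Linq;

// Pick a category for each number, then append the matching suffix.
var samples = new[] { 1, 2, 3, 4, 11, 12, 13, 21, 22, 103, 111 };
Console.WriteLine(string.Join(" ", samples.Select(n => $"{n}{EnglishOrdinalSketch.Suffix(n)}")));
// 1st 2nd 3rd 4th 11th 12th 13th 21st 22nd 103rd 111th

// Hypothetical helper, not part of Jeffijoe.MessageFormat.
static class EnglishOrdinalSketch
{
    // CLDR ordinal categories for "en": one, two, few, other.
    // Assumes n >= 0 (the % operator keeps the sign of n in C#).
    public static string Category(int n)
    {
        int mod10 = n % 10, mod100 = n % 100;
        if (mod10 == 1 && mod100 != 11) return "one";
        if (mod10 == 2 && mod100 != 12) return "two";
        if (mod10 == 3 && mod100 != 13) return "few";
        return "other";
    }

    // Maps each category to the suffix an English message would use.
    public static string Suffix(int n) => Category(n) switch
    {
        "one" => "st",
        "two" => "nd",
        "few" => "rd",
        _ => "th",
    };
}
```

In ICU MessageFormat syntax the same categories select a message branch, for example `{n, selectordinal, one {#st} two {#nd} few {#rd} other {#th}}`.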

src/Jeffijoe.MessageFormat.MetadataGenerator/Plural/Parsing/PluralRuleSet.cs

...ijoe.MessageFormat.MetadataGenerator/Plural/SourceGeneration/PluralRulesMetadataGenerator.cs

src/Jeffijoe.MessageFormat/Formatting/Formatters/PluralFormatter.cs

src/Jeffijoe.MessageFormat.MetadataGenerator/Plural/Parsing/AST/Condition.cs

src/Jeffijoe.MessageFormat.Tests/MetadataGenerator/GeneratedPluralRulesTests.cs

src/Jeffijoe.MessageFormat/Formatting/Formatters/PluralContext.cs

src/Jeffijoe.MessageFormat/Formatting/Formatters/PluralFormatter.cs

src/Jeffijoe.MessageFormat/Formatting/Formatters/PluralRuleKey.cs

sfuqua · 2026-02-16T17:12:30Z

@jeffijoe I'm happy to tackle #53 (and update the README) in the same change. I figured that, as-is, this change could be a semver minor version bump for a backward-compatible new feature, but if you're happy to call it a major version change I'll go ahead and make the 'zero' change as well (which would just leave the open question about the codegen change). I can also do it in a separate PR - lmk if you have a preference.

I can also update the codegen to be backwards compatible by keeping the _LANG helpers and adding a new overload of TryGetRuleByLocale for ordinals, but I was pretty sure the generator project and what it spits out aren't part of the versioned API surface.

jeffijoe · 2026-02-16T17:17:40Z

The codegen is not part of the public API, so if everything works without the extra generated code, then we should leave it out.

Happy to make it a major bump now - I'd want to also tackle #41 in the same release since that is also a breaking change. That issue is basically about making InvariantCulture the default culture and allowing a CultureInfo to be passed per invocation.

sfuqua · 2026-02-16T20:24:05Z

I was cleaning up PluralRuleKey a bit and realized I need to check some behavior about how input is matched against supported locales to ensure that e.g. "fr-FR" matches "fr".

Going to double check master with some new tests to confirm I didn't inadvertently change any behavior around fallback, and also look into how straightforward it is to canonicalize an input locale to a Unicode language ID and use the LDML matching algorithm to pick a rule.

sfuqua · 2026-02-17T00:40:28Z

Okay, while reading https://www.unicode.org/reports/tr35/tr35.html#Locale_Inheritance I discovered a few other things that I'm interested in trying to contribute, though they may be outside the scope of this PR:

There's no language inheritance in the lib today: "fr-FR" should match "fr", but does not (you can confirm this in the existing tests by updating "ru" to "ru-RU")
The CLDR plural data has two examples of subtags ("pt_PT" and "kok_Latn") - per the spec, CLDR currently always uses "_" as the separator
Per the spec, -/_ are equivalent, so an input to the library of "pt-PT" should match "pt_PT"

The "-" and "_" separators are treated as equivalent, although "-" is preferred.

The lookup of locales should be case-insensitive (alternatively, the input should be canonicalized)
The LDML inheritance/matching algo should be implemented, such that:

Given a particular locale id "en_US_someVariant", the default search chain for a particular resource is the following.
en_US_someVariant
en_US
en
root

There is nuance in how to do search chaining for non-canonical names; however, given that almost all the plural data uses "base" language tags, I'm not sure there's value in going that far.

Notably, implementing support for root means every input will always map to a default Pluralizer based on CLDR rules, so we won't have to fall back to "en" anymore for missing languages.
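
The default search chain quoted above can be sketched as follows. `LocaleChainSketch.Build` is a hypothetical helper, not code from this PR; it assumes plain truncation at the last separator and treats "-" and "_" as equivalent by normalizing to "_", which is how the CLDR data spells its tags. It ignores the non-canonical-name nuances mentioned above.

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(string.Join(" -> ", LocaleChainSketch.Build("en_US_someVariant")));
// en_US_someVariant -> en_US -> en -> root
Console.WriteLine(string.Join(" -> ", LocaleChainSketch.Build("pt-PT")));
// pt_PT -> pt -> root

// Hypothetical helper, not part of Jeffijoe.MessageFormat.
static class LocaleChainSketch
{
    public static IEnumerable<string> Build(string locale)
    {
        // "-" and "_" are equivalent separators; normalize to "_".
        var current = locale.Replace('-', '_');

        // "root" is already the end of the chain.
        if (string.Equals(current, "root", StringComparison.OrdinalIgnoreCase))
        {
            yield return "root";
            yield break;
        }

        // Drop one trailing subtag per step until only the language remains.
        while (current.Length > 0)
        {
            yield return current;
            var cut = current.LastIndexOf('_');
            if (cut < 0) break;
            current = current.Substring(0, cut);
        }

        // Every chain ends at "root", so a lookup always finds some rule.
        yield return "root";
    }
}
```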

I think my plan for this change is to -

1. Update the codegen to normalize CLDR "_" to BCP 47 "-" in the Dictionary; this will hopefully make "pt-PT" work more often by default. Maintain a separate index of tags that include underscores to facilitate conversion in step 4.
2. Restructure the codegen to use Dictionary<string, Dictionary<string, ContextPluralizer>> keyed on CLDR-locale, and then on type (or vice versa), with case-insensitive comparison for the locale Dictionary. I'll keep using a record struct for the user-facing API, but not as a Dictionary key.
3. Update documentation to clarify the semantics of the locale string in a couple places.
4. Call TryGetRuleByLocale successively up to the root language before failing. If we detect an underscore, we can reference the index from step 1 to remap to a supported locale (e.g., "pt_PT" -> "pt-PT" -> rule).
5. Document the edge cases in the README
6. Add new tests

I think step 2 (the Dictionary change) should happen regardless, as it's directly related to the feature in this PR, but the rest of the fallback stuff could wait until a separate change. I'll probably start on it in parallel; let me know if you have a preference on logistics (or if this should be tracked with a new issue).
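
A minimal sketch of the nested lookup from step 2, using the strongly typed variant (an enum rather than a string for the inner key). Everything here is an assumption for illustration: `PluralType`, `RuleTableSketch`, and `TryGetRule` are hypothetical names, the rules are plain `Func<int, string>` delegates rather than the library's ContextPluralizer, and only integer inputs are handled.

```csharp
using System;
using System.Collections.Generic;

// Lookup is case-insensitive on the locale and typed on the plural type.
if (RuleTableSketch.TryGetRule("EN", PluralType.Ordinal, out var ordinal))
{
    Console.WriteLine(ordinal(21)); // one
}
Console.WriteLine(RuleTableSketch.TryGetRule("xx", PluralType.Cardinal, out _)); // False

enum PluralType { Cardinal, Ordinal }

// Hypothetical stand-in for the generated rule table.
static class RuleTableSketch
{
    // Outer key: locale (case-insensitive). Inner key: plural type.
    private static readonly Dictionary<string, Dictionary<PluralType, Func<int, string>>> Rules =
        new Dictionary<string, Dictionary<PluralType, Func<int, string>>>(StringComparer.OrdinalIgnoreCase)
        {
            ["en"] = new Dictionary<PluralType, Func<int, string>>
            {
                [PluralType.Cardinal] = n => n == 1 ? "one" : "other",
                [PluralType.Ordinal] = EnglishOrdinal,
            },
            ["root"] = new Dictionary<PluralType, Func<int, string>>
            {
                [PluralType.Cardinal] = _ => "other",
                [PluralType.Ordinal] = _ => "other",
            },
        };

    public static bool TryGetRule(string locale, PluralType type, out Func<int, string> rule)
    {
        if (Rules.TryGetValue(locale, out var byType) && byType.TryGetValue(type, out var found))
        {
            rule = found;
            return true;
        }

        rule = _ => "other"; // placeholder so the out parameter is always assigned
        return false;
    }

    // English ordinal categories for non-negative integers.
    private static string EnglishOrdinal(int n)
    {
        int mod10 = n % 10, mod100 = n % 100;
        if (mod10 == 1 && mod100 != 11) return "one";
        if (mod10 == 2 && mod100 != 12) return "two";
        if (mod10 == 3 && mod100 != 13) return "few";
        return "other";
    }
}
```

Keeping the comparer on the outer dictionary means callers do not need to canonicalize case before the lookup.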

jeffijoe · 2026-02-17T01:17:13Z

Don't want to bog you down with logistics, but if you feel like it would be simpler for both of us to merge this PR first, we can do that, let me know when it's ready and I'll merge it. Appreciate the help!

sfuqua · 2026-02-17T06:57:22Z

Okay, this ended up a little silly, but I landed on a compromise for locale inheritance in a new LocaleHelper.cs.

We try the original locale as-is (now case insensitive!), then we try stripping all subtags (so en-US-Foo will try "en"), then we finally try "root".

This means -

No more fallback to English (instead relying on "root" CLDR rules) for unknown languages - this is technically another "breaking" change, though depending on this behavior would've been odd
Most valid BCP 47 tags or CultureInfo names passed into this library should now properly match a CLDR rule when they would've hit English before
Built-in English pluralizers have been removed (can be added back in manually), and tests have been updated

This should work very well for almost all pluralization cases except for "pt-PT" with additional subtags (which would fall back to "pt", which is equivalent to "pt-BR").

We can get away with this because almost all of the plural CLDR locales are base languages.
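
The three-step lookup described above can be sketched like this. It is not the contents of LocaleHelper.cs; `LocaleLookupSketch.Resolve` and the `known` set are hypothetical, and the sketch assumes the supported locales are stored with "-" separators.

```csharp
using System;
using System.Collections.Generic;

// Assumed set of locales that have rules; the comparer makes matching case-insensitive.
var known = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "root", "en", "pt", "pt-PT", "ru" };

Console.WriteLine(LocaleLookupSketch.Resolve("ru-RU", known));        // ru
Console.WriteLine(LocaleLookupSketch.Resolve("EN-us-Foo", known));    // en
Console.WriteLine(LocaleLookupSketch.Resolve("pt-PT", known));        // pt-PT
Console.WriteLine(LocaleLookupSketch.Resolve("pt-PT-x-demo", known)); // pt
Console.WriteLine(LocaleLookupSketch.Resolve("tlh", known));          // root

// Hypothetical helper, not part of Jeffijoe.MessageFormat.
static class LocaleLookupSketch
{
    public static string Resolve(string locale, HashSet<string> known)
    {
        // 1. Try the locale exactly as given.
        if (known.TryGetValue(locale, out var exact)) return exact;

        // 2. Strip all subtags and try the base language.
        var cut = locale.IndexOfAny(new[] { '-', '_' });
        if (cut > 0 && known.TryGetValue(locale.Substring(0, cut), out var baseLanguage))
        {
            return baseLanguage;
        }

        // 3. Fall back to the CLDR "root" rules.
        return "root";
    }
}
```

The "pt-PT-x-demo" line shows the edge case mentioned above: with extra subtags, the lookup skips "pt-PT" and lands on "pt".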

I did end up substantially refactoring my previous codegen to key on locale first, and made Cardinal/Ordinal strongly typed instead of passing "cardinal" and "ordinal" strings everywhere.

src/Jeffijoe.MessageFormat/Helpers/LocaleHelper.cs

src/Jeffijoe.MessageFormat/Formatting/Formatters/PluralFormatter.cs

jeffijoe

Since we're already doing a major for the next release, we can rename Pluralizers to CardinalPluralizers for consistency.

src/Jeffijoe.MessageFormat/Formatting/Formatters/PluralFormatter.cs

src/Jeffijoe.MessageFormat/MessageFormatter.cs

src/Jeffijoe.MessageFormat.Tests/MetadataGenerator/GeneratedPluralRulesTests.cs

jeffijoe · 2026-02-18T10:06:13Z

Merged, thank you so much! Was there anything else you wanted to address before I cut the next major release?

sfuqua · 2026-02-18T17:10:50Z

> Merged, thank you so much! Was there anything else you wanted to address before I cut the next major release?

Nothing blocking on my end! Please feel free to tag me if anything strange surfaces with the changes.

Longer term I'm interested in a C#/.NET solution for MessageFormat 2 which got ratified this year - it'd be a bit of a project but the package might actually be set up well for a dual parsing pipeline in the future leveraging all the same generated CLDR bindings.

The lib I believe you originally used as a reference is now one of the reference implementations of the new syntax 😁

jeffijoe · 2026-02-18T17:15:29Z

Yes, it's been a long time, I didn't realize there was a new spec. 😅

I'll probably take a stab at #41 before I cut a new release, but I can't promise when that will be.

EDIT: Took a quick look at MF2, that is definitely a big undertaking! But I like the syntax!

EDIT2: this is nuts

.input {$pronoun :string}
.input {$count :number}
.match $pronoun $count
he one   {{He has {$count} notification.}}
he *     {{He has {$count} notifications.}}
she one  {{She has {$count} notification.}}
she *    {{She has {$count} notifications.}}
* one    {{They have {$count} notification.}}
* *      {{They have {$count} notifications.}}

jeffijoe · 2026-02-23T22:00:43Z

This has now been released! Thanks again!

sfuqua added 12 commits October 17, 2025 23:18

Sync CLDR and add ordinals.xml + README

f2f23dd

Update CLDR to 48.1

a05bfe6

Add plurals xml to metadata csproj

b18aad8

xmldoc for AST types

f707fec

parsing basics

744e761

Add some documentation to LMDL types

d177f40

Codegen likely happy now

03a621c

Building and tests passing

c6fff54

Add new README test for selectordinal

eff20d1

Fix bad test

22c151b

readonly record struct

550fa53

More tests

fec0a92