add length_utf16 validator by DXist · Pull Request #245 · Keats/validator

DXist · 2023-03-20T16:33:31Z

This PR adds length_utf16 validator.

My project exposes data from Salesforce via a JSON Schema-based API. I want to validate field lengths the same way Salesforce does: by counting UTF-16 code units.

UTF-16 is used for Unicode string representation in JavaScript, Java and Salesforce Apex. I think this validator could be useful to others as well. A good use case is to align backend and frontend length validators.

An example of the mismatch between UTF-16 code units and Unicode code points: the symbol '𝔠' takes 2 UTF-16 code units but is still 1 Unicode code point.
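The mismatch described above can be reproduced with the Rust standard library alone. This is a minimal sketch; the helper names `utf16_len` and `code_point_len` are made up for illustration and are not part of the PR.

```rust
/// Number of UTF-16 code units needed to encode `s`
/// (the unit JavaScript's `String.length` counts in).
fn utf16_len(s: &str) -> usize {
    s.encode_utf16().count()
}

/// Number of Unicode scalar values (code points) in `s`.
fn code_point_len(s: &str) -> usize {
    s.chars().count()
}
```

For "𝔠" (U+1D520, outside the Basic Multilingual Plane), `utf16_len` returns 2 because the character is encoded as a surrogate pair, while `code_point_len` returns 1. For plain ASCII text the two counts agree.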

Should I put the implementation behind an optional length_utf16 feature?

Keats · 2023-03-20T19:27:47Z

I don't think it makes sense to add that to the library; it's better added as a custom validator.
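A custom validator along these lines might look roughly like the sketch below. It is deliberately standalone: `LengthError` and `validate_length_utf16` are hypothetical names, and the function would still need adapting to whatever signature and error type the crate expects from custom validators.

```rust
/// Hypothetical error type for this sketch; a real custom validator would
/// return the error type the validation library expects.
#[derive(Debug, PartialEq)]
struct LengthError {
    actual: usize,
    min: usize,
    max: usize,
}

/// Checks that `value` has between `min` and `max` UTF-16 code units, inclusive.
fn validate_length_utf16(value: &str, min: usize, max: usize) -> Result<(), LengthError> {
    let actual = value.encode_utf16().count();
    if actual < min || actual > max {
        return Err(LengthError { actual, min, max });
    }
    Ok(())
}
```

The trade-off is that every project needing this check carries its own copy of the function, which is the duplication the PR was trying to avoid.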

LeoniePhiline · 2023-04-17T15:12:29Z

@Keats The need for a UTF-16 code unit length validator is very common (think of all web form handling), since the maxlength attribute of HTML form fields counts UTF-16 code units.

If the frontend counts UTF-16 code units, and the backend counts UTF-8 code units, then inconsistencies arise whenever values contain characters whose encoded length differs between UTF-16 and UTF-8.

This results in the server rejecting values that passed client-side validation whenever the server's UTF-8 representation is longer than the browser's UTF-16 representation.
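The inconsistency can be sketched with two checks that share the same limit but count in different units. Both function names are hypothetical; the server-side byte count is the scenario assumed in the comment above, not a statement about how any particular library measures length.

```rust
/// Client-side rule: mimics an HTML `maxlength` check, which counts UTF-16 code units.
fn passes_client_check(value: &str, max: usize) -> bool {
    value.encode_utf16().count() <= max
}

/// Hypothetical server-side rule that counts UTF-8 code units (bytes) instead.
fn passes_server_byte_check(value: &str, max: usize) -> bool {
    value.len() <= max
}
```

With a limit of 5, a value made of five euro signs (U+20AC) passes the client check, since each sign is one UTF-16 code unit, but fails the byte-based check, since each sign takes three bytes in UTF-8 and the value is 15 bytes long.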

LeoniePhiline · 2023-04-17T18:23:17Z

validator_types/src/lib.rs

            Validator::Regex(_) => "regex",
            Validator::Range { .. } => "range",
            Validator::Length { .. } => "length",
+            Validator::LengthUTF16 { .. } => "length_utf16",


Suggested change

-            Validator::LengthUTF16 { .. } => "length_utf16",
+            Validator::LengthUtf16 { .. } => "length_utf16",

For consistency with Rust code style, you might want to use Utf in identifiers.

E.g. https://doc.rust-lang.org/std/str/struct.EncodeUtf16.html

DXist · 2023-04-18

I'll address the code style suggestions if an extra built-in validator type is the approach the crate's users want.

We could gather more comments and thumbs-up on the MR description to get more feedback.

Keats · 2023-04-18

Yeah, I think it makes more sense to go with a parameter on the length validator, as mentioned in #250; otherwise we just duplicate things that are 99% the same.
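One way to read the parameter approach is a single length check that takes the counting unit as an argument. The sketch below is only an illustration under that assumption: `LengthUnit`, `measure` and `validate_length` are hypothetical names, not the design proposed in #250 or the crate's API.

```rust
/// Hypothetical unit selector for a single, parameterised length validator.
#[derive(Clone, Copy, Debug)]
enum LengthUnit {
    /// Unicode scalar values (`str::chars`).
    CodePoints,
    /// UTF-16 code units (`str::encode_utf16`).
    Utf16,
}

/// Measures `value` in the requested unit.
fn measure(value: &str, unit: LengthUnit) -> usize {
    match unit {
        LengthUnit::CodePoints => value.chars().count(),
        LengthUnit::Utf16 => value.encode_utf16().count(),
    }
}

/// Returns true when the measured length is within the optional, inclusive bounds.
fn validate_length(
    value: &str,
    unit: LengthUnit,
    min: Option<usize>,
    max: Option<usize>,
) -> bool {
    let len = measure(value, unit);
    min.map_or(true, |m| len >= m) && max.map_or(true, |m| len <= m)
}
```

Only `measure` differs between the two units, so the bounds logic is written once instead of being duplicated in a second validator.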

add length_utf16 validator

91e9472

DXist mentioned this pull request Mar 20, 2023

Help with maintenance #201

Open

LeoniePhiline mentioned this pull request Apr 17, 2023

Feature request: Consider an UTF-16 code units length validator #250

Open

DXist wants to merge 1 commit into Keats:master from DXist:feature/utf16_length
