Skip to content

Implement native rules for the predefined protovalidate rules#316

Open
jonbodner-buf wants to merge 14 commits intomainfrom
jonbodner/native-rules
Open

Implement native rules for the predefined protovalidate rules#316
jonbodner-buf wants to merge 14 commits intomainfrom
jonbodner/native-rules

Conversation

@jonbodner-buf
Copy link
Copy Markdown
Contributor

This PR provides native implementations for the following protovalidate rules:

  • (buf.validate.field).{numeric_type}.const

  • (buf.validate.field).{numeric_type}.lt

  • (buf.validate.field).{numeric_type}.lte

  • (buf.validate.field).{numeric_type}.gt

  • (buf.validate.field).{numeric_type}.gte

  • (buf.validate.field).{numeric_type}.in

  • (buf.validate.field).{numeric_type}.not_in

  • (buf.validate.field).{numeric_type}.finite

  • (buf.validate.field).bool.const

  • (buf.validate.field).string.const

  • (buf.validate.field).string.len

  • (buf.validate.field).string.min_len

  • (buf.validate.field).string.max_len

  • (buf.validate.field).string.len_bytes

  • (buf.validate.field).string.min_bytes

  • (buf.validate.field).string.max_bytes

  • (buf.validate.field).string.pattern

  • (buf.validate.field).string.prefix

  • (buf.validate.field).string.suffix

  • (buf.validate.field).string.contains

  • (buf.validate.field).string.not_contains

  • (buf.validate.field).string.in

  • (buf.validate.field).string.not_in

  • (buf.validate.field).string.email

  • (buf.validate.field).string.hostname

  • (buf.validate.field).string.ip

  • (buf.validate.field).string.ipv4

  • (buf.validate.field).string.ipv6

  • (buf.validate.field).string.uri

  • (buf.validate.field).string.uri_ref

  • (buf.validate.field).string.address

  • (buf.validate.field).string.uuid

  • (buf.validate.field).string.tuuid

  • (buf.validate.field).string.ip_with_prefixlen

  • (buf.validate.field).string.ipv4_with_prefixlen

  • (buf.validate.field).string.ipv6_with_prefixlen

  • (buf.validate.field).string.ip_prefix

  • (buf.validate.field).string.ipv4_prefix

  • (buf.validate.field).string.ipv6_prefix

  • (buf.validate.field).string.host_and_port

  • (buf.validate.field).string.ulid

  • (buf.validate.field).string.well_known_regex

  • (buf.validate.field).string.strict

  • (buf.validate.field).bytes.const

  • (buf.validate.field).bytes.len

  • (buf.validate.field).bytes.min_len

  • (buf.validate.field).bytes.max_len

  • (buf.validate.field).bytes.pattern

  • (buf.validate.field).bytes.prefix

  • (buf.validate.field).bytes.suffix

  • (buf.validate.field).bytes.contains

  • (buf.validate.field).bytes.in

  • (buf.validate.field).bytes.not_in

  • (buf.validate.field).bytes.ip

  • (buf.validate.field).bytes.ipv4

  • (buf.validate.field).bytes.ipv6

  • (buf.validate.field).bytes.uuid

  • (buf.validate.field).enum.const

  • (buf.validate.field).enum.defined_only

  • (buf.validate.field).enum.in

  • (buf.validate.field).enum.not_in

  • (buf.validate.field).repeated.min_items

  • (buf.validate.field).repeated.max_items

  • (buf.validate.field).repeated.unique

  • (buf.validate.field).repeated.items

  • (buf.validate.field).map.min_pairs

  • (buf.validate.field).map.max_pairs

  • (buf.validate.field).map.keys

  • (buf.validate.field).map.values

By default, the native rules are disabled. To use them, run with the environment variable PV_NATIVE_RULES set to true

Performance improvement from using native rules:

$ benchstat 2026-04-21:17:17:52.bench.txt 2026-04-21:17:20:31.bench.txt
goos: darwin
goarch: arm64
pkg: buf.build/go/protovalidate
cpu: Apple M1 Max
                          │ 2026-04-21:17:17:52.bench.txt │    2026-04-21:17:20:31.bench.txt    │
                          │            sec/op             │   sec/op     vs base                │
Scalar-10                                    167.35n ± 1%   70.97n ± 0%  -57.59% (p=0.000 n=10)
Repeated/Scalar-10                           285.60n ± 1%   99.10n ± 1%  -65.30% (p=0.000 n=10)
Repeated/Message-10                           691.3n ± 1%   296.7n ± 0%  -57.09% (p=0.000 n=10)
Repeated/Unique/Scalar-10                    1249.0n ± 0%   556.8n ± 0%  -55.42% (p=0.000 n=10)
Repeated/Unique/Bytes-10                     2514.5n ± 0%   967.0n ± 1%  -61.54% (p=0.000 n=10)
Map-10                                        289.9n ± 1%   103.2n ± 1%  -64.39% (p=0.000 n=10)
ComplexSchema-10                              40.93µ ± 0%   14.27µ ± 1%  -65.14% (p=0.000 n=10)
Int32GT-10                                   2748.0n ± 0%   845.6n ± 0%  -69.23% (p=0.000 n=10)
TestByteMatching-10                          1348.0n ± 0%   206.2n ± 0%  -84.70% (p=0.000 n=10)
Compile-10                                    8.262m ± 0%   1.357m ± 1%  -83.58% (p=0.000 n=10)
CompileInt32GT-10                             5.887m ± 1%   1.241m ± 0%  -78.91% (p=0.000 n=10)
geomean                                       5.738µ        1.755µ       -69.42%

                          │ 2026-04-21:17:17:52.bench.txt │       2026-04-21:17:20:31.bench.txt       │
                          │             B/op              │     B/op      vs base                     │
Scalar-10                                    0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=10) ¹
Repeated/Scalar-10                          120.00 ± 0%       48.00 ± 0%   -60.00% (p=0.000 n=10)
Repeated/Message-10                         120.00 ± 0%       48.00 ± 0%   -60.00% (p=0.000 n=10)
Repeated/Unique/Scalar-10                    536.0 ± 0%       272.0 ± 0%   -49.25% (p=0.000 n=10)
Repeated/Unique/Bytes-10                    1784.0 ± 0%       832.0 ± 0%   -53.36% (p=0.000 n=10)
Map-10                                      128.00 ± 0%       64.00 ± 0%   -50.00% (p=0.000 n=10)
ComplexSchema-10                          10.552Ki ± 0%     4.383Ki ± 0%   -58.46% (p=0.000 n=10)
Int32GT-10                                   0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=10) ¹
TestByteMatching-10                          408.0 ± 0%         0.0 ± 0%  -100.00% (p=0.000 n=10)
Compile-10                                 6.811Mi ± 0%     1.671Mi ± 0%   -75.47% (p=0.000 n=10)
CompileInt32GT-10                          5.199Mi ± 0%     1.606Mi ± 0%   -69.11% (p=0.000 n=10)
geomean                                                 ²                 ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

                          │ 2026-04-21:17:17:52.bench.txt │      2026-04-21:17:20:31.bench.txt       │
                          │           allocs/op           │  allocs/op   vs base                     │
Scalar-10                                    0.000 ± 0%      0.000 ± 0%         ~ (p=1.000 n=10) ¹
Repeated/Scalar-10                           3.000 ± 0%      1.000 ± 0%   -66.67% (p=0.000 n=10)
Repeated/Message-10                          3.000 ± 0%      1.000 ± 0%   -66.67% (p=0.000 n=10)
Repeated/Unique/Scalar-10                    34.00 ± 0%      14.00 ± 0%   -58.82% (p=0.000 n=10)
Repeated/Unique/Bytes-10                     73.00 ± 0%      24.00 ± 0%   -67.12% (p=0.000 n=10)
Map-10                                       2.000 ± 0%      1.000 ± 0%   -50.00% (p=0.000 n=10)
ComplexSchema-10                             419.0 ± 0%      131.0 ± 0%   -68.74% (p=0.000 n=10)
Int32GT-10                                   0.000 ± 0%      0.000 ± 0%         ~ (p=1.000 n=10) ¹
TestByteMatching-10                          17.00 ± 0%       0.00 ± 0%  -100.00% (p=0.000 n=10)
Compile-10                                  96.14k ± 0%     18.35k ± 0%   -80.91% (p=0.000 n=10)
CompileInt32GT-10                           64.56k ± 0%     17.53k ± 0%   -72.84% (p=0.000 n=10)
geomean                                                 ²                ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

@jonbodner-buf jonbodner-buf requested a review from rodaine April 21, 2026 22:35
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 21, 2026

The latest Buf updates on your PR. Results from workflow Buf / validate-protos (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed⏩ skippedApr 24, 2026, 10:06 PM

Comment thread cel/library.go
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say invert the relationship, moving the functions here into a package in internal, and having them consumed by this package. That way we don't leak these functions into the public API.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. The validation rules and supporting code have been moved to internal/rules.

Comment thread builder.go Outdated
Comment on lines +469 to +472
// put behind a feature flag to allow for testing.
// it's easier to follow like this, don't break it up
//nolint:nestif
if f, _ := os.LookupEnv("PV_NATIVE_RULES"); strings.EqualFold(f, "true") {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean CI isn't testing this for conformance?

In any case, we should not ship with this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I will update the CI conformance tests to run both with and without the flag being set.

I'd like to keep a flag in place, even if it's inverted (set the flag to disable native rules). @timostamm and I had a conversation discussing this (he might weigh in also and correct anything that I get wrong from our discussion). We still need the rules in CEL (the native rules aren't ported to any other platform yet, and CEL is the only way to write a rule that combines or compares two fields). We also need a reference implementation for protovalidate. Having an easy way to test that the native rules and CEL rules produce the same result is essential, at least for now.

I look at this as being similar to the GODEBUG flags that the Go team has been using for backwards breaking changes outside the compatibility guarantee (https://go.dev/doc/godebug). Just in case a native rule causes a problem, it would be good to have a way to disable it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is this bleeds up into everywhere the code is used, which means that someone would need to thread this through all the way to where they're deploying to have this get picked up. It's fine to have a way of flipping the switch, but this would be better as a build tag/flag, not an env var.

Copy link
Copy Markdown
Contributor Author

@jonbodner-buf jonbodner-buf Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll play with the build tag idea. I'd like to make it easy at test time to run against both native and CEL rules (see line 381 in dynamic_end_to_end_test.go).

Copy link
Copy Markdown
Contributor Author

@jonbodner-buf jonbodner-buf Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I have an implementation now. The default behavior is to enable the native rules. If the code is built with -tags="cel_rules", then the native code is not used. The conformance and test targets have been updated to always do both, and I've added a bench-cel to make it easy to run performance comparisons. There's also a way to switch between native and CEL rules during testing by setting a global unexported variable (there's a big comment on it).

Comment thread builder.go Outdated
Comment on lines +522 to +528
// When processWrapperRules unwraps a wrapper type (e.g., Int32Value),
// it calls buildValue with the inner scalar field descriptor but
// attaches the resulting evaluators to the outer value. At runtime,
// the evaluator receives the wrapper message, not the inner scalar,
// so native evaluators that call val.Int() would panic. The outer
// value's Descriptor still points to the wrapper message field, so
// we use its Kind to detect this case and fall back to CEL.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can probably get around this by composing the evaluator in an unwrapper that handles this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

google wrapper well known types are now supported in native code. zero allocations in native code, over 58% faster.

Comment thread native_bool.go

// tryBuildNativeBoolRules attempts to build a native Go evaluator for
// bool rules. Returns nil if the rules can't be handled natively.
func tryBuildNativeBoolRules(base base, rules *validate.BoolRules) evaluator {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move this to the top (same for all native_* files) as it's the entrypoint for building up the evaluators.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread native_bytes.go Outdated
errNotUTF8 = errors.New("must be valid UTF-8 to apply regexp")
)

func (n nativeBytesEval) Evaluate(_ protoreflect.Message, val protoreflect.Value, _ *validationConfig) error {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we return early everywhere, this violates the existing default "fail open" behavior where we capture all violations, not just the first one. Either this needs to respect the failFast flag in validationConfig, or should be split into individual evaluators for each and let the rest of the machinery handle this correctly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have conformance tests (or other tests) to validate this behavior? I'll fix the behavior to match the spec, presumably in all of the native Evaluate implementations.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored the code so that we return multiple violations if there are multiple violations and failFast is not set.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have conformance tests (or other tests) to validate this behavior? I'll fix the behavior to match the spec, presumably in all of the native Evaluate implementations.

There may be a few conformance tests that expect more than one violation, but I could be wrong. We should add it, if not. It's toggle-able when setting up a validator in pv-go for folks who want it exit early to shave off some execution time (and memory).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code was passing all of the conformance tests when it was swallowing multiple violations, so if there are conformance tests that expect multiple violations, they are not working correctly. I'll investigate.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no conformance tests covering multiple standard rules on a field (or fail-fast). Should be trivial to add some, but for confidence that all rules are conformant, we’d need the generative approach for conformance tests we’ve talked about, Jon.

Comment thread native_map.go Outdated
}

func (n nativeMapEval) Evaluate(_ protoreflect.Message, val protoreflect.Value, _ *validationConfig) error {
size := uint64(val.Map().Len()) //nolint:gosec
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to turn it on in the golangcilint config, but we should include the reason with the nolint directive:

Suggested change
size := uint64(val.Map().Len()) //nolint:gosec
size := uint64(val.Map().Len()) //nolint:gosec // maps can never be large enough to overflow u64

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment explains that int can't be negative or out of uint64 range.

Comment thread native_map.go Outdated
Comment on lines +35 to +36
minPairs *uint64
maxPairs *uint64
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of pointers forcing an allocation, initialize this to the extremes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to value types

Comment thread native_map.go Outdated
}

func (n nativeMapEval) Tautology() bool {
return false
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tautalogical if min_pairs/max_pairs aren't set (or are at the extremes). Having this report the correct value allows for tree-shaking of noop evaluators (even if by construction it's not likely to show up).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be addressed now

Comment thread README.md Outdated
Comment on lines +68 to +72
### Experimental native support release
This release provides experimental native support for standard validation rule processing. It is disabled by default
and enabled by setting the environment variable `PV_NATIVE_RULES` to `true` at runtime.

Performance improvements on the included benchmarks:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above. I think if we're shipping this and it satisfies the conformance tests, we shouldn't gate it behind an environment variable flag. If we really don't want this to apply by default, we should use go build flags, so it doesn't need to bleed into folks' deployments. That said, I think this is a low risk, high impact improvement that doesn't require flags at all.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need to retain the capability to run the conformance suite on both paths. protovalidate-go with cel-go gives us the best correctness signal for upstream changes to validate.proto.

Copy link
Copy Markdown
Member

@rodaine rodaine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other alternative besides a build flag is to have a ValidatorOption to disable the native rules. That'll make your testing a bit cleaner as well.

Comment thread native_selection_test.go
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all the native tests, I'd gate the files behind the build flag instead of fighting parallel tests. Since all these tests are in the same package, unless ran individually all tests will potentially see native enabled which isn't what we want. Then for the actual flag, you'll have two files, setting useNativeRules as a constant instead of a variable (and having to fight the linter everywhere).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into the ValidatorOption.

Comment thread disable_native_rules.go Outdated
Comment on lines +19 to +21
func init() {
useNativeRules = false
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would become

const useNativeRules = false

And you'd have a separate file with

//go:build !cel_rules

package protovalidate

const useNativeRules = true

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was intentionally not a const so that I could switch it during tests.

Comment thread native_bool.go

// tryBuildNativeBoolRules attempts to build a native Go evaluator for
// bool rules. Returns nil if the rules can't be handled natively.
func tryBuildNativeBoolRules(base base, rules *validate.BoolRules) evaluator {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, if we add a new rule to the standard set, they're immediately consumable by pv-go. If I added a (buf.validate.field).bool.foo = "bar" rule, will that rule get built or silently ignored?

Copy link
Copy Markdown
Contributor Author

@jonbodner-buf jonbodner-buf Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will get silently ignored, but that's where a test would come in (presumably we'd add a conformance test to validate the new rule, and it would fail for native).

I have an idea on how to fail over to CEL if an unknown rule is used, but I haven't tried it out yet. However, adding the automatic failover would mean that we wouldn't have a great signal that we need to update the native rules.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but right now it "just works" because there is no special treatment. We need to have these native rules apply at the same granularity (maybe based off the constraint ID) so that the replacements are more surgical. This will avoid a lot of extra boilerplate to.

Copy link
Copy Markdown
Contributor Author

@jonbodner-buf jonbodner-buf Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented fallthrough for rules that are implemented in CEL but not in native code. In builder.processStandardRules, a copy is made of the passed-in rules. When the rules copy is passed to a native implementation, any known validation has its value cleared. The rules copy is then used to build the CEL validators. This allows any validation rules that are unknown to the native code to be processed by the CEL engine. I've tested this by commenting out a single native validation. The CEL rule is substituted automatically.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a performance impact at compile time, but it's still much faster than CEL-only rules.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're running the tests independently, we don't need to do this sort of test, correct? Better to keep the internals a black box at this level imo.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've found it a pretty useful way to develop and validate the rules.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I think using a ValidatorOption is the better choice. Will let you actually run things in parallel.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ValidatorOption is now implemented.

…r or not native rules are used. Update tests to use it and allow them to run in parallel.
…tions for error generation (still room for more improvement). add fall-through support for any rules that aren't covered in native.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants