Implement native rules for the predefined protovalidate rules#316
Implement native rules for the predefined protovalidate rules#316jonbodner-buf wants to merge 14 commits intomainfrom
Conversation
|
The latest Buf updates on your PR. Results from workflow Buf / validate-protos (pull_request).
|
There was a problem hiding this comment.
I'd say invert the relationship, moving the functions here into a package in internal, and having them consumed by this package. That way we don't leak these functions into the public API.
There was a problem hiding this comment.
done. The validation rules and supporting code have been moved to internal/rules.
| // put behind a feature flag to allow for testing. | ||
| // it's easier to follow like this, don't break it up | ||
| //nolint:nestif | ||
| if f, _ := os.LookupEnv("PV_NATIVE_RULES"); strings.EqualFold(f, "true") { |
There was a problem hiding this comment.
Does this mean CI isn't testing this for conformance?
In any case, we should not ship with this.
There was a problem hiding this comment.
Good catch. I will update the CI conformance tests to run both with and without the flag being set.
I'd like to keep a flag in place, even if it's inverted (set the flag to disable native rules). @timostamm and I had a conversation discussing this (he might weigh in also and correct anything that I get wrong from our discussion). We still need the rules in CEL (the native rules aren't ported to any other platform yet, and CEL is the only way to write a rule that combines or compares two fields). We also need a reference implementation for protovalidate. Having an easy way to test that the native rules and CEL rules produce the same result is essential, at least for now.
I look at this as being similar to the GODEBUG flags that the Go team has been using for backwards breaking changes outside the compatibility guarantee (https://go.dev/doc/godebug). Just in case a native rule causes a problem, it would be good to have a way to disable it.
There was a problem hiding this comment.
The problem is this bleeds up into everywhere the code is used, which means that someone would need to thread this through all the way to where they're deploying to have this get picked up. It's fine to have a way of flipping the switch, but this would be better as a build tag/flag, not an env var.
There was a problem hiding this comment.
I'll play with the build tag idea. I'd like to make it easy at test time to run against both native and CEL rules (see line 381 in dynamic_end_to_end_test.go).
There was a problem hiding this comment.
ok, I have an implementation now. The default behavior is to enable the native rules. If the code is built with -tags="cel_rules", then the native code is not used. The conformance and test targets have been updated to always do both, and I've added a bench-cel to make it easy to run performance comparisons. There's also a way to switch between native and CEL rules during testing by setting a global unexported variable (there's a big comment on it).
| // When processWrapperRules unwraps a wrapper type (e.g., Int32Value), | ||
| // it calls buildValue with the inner scalar field descriptor but | ||
| // attaches the resulting evaluators to the outer value. At runtime, | ||
| // the evaluator receives the wrapper message, not the inner scalar, | ||
| // so native evaluators that call val.Int() would panic. The outer | ||
| // value's Descriptor still points to the wrapper message field, so | ||
| // we use its Kind to detect this case and fall back to CEL. |
There was a problem hiding this comment.
can probably get around this by composing the evaluator in an unwrapper that handles this.
There was a problem hiding this comment.
google wrapper well known types are now supported in native code. zero allocations in native code, over 58% faster.
|
|
||
| // tryBuildNativeBoolRules attempts to build a native Go evaluator for | ||
| // bool rules. Returns nil if the rules can't be handled natively. | ||
| func tryBuildNativeBoolRules(base base, rules *validate.BoolRules) evaluator { |
There was a problem hiding this comment.
I'd move this to the top (same for all native_* files) as it's the entrypoint for building up the evaluators.
| errNotUTF8 = errors.New("must be valid UTF-8 to apply regexp") | ||
| ) | ||
|
|
||
| func (n nativeBytesEval) Evaluate(_ protoreflect.Message, val protoreflect.Value, _ *validationConfig) error { |
There was a problem hiding this comment.
Because we return early everywhere, this violates the existing default "fail open" behavior where we capture all violations, not just the first one. Either this needs to respect the failFast flag in validationConfig, or should be split into individual evaluators for each and let the rest of the machinery handle this correctly.
There was a problem hiding this comment.
Do we have conformance tests (or other tests) to validate this behavior? I'll fix the behavior to match the spec, presumably in all of the native Evaluate implementations.
There was a problem hiding this comment.
I've refactored the code so that we return multiple violations if there are multiple violations and failFast is not set.
There was a problem hiding this comment.
Do we have conformance tests (or other tests) to validate this behavior? I'll fix the behavior to match the spec, presumably in all of the native Evaluate implementations.
There may be a few conformance tests that expect more than one violation, but I could be wrong. We should add it, if not. It's toggle-able when setting up a validator in pv-go for folks who want it exit early to shave off some execution time (and memory).
There was a problem hiding this comment.
The code was passing all of the conformance tests when it was swallowing multiple violations, so if there are conformance tests that expect multiple violations, they are not working correctly. I'll investigate.
There was a problem hiding this comment.
There are no conformance tests covering multiple standard rules on a field (or fail-fast). Should be trivial to add some, but for confidence that all rules are conformant, we’d need the generative approach for conformance tests we’ve talked about, Jon.
| } | ||
|
|
||
| func (n nativeMapEval) Evaluate(_ protoreflect.Message, val protoreflect.Value, _ *validationConfig) error { | ||
| size := uint64(val.Map().Len()) //nolint:gosec |
There was a problem hiding this comment.
We probably need to turn it on in the golangcilint config, but we should include the reason with the nolint directive:
| size := uint64(val.Map().Len()) //nolint:gosec | |
| size := uint64(val.Map().Len()) //nolint:gosec // maps can never be large enough to overflow u64 |
There was a problem hiding this comment.
comment explains that int can't be negative or out of uint64 range.
| minPairs *uint64 | ||
| maxPairs *uint64 |
There was a problem hiding this comment.
instead of pointers forcing an allocation, initialize this to the extremes
There was a problem hiding this comment.
changed to value types
| } | ||
|
|
||
| func (n nativeMapEval) Tautology() bool { | ||
| return false |
There was a problem hiding this comment.
This is tautalogical if min_pairs/max_pairs aren't set (or are at the extremes). Having this report the correct value allows for tree-shaking of noop evaluators (even if by construction it's not likely to show up).
There was a problem hiding this comment.
should be addressed now
| ### Experimental native support release | ||
| This release provides experimental native support for standard validation rule processing. It is disabled by default | ||
| and enabled by setting the environment variable `PV_NATIVE_RULES` to `true` at runtime. | ||
|
|
||
| Performance improvements on the included benchmarks: |
There was a problem hiding this comment.
See above. I think if we're shipping this and it satisfies the conformance tests, we shouldn't gate it behind an environment variable flag. If we really don't want this to apply by default, we should use go build flags, so it doesn't need to bleed into folks' deployments. That said, I think this is a low risk, high impact improvement that doesn't require flags at all.
There was a problem hiding this comment.
We just need to retain the capability to run the conformance suite on both paths. protovalidate-go with cel-go gives us the best correctness signal for upstream changes to validate.proto.
…a build tag instead of an environment variable to control native vs CEL rules. Make native rules the default.
…e pointers. Add test coverage for repeated bytes and strings.
rodaine
left a comment
There was a problem hiding this comment.
The other alternative besides a build flag is to have a ValidatorOption to disable the native rules. That'll make your testing a bit cleaner as well.
There was a problem hiding this comment.
For all the native tests, I'd gate the files behind the build flag instead of fighting parallel tests. Since all these tests are in the same package, unless ran individually all tests will potentially see native enabled which isn't what we want. Then for the actual flag, you'll have two files, setting useNativeRules as a constant instead of a variable (and having to fight the linter everywhere).
There was a problem hiding this comment.
I'll look into the ValidatorOption.
| func init() { | ||
| useNativeRules = false | ||
| } |
There was a problem hiding this comment.
This would become
const useNativeRules = falseAnd you'd have a separate file with
//go:build !cel_rules
package protovalidate
const useNativeRules = trueThere was a problem hiding this comment.
This was intentionally not a const so that I could switch it during tests.
|
|
||
| // tryBuildNativeBoolRules attempts to build a native Go evaluator for | ||
| // bool rules. Returns nil if the rules can't be handled natively. | ||
| func tryBuildNativeBoolRules(base base, rules *validate.BoolRules) evaluator { |
There was a problem hiding this comment.
Currently, if we add a new rule to the standard set, they're immediately consumable by pv-go. If I added a (buf.validate.field).bool.foo = "bar" rule, will that rule get built or silently ignored?
There was a problem hiding this comment.
It will get silently ignored, but that's where a test would come in (presumably we'd add a conformance test to validate the new rule, and it would fail for native).
I have an idea on how to fail over to CEL if an unknown rule is used, but I haven't tried it out yet. However, adding the automatic failover would mean that we wouldn't have a great signal that we need to update the native rules.
There was a problem hiding this comment.
Yes, but right now it "just works" because there is no special treatment. We need to have these native rules apply at the same granularity (maybe based off the constraint ID) so that the replacements are more surgical. This will avoid a lot of extra boilerplate to.
There was a problem hiding this comment.
I've implemented fallthrough for rules that are implemented in CEL but not in native code. In builder.processStandardRules, a copy is made of the passed-in rules. When the rules copy is passed to a native implementation, any known validation has its value cleared. The rules copy is then used to build the CEL validators. This allows any validation rules that are unknown to the native code to be processed by the CEL engine. I've tested this by commenting out a single native validation. The CEL rule is substituted automatically.
There was a problem hiding this comment.
There's a performance impact at compile time, but it's still much faster than CEL-only rules.
There was a problem hiding this comment.
Since we're running the tests independently, we don't need to do this sort of test, correct? Better to keep the internals a black box at this level imo.
There was a problem hiding this comment.
I've found it a pretty useful way to develop and validate the rules.
There was a problem hiding this comment.
Then I think using a ValidatorOption is the better choice. Will let you actually run things in parallel.
There was a problem hiding this comment.
The ValidatorOption is now implemented.
…r or not native rules are used. Update tests to use it and allow them to run in parallel.
…tions for error generation (still room for more improvement). add fall-through support for any rules that aren't covered in native.
This PR provides native implementations for the following protovalidate rules:
(buf.validate.field).{numeric_type}.const
(buf.validate.field).{numeric_type}.lt
(buf.validate.field).{numeric_type}.lte
(buf.validate.field).{numeric_type}.gt
(buf.validate.field).{numeric_type}.gte
(buf.validate.field).{numeric_type}.in
(buf.validate.field).{numeric_type}.not_in
(buf.validate.field).{numeric_type}.finite
(buf.validate.field).bool.const
(buf.validate.field).string.const
(buf.validate.field).string.len
(buf.validate.field).string.min_len
(buf.validate.field).string.max_len
(buf.validate.field).string.len_bytes
(buf.validate.field).string.min_bytes
(buf.validate.field).string.max_bytes
(buf.validate.field).string.pattern
(buf.validate.field).string.prefix
(buf.validate.field).string.suffix
(buf.validate.field).string.contains
(buf.validate.field).string.not_contains
(buf.validate.field).string.in
(buf.validate.field).string.not_in
(buf.validate.field).string.email
(buf.validate.field).string.hostname
(buf.validate.field).string.ip
(buf.validate.field).string.ipv4
(buf.validate.field).string.ipv6
(buf.validate.field).string.uri
(buf.validate.field).string.uri_ref
(buf.validate.field).string.address
(buf.validate.field).string.uuid
(buf.validate.field).string.tuuid
(buf.validate.field).string.ip_with_prefixlen
(buf.validate.field).string.ipv4_with_prefixlen
(buf.validate.field).string.ipv6_with_prefixlen
(buf.validate.field).string.ip_prefix
(buf.validate.field).string.ipv4_prefix
(buf.validate.field).string.ipv6_prefix
(buf.validate.field).string.host_and_port
(buf.validate.field).string.ulid
(buf.validate.field).string.well_known_regex
(buf.validate.field).string.strict
(buf.validate.field).bytes.const
(buf.validate.field).bytes.len
(buf.validate.field).bytes.min_len
(buf.validate.field).bytes.max_len
(buf.validate.field).bytes.pattern
(buf.validate.field).bytes.prefix
(buf.validate.field).bytes.suffix
(buf.validate.field).bytes.contains
(buf.validate.field).bytes.in
(buf.validate.field).bytes.not_in
(buf.validate.field).bytes.ip
(buf.validate.field).bytes.ipv4
(buf.validate.field).bytes.ipv6
(buf.validate.field).bytes.uuid
(buf.validate.field).enum.const
(buf.validate.field).enum.defined_only
(buf.validate.field).enum.in
(buf.validate.field).enum.not_in
(buf.validate.field).repeated.min_items
(buf.validate.field).repeated.max_items
(buf.validate.field).repeated.unique
(buf.validate.field).repeated.items
(buf.validate.field).map.min_pairs
(buf.validate.field).map.max_pairs
(buf.validate.field).map.keys
(buf.validate.field).map.values
By default, the native rules are disabled. To use them, run with the environment variable
PV_NATIVE_RULESset totruePerformance improvement from using native rules: