RFC-0005 Phase 2, annotated string functions and added tests. #1
Open
ScrapCodes wants to merge 3 commits into RFC_5_UDF_STATS_PHASE_1
1. Added support for annotating functions with both constant stats and propagated source stats.
2. Added tests for the same.
3. Added scalar stats calculation based on the annotations, with tests.
SQLInvokedScalarFunctions are not covered.
Built-in functions are not annotated; that is covered in the next implementation phase.
C++ changes are not included, as this phase covers only the Java side.
Added documentation for the new properties and ...
1. Previously, if any of the source stats were missing, we would still compute the max/min/sum etc. of the argument stats.
Now we propagate NaN if the stats of any one argument are missing.
2. For distinct values count, upper bounding it to the row count is as good as unknown. Therefore, when distinctValuesCount is provided via annotation and is greater than the row count, we set it to unknown.
The function developer has full control here: for example, the developer can choose whether or not to upper bound by selecting the appropriate StatsPropagationBehavior value.
3. For average row size,
a) If the average row size is provided via the ScalarFunctionConstantStats annotation, we allow it even if the size is greater than the function's return type width.
b) If the average row size is provided via one of the StatsPropagationBehavior values, we upper bound it to the function's return type width, if available.
If both (a) and (b) are unknown, we default to the function's return type width, if available.
This way the function developer has greater control.
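The three rules above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `ScalarStatsRulesSketch` and its method names are hypothetical, unknown values are modelled as `Double.NaN`, and the choice to let a constant average row size take precedence over a propagated one is an assumption, since the description does not say which wins when both are present.

```java
// Hypothetical helper for illustration only; none of these names exist in Presto.
final class ScalarStatsRulesSketch
{
    private ScalarStatsRulesSketch() {}

    // Rule 1: if any argument statistic is unknown (NaN), the result is unknown.
    static double maxOfArguments(double... argumentStats)
    {
        if (argumentStats.length == 0) {
            return Double.NaN;
        }
        double result = Double.NEGATIVE_INFINITY;
        for (double value : argumentStats) {
            if (Double.isNaN(value)) {
                return Double.NaN;
            }
            result = Math.max(result, value);
        }
        return result;
    }

    // Rule 2: a distinct values count supplied via annotation that exceeds
    // the row count is treated as unknown instead of being clamped.
    // Assumption: if the row count itself is unknown, the annotated value is kept.
    static double annotatedDistinctValuesCount(double annotatedCount, double rowCount)
    {
        return annotatedCount > rowCount ? Double.NaN : annotatedCount;
    }

    // Rule 3: average row size.
    // (a) a constant from the annotation is taken as-is, even above the type width;
    // (b) a propagated value is capped at the return type width, if that is known;
    // otherwise fall back to the return type width (NaN if not available).
    static double averageRowSize(double constantSize, double propagatedSize, double returnTypeWidth)
    {
        if (!Double.isNaN(constantSize)) {
            return constantSize; // assumption: the constant takes precedence
        }
        if (!Double.isNaN(propagatedSize)) {
            return Double.isNaN(returnTypeWidth)
                    ? propagatedSize
                    : Math.min(propagatedSize, returnTypeWidth);
        }
        return returnTypeWidth;
    }
}
```

For instance, under these assumptions `averageRowSize(64, Double.NaN, 8)` keeps the annotated 64, while `averageRowSize(Double.NaN, 64, 8)` is capped at 8.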
Added a new behaviour, SUM_ARGUMENTS_UPPER_BOUND_ROW_COUNT, which upper bounds the summed value to the row count, so that the sum of the arguments' distinct values counts does not exceed the row count.
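As a rough illustration of what SUM_ARGUMENTS_UPPER_BOUND_ROW_COUNT is described as doing, the sketch below sums the arguments' distinct values counts and caps the result at the row count. The class and method names are hypothetical, unknown values are modelled as `Double.NaN`, and the handling of an unknown row count (the uncapped sum is returned) is an assumption rather than something stated in this PR.

```java
// Hypothetical helper for illustration only; not the implementation in this PR.
final class SumUpperBoundSketch
{
    private SumUpperBoundSketch() {}

    static double sumArgumentsUpperBoundRowCount(double rowCount, double... argumentDistinctCounts)
    {
        if (argumentDistinctCounts.length == 0) {
            return Double.NaN; // nothing to propagate
        }
        double sum = 0;
        for (double count : argumentDistinctCounts) {
            if (Double.isNaN(count)) {
                return Double.NaN; // a missing argument stat makes the result unknown
            }
            sum += count;
        }
        // Assumption: with an unknown row count there is nothing to cap against.
        return Double.isNaN(rowCount) ? sum : Math.min(sum, rowCount);
    }
}
```

With a row count of 100 and two arguments whose distinct values counts are 60 and 70, the sum of 130 is capped to 100; with counts of 10 and 20 the sum of 30 is returned unchanged.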
…ions` class, with `ScalarFunctionConstantStats` and `ScalarPropagateSourceStats`. 2. Added appropriate tests to check if the stats propagation works as expected.
Description
`StringFunctions` class, with `ScalarFunctionConstantStats` and `ScalarPropagateSourceStats`.
Motivation and Context
https://github.com/prestodb/rfcs/blob/main/RFC-0005-functions-stats.md
Impact
None, unless the user chooses to enable the feature by setting the session flag or the feature flag.
A new session flag, scalar_function_stats_propagation_enabled, and a new feature config, optimizer.scalar-function-stats-propagation-enabled, are introduced. The feature can be turned on or off by setting either the session flag or the feature flag.
When the feature is enabled, the effect of stats propagation can be measured or seen in the form of plan changes, since string functions are annotated.
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.