Enable 14 PPL datetime scalar functions on analytics-engine route #21582
mengweieric wants to merge 1 commit into opensearch-project:main
❌ Gradle check result for 8096d30: null. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
…oute

Extends the PPL datetime surface of analytics-backend-datafusion with a Wave A bundle of 14 functions: strftime, dayofweek / day_of_week, second / second_of_minute, date, datetime, sysdate, extract, from_unixtime (1-arg), maketime, makedate, date_format, time_format, str_to_date. Each function is routed through DataFusionAnalyticsBackendPlugin's scalar capabilities and wired end-to-end to Substrait so that force-routed queries on /_plugins/_ppl execute on DataFusion without any Calcite fallback.

Routing strategy per function:
- Calcite builtins reused (date, datetime, dayofweek, second, sysdate) via name-mapping adapters that preserve PPL's declared return type.
- Rust UDFs added for the MySQL-flavored behaviors that have no 1:1 DataFusion builtin (strftime, extract, from_unixtime, maketime, makedate, date_format, time_format, str_to_date) with a shared mysql_format token table underpinning the *_format family.

End-to-end verification: all 14 functions pass against a force-routed runTask cluster on /_plugins/_ppl, confirmed via explain (viableBackends=[datafusion]) and ShardFragmentStageExecution traces.

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
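
To illustrate what a shared token table for the *_format family might look like, here is a minimal Rust sketch. It is not the PR's mysql_format module: the names `Field`, `Fields`, `TOKENS`, `lookup`, `render`, and `parse` are hypothetical, only six MySQL tokens are covered, and parsing assumes fixed-width, zero-padded digits (real MySQL parsing is more lenient). The point is only that rendering and parsing read the same table, so a token added in one place is available to both.

```rust
// Hypothetical sketch: the token subset, names, and fixed-width parsing are
// assumptions for illustration, not the PR's actual mysql_format module.

/// Which datetime field a token reads or writes.
#[derive(Clone, Copy)]
enum Field { Year, Month, Day, Hour, Minute, Second }

/// Field values indexed by `Field as usize`.
type Fields = [u32; 6];

/// Shared table: (MySQL token character, field, zero-padded width).
const TOKENS: &[(char, Field, usize)] = &[
    ('Y', Field::Year, 4),
    ('m', Field::Month, 2),
    ('d', Field::Day, 2),
    ('H', Field::Hour, 2),
    ('i', Field::Minute, 2),
    ('s', Field::Second, 2),
];

/// Look up a token character in the shared table.
fn lookup(ch: char) -> Option<(Field, usize)> {
    TOKENS.iter().find(|t| t.0 == ch).map(|t| (t.1, t.2))
}

/// Render `fields` according to `fmt`. Unknown `%x` sequences emit `x`
/// literally, so `%%` yields a single `%`.
fn render(fmt: &str, fields: &Fields) -> String {
    let mut out = String::new();
    let mut chars = fmt.chars();
    while let Some(c) = chars.next() {
        if c != '%' {
            out.push(c);
            continue;
        }
        // A trailing lone '%' is treated as a literal '%' (sketch choice).
        let t = chars.next().unwrap_or('%');
        match lookup(t) {
            Some((field, width)) => {
                out.push_str(&format!("{:0width$}", fields[field as usize], width = width));
            }
            None => out.push(t),
        }
    }
    out
}

/// Parse `input` according to `fmt` using the same table as `render`.
/// Returns `None` on any mismatch or leftover input.
fn parse(input: &str, fmt: &str) -> Option<Fields> {
    let mut fields: Fields = [0; 6];
    let mut rest = input;
    let mut chars = fmt.chars();
    while let Some(c) = chars.next() {
        let literal = if c == '%' {
            let t = chars.next().unwrap_or('%');
            if let Some((field, width)) = lookup(t) {
                if rest.len() < width || !rest.is_char_boundary(width) {
                    return None;
                }
                let (digits, tail) = rest.split_at(width);
                if !digits.bytes().all(|b| b.is_ascii_digit()) {
                    return None;
                }
                fields[field as usize] = digits.parse::<u32>().ok()?;
                rest = tail;
                continue;
            }
            t
        } else {
            c
        };
        rest = rest.strip_prefix(literal)?;
    }
    if rest.is_empty() { Some(fields) } else { None }
}
```

With this shape, a round trip such as `parse(&render(fmt, &fields), fmt)` recovers the original fields for any format built from the listed tokens, which is one way to read the "lock-step" property.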
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #21582 +/- ##
============================================
- Coverage 73.50% 73.44% -0.06%
+ Complexity 74644 74618 -26
============================================
Files 5980 5980
Lines 338777 338777
Branches 48848 48848
============================================
- Hits 249011 248818 -193
- Misses 69946 70158 +212
+ Partials 19820 19801 -19
Description
Extends the PPL datetime function surface of analytics-backend-datafusion with a Wave A bundle of 14 scalar functions, routing them through the force-routable /_plugins/_ppl endpoint so they execute on DataFusion without a Calcite fallback.

Functions landed in this PR:
- date(expr), datetime(expr), makedate(year, doy), maketime(h, m, s), from_unixtime(sec)
- dayofweek / day_of_week, second / second_of_minute, extract(unit FROM …)
- sysdate()
- strftime, date_format, time_format, str_to_date

Routing strategy
Each function picks the narrowest reliable implementation path:
- Calcite builtins reused (date, datetime, dayofweek, second, sysdate). These go through name-mapping adapters that preserve PPL's declared return type via AbstractNameMappingAdapter.
- Rust UDFs added for the MySQL-flavored behaviors that have no 1:1 DataFusion builtin: strftime, extract (22-unit MySQL set incl. digit-concatenation composites), from_unixtime, maketime, makedate, date_format, time_format, str_to_date. The *_format / str_to_date family shares a single mysql_format token table to keep rendering and parsing in lock-step.
- makedate doy overflow / year remapping, maketime rounding rules, strftime's abs(v) >= 1e11 ms auto-detect, date_format ordinal-day suffixes, time_format's date-token collapse rules.

Scope boundaries
- timestamp(expr) is excluded from Wave A and stays on legacy Calcite: the function's return type collides with an existing enum entry and is better handled in a follow-up.
- TIME-operand and Time64 return paths (e.g. hour(time(...)), end-to-end maketime surface assertions) remain blocked by a known substrait-java 0.89.1 ToTypeString gap on ParameterizedType.PrecisionTime. Rust-level unit tests cover the semantics; integration-layer coverage ships with the upstream substrait-java fix.

Files changed
- sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/: new StrftimeFunctionAdapter, SecondAdapter, DayOfWeekAdapter, RustUdfDateTimeAdapters; extensions to DateTimeAdapters, DataFusionAnalyticsBackendPlugin (capability wiring), DataFusionFragmentConvertor (additional scalar signatures).
- opensearch_scalar_functions.yaml extended with the new signatures.
- sandbox/plugins/analytics-backend-datafusion/rust/src/udf/: strftime, extract, from_unixtime, maketime, makedate, date_format, time_format, str_to_date, mysql_format (shared token table).
- ScalarFunction enum extended with Wave A entries.
- StrftimeFunctionAdapterTests (adapter-level), new qa-module DateTimeScalarFunctionsIT, new cases in internal-cluster ScalarDateTimeFunctionIT, plus ~70 Rust unit tests across the new UDFs.

Verification
End-to-end verified against a force-routed runTask cluster on /_plugins/_ppl (-Dplugins.query.analytics.force_route=true). Routing confirmed via explain (viableBackends=[datafusion]) and cluster logs (ShardFragmentStageExecution state=CREATED). All 14 functions return correct values on parquet-formatted indices.

Pre-PR gates (all green on this branch):
- ./gradlew -Dsandbox.enabled=true :sandbox:plugins:analytics-backend-datafusion:spotlessApply
- cargo test --lib on each new UDF module: 70 passed / 0 failed
- ./gradlew -Dsandbox.enabled=true :sandbox:plugins:analytics-backend-datafusion:check
- ./gradlew check -p sandbox -Dsandbox.enabled=true (includes analytics-engine-rest qa integTest / integTestMemtable / integTestStreaming)

Check List
--signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
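
Two of the edge cases named under Routing strategy are small enough to sketch in isolation: strftime's abs(v) >= 1e11 millisecond auto-detect and date_format's ordinal-day suffixes. The Rust below is a hedged illustration only. The function names `to_epoch_seconds` and `ordinal_day` are hypothetical, the threshold is taken from the description, and the assumption that values at or above the threshold are divided by 1000 (and smaller values are already seconds) is mine rather than confirmed by the PR. The suffix rule follows the usual English convention (11th, 12th, 13th are exceptions).

```rust
// Hypothetical sketch of two edge cases from the description; names and the
// exact handling around the threshold are assumptions, not the PR's code.

/// Threshold from the description: |v| >= 1e11 is treated as milliseconds.
const MILLIS_THRESHOLD: f64 = 1e11;

/// Normalize an epoch value to seconds, assuming values at or above the
/// threshold are milliseconds and everything else is already seconds.
fn to_epoch_seconds(v: f64) -> f64 {
    if v.abs() >= MILLIS_THRESHOLD {
        v / 1000.0
    } else {
        v
    }
}

/// Render a day of month with its English ordinal suffix (1st, 2nd, 3rd, 4th).
/// The teens 11, 12, 13 always take "th".
fn ordinal_day(day: u32) -> String {
    let suffix = match (day % 100, day % 10) {
        (11..=13, _) => "th",
        (_, 1) => "st",
        (_, 2) => "nd",
        (_, 3) => "rd",
        _ => "th",
    };
    format!("{}{}", day, suffix)
}
```

Under these assumptions, an input of 1700000000000.0 is read as milliseconds and normalized to 1700000000.0 seconds, while 1700000000.0 passes through unchanged.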