Close gaps from top/rare analytics-engine wiring#5433
Conversation
`RestUnifiedQueryAction.applyClusterOverrides` previously only forwarded `PPL_REX_MAX_MATCH_LIMIT` into the per-request `UnifiedQueryContext`. As a result, cluster-side updates to `plugins.ppl.syntax.legacy.preferred` were ignored on the analytics-engine route: `PPLQueryParser` -> `AstBuilder` -> `ArgumentFactory` read the legacy-preferred flag from the unified context's settings map, which never received the override. This caused queries like `top age` / `rare state` with `usenull` defaulting off to behave as if `usenull=true` on the analytics route. Refactor the override builder into a small helper and forward both `PPL_REX_MAX_MATCH_LIMIT` and `PPL_SYNTAX_LEGACY_PREFERRED`. Future keys can be added with a one-liner. Fixes `CalciteTopCommandIT.testTopCommandLegacyFalse` and `CalciteRareCommandIT.testRareCommandLegacyFalse` against the analytics route (`-Dtests.analytics.force_routing=true`). Signed-off-by: Kai Huang <huangkaics@gmail.com>
… order `CalciteRelNodeVisitor.visitRareTopN` lowers `rare`/`top` to a `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY count [DESC])` window. With only the count column in the ORDER BY clause, ties at the same count resolved via the upstream operator's insertion order, which differed between backends (in-process Calcite vs. analytics-engine vs. Lucene pushdown). On the analytics-engine route, `testRareWithGroup` failed because ROW_NUMBER picked NV at count=8 while the test expected AR. Append the rare/top field columns as secondary ASC keys so ties resolve alphabetically and deterministically across backends. This matches the behavior of the existing OpenSearch terms-aggregation pushdown, which tie-breaks on `_key:asc`. Update `RareTopPushdownRule` to accept the new shape: 1 or 2 order keys, where the (optional) second key must be the rare/top target field in ASC direction. The pushdown's wire payload is unchanged. Update the matching unit-test expectations in `CalcitePPLRareTopNTest` (11 RelNode/result/SparkSQL strings) and 5 explain YAML fixtures. Fixes `CalciteRareCommandIT.testRareWithGroup` against the analytics route (and removes the same class of tie-break flakiness across other rare/top tests). Signed-off-by: Kai Huang <huangkaics@gmail.com>
PR Reviewer Guide 🔍(Review updated until commit 59c9f7d)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 59c9f7d
Previous suggestionsSuggestions up to commit f6b8b98
|
The stable tie-break added in the previous commit appends the rare/top field columns as secondary ASC keys to the `ROW_NUMBER` `ORDER BY`. ASC ordering uses NULLS LAST by default, so the existing `usenull=true email` examples in `docs/user/ppl/cmd/rare.md` and `docs/user/ppl/cmd/top.md` now emit `null` last instead of first. Update the doctest expected output blocks accordingly. No behavior change for the non-null rows. Signed-off-by: Kai Huang <huangkaics@gmail.com>
|
Persistent review updated to latest commit 59c9f7d |
…n fixtures
Three follow-ups to the rare/top tie-break commit, surfaced by CI:
1. **Skip non-comparable tie-break columns.** `top address` on a nested-array
field reached the in-process `EnumerableWindow` with `ORDER BY count DESC,
address` and crashed at runtime — `class java.util.ArrayList cannot be cast
to class java.lang.Comparable`. Filter array / multiset / map / struct (row)
columns out of the tie-break list before appending. If all field columns are
non-comparable, no tie-break is added (falling back to the original
count-only ORDER BY).
2. **Sync the parallel `calcite_no_pushdown/` explain fixtures.**
`PPLIntegTestCase.loadExpectedPlan` switches to
`expectedOutput/calcite_no_pushdown/` when pushdown is disabled, which
`CalciteNoPushdownIT` exercises. The previous commit only updated the
`expectedOutput/calcite/` copies, so `CalciteNoPushdownIT > CalciteExplainIT`
read stale (no-tie-break) YAML. Update the four `explain_{rare,top}_usenull_*`
YAMLs under `calcite_no_pushdown/` to include the `, $N` tie-break key in
both the logical plan and the `EnumerableWindow` collation.
3. **Collapse the pushdown-disabled branch in `RareCommandIT.testRareWithGroup`.**
With the stable tie-break, both pushdown-enabled and pushdown-disabled paths
now resolve the count=8 tie deterministically to `AR` (alphabetical). Drop
the `isPushdownDisabled() ? rows("F", "NV", 8) : rows("F", "AR", 8)`
conditional in favor of the single `AR` expectation.
Signed-off-by: Kai Huang <huangkaics@gmail.com>
Signed-off-by: Kai Huang <huangkaics@gmail.com>
Description
Two SQL-plugin–side fixes that close the three out-of-scope failures called out in opensearch-project/OpenSearch#21593 (window-function wiring for
top/rareon the analytics-engine route). With that PR plus these fixes,CalciteTopCommandITandCalciteRareCommandITgo from 8/11 → 11/11 against the force-routed analytics-engine path.Fix 1 — Forward
PPL_SYNTAX_LEGACY_PREFERREDthrough the unified contextRestUnifiedQueryAction.applyClusterOverridespreviously only forwardedPPL_REX_MAX_MATCH_LIMITinto the per-requestUnifiedQueryContext. As a result, cluster-side updates toplugins.ppl.syntax.legacy.preferredwere ignored on the analytics-engine route —PPLQueryParser→AstBuilder→ArgumentFactoryreads the legacy-preferred flag from the unified context's settings map, which never received the override. Queries liketop agewith the cluster setting flipped tofalsebehaved as ifusenull=true(legacy default) on the analytics route only.Refactored the override builder into a small
forwardClusterSettinghelper and forward bothPPL_REX_MAX_MATCH_LIMITandPPL_SYNTAX_LEGACY_PREFERRED. Future keys are now one-liners.Unblocks
testTopCommandLegacyFalseandtestRareCommandLegacyFalse.Fix 2 — Stable tie-break for
RareTopNROW_NUMBERCalciteRelNodeVisitor.visitRareTopNlowersrare/toptoROW_NUMBER() OVER (PARTITION BY ... ORDER BY count [DESC]). With only the count column in the ORDER BY clause, ties at the same count resolved via the upstream operator's insertion order, which differed between backends (in-process Calcite vs. analytics-engine vs. Lucene pushdown).testRareWithGroupfailed on the analytics route because ROW_NUMBER pickedNVat count=8 while the test expectedAR.Appended the rare/top field columns as secondary ASC keys so ties resolve alphabetically and deterministically across backends. This matches the existing OpenSearch terms-aggregation pushdown, which already tie-breaks on
_key:asc.RareTopPushdownRulenow accepts the new shape: 1 or 2 order keys, where the optional second key must be the rare/top target field in ASC direction. The pushdown's wire payload is unchanged — same OS terms-agg request as before.Unblocks
testRareWithGroup.Pass-rate impact
Measured on the SQL plugin's
:integTestRemotetask against an externally-managed cluster running opensearch-project/OpenSearch#21593 (feature/toprare-analytics-verify@114e8bf8e3a) with-Dtests.analytics.force_routing=true -Dtests.analytics.parquet_indices=true:CalciteTopCommandITCalciteRareCommandITRegression sweep (all green)
integTestRemotewithoutforce_routing): Top + Rare → 11/11 (no regression)RareCommandIT,TopCommandIT): 5/5:ppl:test :core:test :opensearch:test :api:test: cleanCalciteExplainIT(consumes the 5 updated explain YAML fixtures): cleanFiles
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java— Fix 1core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java— Fix 2 (tie-break in lowering)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java— Fix 2 (accept the new 2-key shape)ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLRareTopNTest.java— updated 11 expected RelNode / result / SparkSQL strings to reflect deterministic tie-break orderinteg-test/src/test/resources/expectedOutput/calcite/explain_{rare,top}_usenull_{true,false}.yaml,explain_nested_agg_top_push.yaml— updated 5 explain fixturesIssues resolved
N/A — closes the out-of-scope follow-ups documented in opensearch-project/OpenSearch#21593.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.