chore: Rebase on upstream#9

Closed
maksym-iv-ef wants to merge 1247 commits into elastiflow:main from apache:main

@maksym-iv-ef

Which issue does this PR close?

Update main to match upstream.
This PR in particular should be merged with a rebase merge.

xudong963 and others added 30 commits September 16, 2025 08:30
* Intermediate work on setting up sql feature flags throughout repo

* More intermediate work, but does not yet compile

* Working through more points of friction to remove sql as feature

* Switch optimizer to compare full logical plan instead of hash of the plan in an effort to remove the hash functions from the final wasm blob. Probably need to revert or put under a feature flag.

* Corrections after rebase

* Resolve errors and clippy warnings after rebase

* Remove unused imports when not using sql feature

* Working through more feature gating

* Working through more feature gating

* wasmtest required sql

* working on CI

* Update docs

* Remove some duplicate code by doing a bit of sql-like parsing of identifiers

* Conditionally import sql features

* Set default features to false so we do not always pull in sql

* Add check for just sql feature

* Update readme for features

* Taplo format

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
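
As a rough illustration of the feature-gating commits above, here is a minimal sketch of how code can be compiled in or out behind a Cargo feature, together with a small "sql-like" identifier splitter that needs no SQL parser. The function names `sql_enabled` and `split_identifier` are hypothetical and are not taken from the PR; only the feature name `sql` comes from the commit messages. The splitter is deliberately simplified (no escaped quotes, no case normalization).

```rust
/// Compiled only when the crate is built with the `sql` feature enabled.
/// (Hypothetical function; only the feature name comes from the commits.)
#[cfg(feature = "sql")]
pub fn sql_enabled() -> bool {
    true
}

/// Fallback used when the `sql` feature is switched off.
#[cfg(not(feature = "sql"))]
pub fn sql_enabled() -> bool {
    false
}

/// Split a dotted identifier such as `schema.table` into its parts without
/// a SQL parser. Dots inside double quotes are kept as part of the name.
/// Simplification: escaped quotes (`""`) are not handled.
pub fn split_identifier(name: &str) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    for ch in name.chars() {
        match ch {
            // A double quote toggles quoted mode and is not part of the name.
            '"' => in_quotes = !in_quotes,
            // An unquoted dot ends the current part.
            '.' if !in_quotes => parts.push(std::mem::take(&mut current)),
            c => current.push(c),
        }
    }
    parts.push(current);
    parts
}
```

With default features set to false, a downstream crate would have to opt in to `sql` explicitly, which is what keeps the parser out of builds that do not need it.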
Bumps [indexmap](https://github.com/indexmap-rs/indexmap) from 2.11.1 to 2.11.3.
- [Changelog](https://github.com/indexmap-rs/indexmap/blob/main/RELEASES.md)
- [Commits](indexmap-rs/indexmap@2.11.1...2.11.3)

---
updated-dependencies:
- dependency-name: indexmap
  dependency-version: 2.11.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.143 to 1.0.145.
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](serde-rs/json@v1.0.143...v1.0.145)

---
updated-dependencies:
- dependency-name: serde_json
  dependency-version: 1.0.145
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Enable `inefficient_to_string` Clippy lint and update the code. The code
changes were done by Clippy `--fix`.

The lint addresses a limitation of Rust's type system or standard library
that makes `(&&String).to_string()` a lot slower than
`(&String).to_string()` and `(*(&&String)).to_string()`. The difference
is 40% in my measurements on 10-character strings. It arises because the
faster code paths are specialized for `String`, and the `&&` double
reference somehow prevents the compiler from taking the specialized code
path.
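
A minimal sketch of the pattern the lint flags, and of the fix Clippy suggests. The function names are illustrative only. Both functions return the same strings; the difference is which `ToString` implementation gets selected, and the size of the slowdown may vary between compiler versions.

```rust
/// Iterating over `&[&String]` yields `&&String`. Calling `to_string` on it
/// resolves to `ToString for &String`, which goes through the generic
/// `Display`-based formatting path. This is what the lint flags.
pub fn to_strings_generic(items: &[&String]) -> Vec<String> {
    items.iter().map(|s| s.to_string()).collect()
}

/// Dereferencing once gives `&String`, so the call resolves to
/// `ToString for String`, which the standard library specializes.
pub fn to_strings_specialized(items: &[&String]) -> Vec<String> {
    items.iter().map(|s| (*s).to_string()).collect()
}
```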
Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.61.6 to 2.61.8.
- [Release notes](https://github.com/taiki-e/install-action/releases)
- [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)
- [Commits](taiki-e/install-action@cc60de1...2fdc5fd)

---
updated-dependencies:
- dependency-name: taiki-e/install-action
  dependency-version: 2.61.8
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Always run CI checks

* Add `pre-job`

* For `config-docs-check`, we don't need pre-job

* Comments

* keep `branches-ignore`
* docs: Update documentation on Epics and Sponsoring Maintainers

* Improve wording about finding sponsoring maintainers

* rename to supervising maintainer
Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
* Extend case simplify expr

* Add tests

* cargo fmt

* Remove copying vector based on PR feedback

* Remove unnecessary if conditional (pr feedback)
…to prevent metadata loss (#17524)

* [ISSUE 17422] Fix field metadata preservation when outer reference
column is involved

* [ISSUE 17422] Add tests

* Fixed sql logic test

* Add more tests

* Assert on the actual size of expr

* Make the API for creating outer ref column more friendly, mark out_ref_col as deprecated

* Don't deprecate out_ref_col since there are lots of use cases where it could be useful
* Use Display formatting for DataTypes where I could find them

* fix

* More places

* Less Debug

* Cargo fmt

* More cleanup

* Plural types as Display

* Fixes

* Update some more tests and error messages

* Update test snapshot

* last (?) fixes

* update another slt

* Update instructions on how to run the tests

* Ignore pending snapshot files in .gitignore

* Running all the tests is so slow

* just a trailing space

* Update another test

* Fix markdown formatting

* Improve Display for NativeType

* Update code related to error reporting of NativeType

* Revert some formatting

* fixelyfix

* Another snapshot update
* Move GSOC content to its own section

* Update to 2025
* feat: Add `OR REPLACE` to creating external tables

* regen

* fmt

* make more explicit + add tests

* clippy fix

---------

Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
* chore: mv `DistinctSumAccumulator` to common

* feat: add avg distinct support for float64 type

* chore: fmt

* refactor: update import for DataType in Float64DistinctAvgAccumulator and remove unused sum_distinct module

* feat: add avg distinct support for float64 type

* feat: add avg distinct support for decimal

* feat: more test for avg distinct in rust api

* Remove DataFrame API tests for avg(distinct)

* Remove proto test

* Fix merge errors

* Refactoring

* Minor cleanup

* Decimal slt tests for avg(distinct)

* Fix state_fields for decimal distinct avg

---------

Co-authored-by: YuNing Chen <admin@ynchen.me>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
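
To make the `avg(distinct)` commits above concrete, here is a minimal sketch of a distinct average over nullable `f64` values. It is not the accumulator from the PR: the function name is hypothetical, and deduplicating by raw bit pattern is an assumption made here because `f64` does not implement `Hash`. One consequence of that choice is that `0.0` and `-0.0` count as different values.

```rust
use std::collections::HashSet;

/// Average of the distinct non-null values, or `None` if there are none.
/// Values are deduplicated by their IEEE 754 bit pattern (an assumption of
/// this sketch, not necessarily what the PR does).
pub fn avg_distinct(values: &[Option<f64>]) -> Option<f64> {
    let mut seen: HashSet<u64> = HashSet::new();
    let mut sum = 0.0_f64;
    let mut count = 0_u64;
    // `flatten` skips the nulls.
    for v in values.iter().flatten() {
        // `insert` returns true only the first time a bit pattern is seen.
        if seen.insert(v.to_bits()) {
            sum += *v;
            count += 1;
        }
    }
    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}
```

For example, the input `[1.0, 1.0, 3.0, NULL]` has the distinct values `1.0` and `3.0`, so the result is `2.0`.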
Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.61.8 to 2.61.9.
- [Release notes](https://github.com/taiki-e/install-action/releases)
- [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)
- [Commits](taiki-e/install-action@2fdc5fd...8ea3248)

---
updated-dependencies:
- dependency-name: taiki-e/install-action
  dependency-version: 2.61.9
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [Swatinem/rust-cache](https://github.com/swatinem/rust-cache) from 2.8.0 to 2.8.1.
- [Release notes](https://github.com/swatinem/rust-cache/releases)
- [Changelog](https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md)
- [Commits](Swatinem/rust-cache@98c8021...f13886b)

---
updated-dependencies:
- dependency-name: Swatinem/rust-cache
  dependency-version: 2.8.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…17029)

* use GreedyMemoryPool for sanity check

* validate whether batch read from spill exceeds max_record_batch_mem

* replace err with warn log
* fix(SubqueryAlias): use maybe_project_redundant_column

Fixes #17405

* chore: format

* ci: retry

* chore(SubqueryAlias): restructure duplicate detection and add tests

* docs: add examples and context to the reproducer
* optimizer: Convert to Hash Join for join predicates like 'a IS NOT DISTINCT FROM b'

* drop tables in slt

* fix rust doc

* Update datafusion/optimizer/src/extract_equijoin_predicate.rs

Co-authored-by: Jonathan Chen <chenleejonathan@gmail.com>

* Update datafusion/optimizer/src/extract_equijoin_predicate.rs

* Update datafusion/sqllogictest/test_files/join_is_not_distinct_from.slt

Co-authored-by: Nga Tran <nga-tran@live.com>

* review: more tests and better error message

* review: improve doc

---------

Co-authored-by: Jonathan Chen <chenleejonathan@gmail.com>
Co-authored-by: Nga Tran <nga-tran@live.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
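
The reason `a IS NOT DISTINCT FROM b` can drive a hash join is that it is an equality in which two NULLs compare as equal. The sketch below illustrates that idea on plain `Option<i64>` keys. It is not the optimizer rule from the PR, and all function names are hypothetical.

```rust
use std::collections::HashMap;

/// SQL `a = b`: the result is NULL (`None`) if either side is NULL.
pub fn sql_eq(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// SQL `a IS NOT DISTINCT FROM b`: never NULL, and NULL matches NULL.
/// This is exactly `Option`'s own equality.
pub fn is_not_distinct_from(a: Option<i64>, b: Option<i64>) -> bool {
    a == b
}

/// Null-safe hash join: build a hash table on the left keys, probe with
/// the right keys. Returns `(left_index, right_index)` pairs in probe order.
pub fn null_safe_hash_join(
    left: &[Option<i64>],
    right: &[Option<i64>],
) -> Vec<(usize, usize)> {
    let mut table: HashMap<Option<i64>, Vec<usize>> = HashMap::new();
    for (i, key) in left.iter().enumerate() {
        // `None` is a valid hash key, so NULL rows are kept in the table.
        table.entry(*key).or_default().push(i);
    }
    let mut out = Vec::new();
    for (j, key) in right.iter().enumerate() {
        if let Some(rows) = table.get(key) {
            for &i in rows {
                out.push((i, j));
            }
        }
    }
    out
}
```

A join on plain `=` would instead have to drop the NULL keys, since `sql_eq` never returns true for them.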
* Update to arrow/parquet 56.1.0

* Adjust for new parquet sizes, update for deprecated API

* Thread through max_predicate_cache_size, add test
…xpression (#17525)

* [ISSUE 17425] Initial attempt to fix this problem

* Add tests for the fix

* Require that the metadata of values in VALUES clause must be identical

* fix merge error

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Bumps [serde](https://github.com/serde-rs/serde) from 1.0.223 to 1.0.225.
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](serde-rs/serde@v1.0.223...v1.0.225)

---
updated-dependencies:
- dependency-name: serde
  dependency-version: 1.0.225
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
* chore: update dynamic filter formatting to indicate expr is placeholder

* update tests

* update tests
Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.61.9 to 2.61.10.
- [Release notes](https://github.com/taiki-e/install-action/releases)
- [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)
- [Commits](taiki-e/install-action@8ea3248...0aa4f22)

---
updated-dependencies:
- dependency-name: taiki-e/install-action
  dependency-version: 2.61.10
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
dependabot Bot and others added 10 commits October 14, 2025 19:42
Bumps [sysinfo](https://github.com/GuillaumeGomez/sysinfo) from 0.37.1 to 0.37.2.
- [Changelog](https://github.com/GuillaumeGomez/sysinfo/blob/master/CHANGELOG.md)
- [Commits](GuillaumeGomez/sysinfo@v0.37.1...v0.37.2)

---
updated-dependencies:
- dependency-name: sysinfo
  dependency-version: 0.37.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* feat(spark): implement Spark elt function

* feat(spark): implement Spark elt function license

* feat(spark): implement Spark elt function license completely

* test slt

* doc

* wip

* change error

* fmt

* some commts

* some changes

* coerce basic <- udf

* coerce basic drop 32

* change n,k,j names

* first value int64 or castable

* change revision

* change revision test

* change revision test fmt
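
For readers unfamiliar with Spark's `elt`, here is a minimal scalar sketch of its semantics as I understand them: `elt(n, input1, input2, ...)` returns the n-th input, counting from 1. The sketch assumes non-ANSI behaviour, where an out-of-range index yields NULL instead of an error. It does not show the implementation added in the PR, and the signature is hypothetical.

```rust
/// Return the `n`-th input (1-based), or `None` when `n` is NULL, out of
/// range, or the selected input is itself NULL.
/// Assumption: non-ANSI mode, so an out-of-range index is not an error.
pub fn elt<'a>(n: Option<i64>, inputs: &[Option<&'a str>]) -> Option<&'a str> {
    let n = n?;
    if n < 1 {
        return None;
    }
    // Convert the 1-based index to 0-based; `get` handles the upper bound.
    inputs.get((n - 1) as usize).copied().flatten()
}
```

For example, `elt(2, 'scala', 'java')` selects `'java'`, while `elt(3, 'scala', 'java')` is NULL under this assumption.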
* FileScanConfig: preserve schema metadata across serde boundary

* add roundtrip_physical_plan test with filescanconfig schema metadata

* lint
Filters are safe to be pushed down, so we can override the default behavior
here.

Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
* Adds summary output to CLI instrumented object stores
 - Adds a `RequestSummary` type for the instrumented object store to
   display summary statistics about instrumented requests
 - Adds a generic Stats type to track the statistics for the summary
 - Adds tests for the new code
 - Adds a basic summary output to the user-facing display when profiling
   is enabled
 - Adds docs for new and newly exported public items

* Updates integration test and validation snapshot
* Impl spark bit not function

* Impl spark bit not function

* Fix format

* Fix format

* Fix Clippy warnings

* Rename func

* Add .slt tests

* Fix fmt

---------

Co-authored-by: Kazantsev Maksim <mn.kazantsev@gmail.com>
* chore: revert tests

* chore: revert tests
…tion (#17973)

* #17972 Restore case expr/expr optimisation while ensuring lazy evaluation

* Avoid calling `PhysicalExpr::evaluate` from `PhysicalExpr::evaluate_selection` for empty selections.

* Make `PhysicalExpr::evaluate_selection` correctly handle empty input sets and all false filters

* Reorganize code to avoid scatter codepath when using `evaluate` fast path.

* Clarify comments in case

* Move null handling after true count check.

* Tweaking comments

* Add unit tests to help define the boundary case behaviour of evaluate_selection

* Code polishing
- Add extra comments
- Use match for the scatter paragraph
- Validate that the size of selection and batch match

* Fix clippy errors

* Add additional case SLTs

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
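
The commits above describe three behaviours for evaluating an expression under a selection: skip evaluation entirely when nothing is selected, take the plain `evaluate` fast path when everything is selected, and otherwise filter, evaluate, and scatter the results back. The sketch below models that control flow on plain slices. It is a simplified stand-in, not the `PhysicalExpr::evaluate_selection` code: the signature, the `i64` element type, and the error type are all assumptions.

```rust
/// Evaluate `eval` only on the rows where `selection` is true and scatter
/// the results back to the input length, leaving `None` for unselected rows.
pub fn evaluate_selection<F>(
    input: &[i64],
    selection: &[bool],
    eval: F,
) -> Result<Vec<Option<i64>>, String>
where
    F: Fn(&[i64]) -> Vec<i64>,
{
    // Validate that the selection and the batch have the same size.
    if input.len() != selection.len() {
        return Err(format!(
            "selection length {} does not match input length {}",
            selection.len(),
            input.len()
        ));
    }
    let true_count = selection.iter().filter(|s| **s).count();
    // Empty input or all-false filter: never call `eval` (lazy evaluation).
    if true_count == 0 {
        return Ok(vec![None; input.len()]);
    }
    // Everything selected: fast path, no filter and no scatter.
    if true_count == input.len() {
        return Ok(eval(input).into_iter().map(Some).collect());
    }
    // General case: filter, evaluate, then scatter the results back.
    let filtered: Vec<i64> = input
        .iter()
        .zip(selection)
        .filter(|(_, s)| **s)
        .map(|(v, _)| *v)
        .collect();
    let mut results = eval(&filtered).into_iter();
    Ok(selection
        .iter()
        .map(|s| if *s { results.next() } else { None })
        .collect())
}
```

The zero-count check comes first so that an expression with side effects or possible errors, such as a division inside a `CASE` branch, is never run on rows that no branch selected.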
…n array (#18048)

* chore: Use an enum to express the different kinds of nullability in an array

Follow-up of #17726 (review)

* Use the Nulls enum also in .../multi_group_by/primitive.rs

* Use the Nulls enum in .../multi_group_by/bytes[_view].rs as well
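
As a sketch of the idea in this last commit, an enum can replace a pair of booleans or a raw null count when code needs to branch on how nullable an array is. The variant names and the `classify` helper below are assumptions made for illustration; the commit messages only confirm that an enum named `Nulls` exists.

```rust
/// The kinds of nullability an array can have (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Nulls {
    /// No element is null.
    None,
    /// Some, but not all, elements are null.
    Some,
    /// Every element is null.
    All,
}

/// Classify an array from its validity flags (`true` means non-null).
/// An empty array is reported as `Nulls::None`.
pub fn classify(validity: &[bool]) -> Nulls {
    let null_count = validity.iter().filter(|v| !**v).count();
    if null_count == 0 {
        Nulls::None
    } else if null_count == validity.len() {
        Nulls::All
    } else {
        Nulls::Some
    }
}
```

Callers can then `match` on the result once per batch and pick a loop specialized for the all-valid or all-null case, instead of checking each element.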