Merged
34 commits
c2c8ba0 feat: add experimental native columnar to row conversion (andygrove, Jan 19, 2026)
49a5b20 cargo fmt (andygrove, Jan 19, 2026)
e558073 cargo clippy (andygrove, Jan 19, 2026)
a44066f docs (andygrove, Jan 19, 2026)
fd58cba update benchmark [skip ci] (andygrove, Jan 19, 2026)
bac9164 fix: use correct element sizes in native columnar to row for array/map (andygrove, Jan 19, 2026)
3ca5553 test: add fuzz test with nested types to native C2R suite (andygrove, Jan 19, 2026)
7f2e64d test: add deeply nested type tests to native C2R suite (andygrove, Jan 19, 2026)
7afc4ba test: add fuzz test with generateNestedSchema for native C2R (andygrove, Jan 20, 2026)
adc13a6 format (andygrove, Jan 20, 2026)
56df742 fix: handle LargeList and improve error handling in native C2R (andygrove, Jan 20, 2026)
461c625 fix (andygrove, Jan 20, 2026)
8b8741c fix: add Dictionary-encoded array support to native C2R (andygrove, Jan 20, 2026)
b8ed2e7 format (andygrove, Jan 20, 2026)
330dbb2 clippy [skip ci] (andygrove, Jan 20, 2026)
8231a75 test: add benchmark comparing JVM and native columnar to row conversion (andygrove, Jan 20, 2026)
f2cc61c perf: optimize native C2R by eliminating Vec allocations for strings (andygrove, Jan 20, 2026)
3ebcaca perf: add fixed-width fast path for native C2R (andygrove, Jan 20, 2026)
ed72c29 test: add fixed-width-only benchmark and refactor C2R benchmark (andygrove, Jan 20, 2026)
17d83d5 perf: optimize complex types in native C2R by eliminating intermediat… (andygrove, Jan 20, 2026)
5f26a81 perf: add bulk copy optimization for primitive arrays in native C2R (andygrove, Jan 20, 2026)
e5b2c61 perf: add pre-downcast optimization for native C2R general path (andygrove, Jan 20, 2026)
7743138 fix: correct array element bulk copy for Date32, Timestamp, Boolean (andygrove, Jan 20, 2026)
9c66ef6 perf: Velox-style optimization for array/map C2R (40-52% faster) (andygrove, Jan 20, 2026)
64c5212 perf: inline type dispatch for struct fields in native C2R (andygrove, Jan 20, 2026)
04c49fb perf: pre-downcast struct fields for native C2R (andygrove, Jan 20, 2026)
47d4c50 perf: optimize general path for mixed fixed/variable-length columns (andygrove, Jan 20, 2026)
081b3ed revert (andygrove, Jan 20, 2026)
f696595 upmerge (andygrove, Jan 20, 2026)
92e1abb revert doc format change (andygrove, Jan 20, 2026)
e735434 fix: address clippy warnings and remove dead code in native C2R (andygrove, Jan 20, 2026)
ab074bd Remove #[inline] hint from bulk_copy_range (andygrove, Jan 20, 2026)
377214a fix (andygrove, Jan 20, 2026)
1c432ea upmerge (andygrove, Jan 25, 2026)

1 change: 1 addition & 0 deletions .github/workflows/pr_build_linux.yml
@@ -202,6 +202,7 @@ jobs:
value: |
org.apache.comet.exec.CometShuffleSuite
org.apache.comet.exec.CometShuffle4_0Suite
org.apache.comet.exec.CometNativeColumnarToRowSuite
org.apache.comet.exec.CometNativeShuffleSuite
org.apache.comet.exec.CometShuffleEncryptionSuite
org.apache.comet.exec.CometShuffleManagerSuite
1 change: 1 addition & 0 deletions .github/workflows/pr_build_macos.yml
@@ -145,6 +145,7 @@ jobs:
value: |
org.apache.comet.exec.CometShuffleSuite
org.apache.comet.exec.CometShuffle4_0Suite
org.apache.comet.exec.CometNativeColumnarToRowSuite
org.apache.comet.exec.CometNativeShuffleSuite
org.apache.comet.exec.CometShuffleEncryptionSuite
org.apache.comet.exec.CometShuffleManagerSuite
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
CLAUDE.md
target
.idea
*.iml
11 changes: 11 additions & 0 deletions common/src/main/scala/org/apache/comet/CometConf.scala
@@ -296,6 +296,17 @@ object CometConf extends ShimCometConf {
  val COMET_EXEC_LOCAL_TABLE_SCAN_ENABLED: ConfigEntry[Boolean] =
    createExecEnabledConfig("localTableScan", defaultValue = false)

  val COMET_NATIVE_COLUMNAR_TO_ROW_ENABLED: ConfigEntry[Boolean] =
    conf(s"$COMET_EXEC_CONFIG_PREFIX.columnarToRow.native.enabled")
      .category(CATEGORY_EXEC)
      .doc(
        "Whether to enable native columnar to row conversion. When enabled, Comet will use " +
          "native Rust code to convert Arrow columnar data to Spark UnsafeRow format instead " +
          "of the JVM implementation. This can improve performance for queries that need to " +
          "convert between columnar and row formats. This is an experimental feature.")
      .booleanConf
      .createWithDefault(false)

  val COMET_EXEC_SORT_MERGE_JOIN_WITH_JOIN_FILTER_ENABLED: ConfigEntry[Boolean] =
    conf("spark.comet.exec.sortMergeJoinWithJoinFilter.enabled")
      .category(CATEGORY_ENABLE_EXEC)
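For readers unfamiliar with the target format named in the `columnarToRow.native.enabled` doc string, here is a minimal sketch of Spark's UnsafeRow size arithmetic. It assumes the layout Spark documents for `UnsafeRow`: a null bitset padded to 8-byte words, one 8-byte slot per field, then word-aligned variable-length data whose offset and size are packed into that field's slot. It is written in Scala for illustration only and is not the Rust implementation added by this PR; `UnsafeRowLayoutSketch` and every name inside it are hypothetical.

```scala
// Sketch of Spark's UnsafeRow size arithmetic, for illustration only.
// NOT the Rust code from this PR; all names here are hypothetical.
object UnsafeRowLayoutSketch {
  // How a single field contributes to the row.
  sealed trait Field
  // Fixed-width value (int, long, double, date, ...) stored inline in its 8-byte slot.
  case object FixedWidth extends Field
  // Variable-length value (e.g. string, binary, nested array/map/struct) of numBytes bytes;
  // its slot holds an offset and size pointing into the variable-length region.
  final case class VariableLength(numBytes: Int) extends Field
  // Null value: only the null bit is set and the slot is zeroed; no variable-length bytes.
  case object NullValue extends Field

  // One null bit per field, rounded up to whole 8-byte words.
  def bitSetWidthInBytes(numFields: Int): Int = ((numFields + 63) / 64) * 8

  // Variable-length data is padded up to an 8-byte word boundary.
  def roundToWord(numBytes: Int): Int = (numBytes + 7) / 8 * 8

  // Null bitset plus one 8-byte slot per field.
  def fixedRegionBytes(numFields: Int): Int = bitSetWidthInBytes(numFields) + 8 * numFields

  // Total row size: fixed region plus word-aligned variable-length data.
  def rowSizeInBytes(fields: Seq[Field]): Int = {
    val variableBytes = fields.collect { case VariableLength(n) => roundToWord(n) }.sum
    fixedRegionBytes(fields.size) + variableBytes
  }

  // A variable-length slot packs the offset (from the row start) into the high 32 bits
  // and the size into the low 32 bits.
  def packOffsetAndSize(offset: Int, size: Int): Long =
    (offset.toLong << 32) | (size.toLong & 0xffffffffL)

  def offsetOf(slot: Long): Int = (slot >>> 32).toInt
  def sizeOf(slot: Long): Int = slot.toInt
}
```

For a row holding a long, the string "hello", and a null, `rowSizeInBytes(Seq(FixedWidth, VariableLength(5), NullValue))` gives 8 (bitset) + 24 (slots) + 8 (padded string) = 40 bytes, and the string's slot would hold `packOffsetAndSize(32, 5)` because its bytes start right after the 32-byte fixed region. When every column is fixed-width, each row is exactly `fixedRegionBytes(numFields)` bytes, which is one plausible reason a fixed-width fast path such as the one in commit 3ebcaca pays off.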
20 changes: 20 additions & 0 deletions common/src/main/scala/org/apache/comet/vector/NativeUtil.scala
@@ -78,6 +78,26 @@ class NativeUtil {
    (arrays, schemas)
  }

  /**
   * Exports a ColumnarBatch to Arrow FFI and returns the memory addresses.
   *
   * This is a convenience method that allocates Arrow structs, exports the batch, and returns
   * just the memory addresses (without exposing the Arrow types).
   *
   * @param batch
   *   the columnar batch to export
   * @return
   *   a tuple of (array addresses, schema addresses, number of rows)
   */
  def exportBatchToAddresses(batch: ColumnarBatch): (Array[Long], Array[Long], Int) = {
    val numCols = batch.numCols()
    val (arrays, schemas) = allocateArrowStructs(numCols)
    val arrayAddrs = arrays.map(_.memoryAddress())
    val schemaAddrs = schemas.map(_.memoryAddress())
    val numRows = exportBatch(arrayAddrs, schemaAddrs, batch)
    (arrayAddrs, schemaAddrs, numRows)
  }

  /**
   * Exports a Comet `ColumnarBatch` into a list of memory addresses that can be consumed by the
   * native execution.
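`exportBatchToAddresses` hands callers plain `Long` addresses instead of Arrow objects. The pure-Scala sketch below models that call flow so it runs without Arrow, Spark, or JNI. `FfiStruct`, `Batch`, `allocateStructs`, and the stubbed `exportBatch` are hypothetical stand-ins for Arrow's `ArrowArray`/`ArrowSchema`, Spark's `ColumnarBatch`, and Comet's own `allocateArrowStructs`/`exportBatch`, and the addresses are fake counters rather than real memory.

```scala
// Pure-Scala model of the exportBatchToAddresses call flow.
// Every type and helper here is a hypothetical stand-in; no Arrow, Spark, or JNI is used.
object ExportFlowSketch {
  // Stand-in for org.apache.arrow.c.ArrowArray / ArrowSchema; the real classes expose
  // memoryAddress() for the underlying C Data Interface struct.
  final class FfiStruct(address: Long) {
    def memoryAddress(): Long = address
  }

  // Stand-in for Spark's ColumnarBatch: only its shape matters here.
  final case class Batch(numCols: Int, numRows: Int)

  // Fake address counter so every allocated struct gets a distinct address.
  private var nextAddress: Long = 0x1000L

  private def allocate(): FfiStruct = {
    val s = new FfiStruct(nextAddress)
    nextAddress += 0x100L
    s
  }

  // Stand-in for allocateArrowStructs: one array struct and one schema struct per column.
  def allocateStructs(numCols: Int): (Array[FfiStruct], Array[FfiStruct]) =
    (Array.fill(numCols)(allocate()), Array.fill(numCols)(allocate()))

  // Stand-in for exportBatch: the real method exports the batch for native consumption and
  // returns the row count; this stub only checks that the address arrays line up.
  def exportBatch(arrayAddrs: Array[Long], schemaAddrs: Array[Long], batch: Batch): Int = {
    require(arrayAddrs.length == batch.numCols, "one array address per column")
    require(schemaAddrs.length == batch.numCols, "one schema address per column")
    batch.numRows
  }

  // Same shape as NativeUtil.exportBatchToAddresses: callers only ever see Long addresses.
  def exportBatchToAddresses(batch: Batch): (Array[Long], Array[Long], Int) = {
    val (arrays, schemas) = allocateStructs(batch.numCols)
    val arrayAddrs = arrays.map(_.memoryAddress())
    val schemaAddrs = schemas.map(_.memoryAddress())
    val numRows = exportBatch(arrayAddrs, schemaAddrs, batch)
    (arrayAddrs, schemaAddrs, numRows)
  }
}
```

The returned triple carries what a native call needs: one array address and one schema address per column, plus the row count. Keeping Arrow types out of the signature presumably lets the native columnar-to-row path pass the addresses straight through JNI; how the native side consumes and releases them is not shown in this hunk.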