Skip to content

Add cuDF parquet reader support in PlanBuilder.cpp and ToCudf.cpp#4

Open
dan13bauer wants to merge 1 commit intocudf_exchangefrom
cudf_parquet_reader
Open

Add cuDF parquet reader support in PlanBuilder.cpp and ToCudf.cpp#4
dan13bauer wants to merge 1 commit intocudf_exchangefrom
cudf_parquet_reader

Conversation

@dan13bauer
Copy link
Owner

No description provided.

@dan13bauer dan13bauer force-pushed the cudf_parquet_reader branch 3 times, most recently from 1134cc5 to 5228d5d Compare September 1, 2025 13:13
@dan13bauer dan13bauer force-pushed the cudf_parquet_reader branch 4 times, most recently from e9b1235 to be041c8 Compare September 5, 2025 14:48
@dan13bauer dan13bauer force-pushed the cudf_parquet_reader branch 5 times, most recently from 090accb to 516ecfa Compare September 16, 2025 12:51
lga-zurich pushed a commit to lga-zurich/velox-exchange that referenced this pull request Sep 19, 2025
lga-zurich pushed a commit to lga-zurich/velox-exchange that referenced this pull request Sep 19, 2025
@dan13bauer dan13bauer force-pushed the cudf_parquet_reader branch from 64fce39 to d6ed9ef Compare October 8, 2025 15:39
@GregoryKimball GregoryKimball removed this from libcudf Oct 17, 2025
dan13bauer pushed a commit that referenced this pull request Feb 2, 2026
Summary:
Fixes OSS Asan segV due to calling 'as->' on a nullptr.

```
=================================================================
==4058438==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000a563a4 bp 0x7ffd54ee5bc0 sp 0x7ffd54ee5aa0 T0)
==4058438==The signal is caused by a READ memory access.
==4058438==Hint: address points to the zero page.
    #0 0x000000a563a4 in facebook::velox::FlatVector<int>* facebook::velox::BaseVector::as<facebook::velox::FlatVector<int>>() /velox/./velox/vector/BaseVector.h:116:12
    #1 0x000000a563a4 in facebook::velox::test::(anonymous namespace)::FlatMapVectorTest_encodedKeys_Test::TestBody() /velox/velox/vector/tests/FlatMapVectorTest.cpp:156:5
    #2 0x70874f90ce0b  (/lib64/libgtest.so.1.11.0+0x4fe0b) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #3 0x70874f8ed825 in testing::Test::Run() (/lib64/libgtest.so.1.11.0+0x30825) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #4 0x70874f8ed9ef in testing::TestInfo::Run() (/lib64/libgtest.so.1.11.0+0x309ef) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #5 0x70874f8edaf8 in testing::TestSuite::Run() (/lib64/libgtest.so.1.11.0+0x30af8) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #6 0x70874f8fcfc4 in testing::internal::UnitTestImpl::RunAllTests() (/lib64/libgtest.so.1.11.0+0x3ffc4) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #7 0x70874f8fa7c7 in testing::UnitTest::Run() (/lib64/libgtest.so.1.11.0+0x3d7c7) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #8 0x70877c073153 in main (/lib64/libgtest_main.so.1.11.0+0x1153) (BuildId: c3a576d37d6cfc6875afdc98684c143107a226a0)
    #9 0x70874f48460f in __libc_start_call_main (/lib64/libc.so.6+0x2a60f) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #10 0x70874f4846bf in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a6bf) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #11 0x00000044c1b4 in _start (/velox/_build/debug/velox/vector/tests/velox_vector_test+0x44c1b4) (BuildId: 6da0b0d1074134be8f4d4534e5dbac9eeb9d482b)
```

Reviewed By: peterenescu

Differential Revision: D91275269

fbshipit-source-id: 0806aa7562dc8cf4ad708fc6a8e4b29409507745
dan13bauer pushed a commit that referenced this pull request Feb 2, 2026
Summary:
Pull Request resolved: facebookincubator#16102

Fixes Asan error in S3Util.cpp, See stack trace below:

```
==4125762==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0000006114ff at pc 0x70aa17bc0120 bp 0x7ffe905f3030 sp 0x7ffe905f3028
READ of size 1 at 0x0000006114ff thread T0
    #0 0x70aa17bc011f in facebook::velox::filesystems::parseAWSStandardRegionName[abi:cxx11](std::basic_string_view<char, std::char_traits<char>>) /velox/velox/connectors/hive/storage_adapters/s3fs/S3Util.cpp:160:16
    #1 0x00000055790b in facebook::velox::filesystems::S3UtilTest_parseAWSRegion_Test::TestBody() /velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:147:3
    #2 0x70aa2e89be0b  (/lib64/libgtest.so.1.11.0+0x4fe0b) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #3 0x70aa2e87c825 in testing::Test::Run() (/lib64/libgtest.so.1.11.0+0x30825) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #4 0x70aa2e87c9ef in testing::TestInfo::Run() (/lib64/libgtest.so.1.11.0+0x309ef) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #5 0x70aa2e87caf8 in testing::TestSuite::Run() (/lib64/libgtest.so.1.11.0+0x30af8) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #6 0x70aa2e88bfc4 in testing::internal::UnitTestImpl::RunAllTests() (/lib64/libgtest.so.1.11.0+0x3ffc4) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #7 0x70aa2e8897c7 in testing::UnitTest::Run() (/lib64/libgtest.so.1.11.0+0x3d7c7) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #8 0x70aa2e8ba153 in main (/lib64/libgtest_main.so.1.11.0+0x1153) (BuildId: c3a576d37d6cfc6875afdc98684c143107a226a0)
    #9 0x70aa01ceb60f in __libc_start_call_main (/lib64/libc.so.6+0x2a60f) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #10 0x70aa01ceb6bf in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a6bf) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #11 0x000000408684 in _start (/velox/_build/debug/velox/connectors/hive/storage_adapters/s3fs/tests/velox_s3file_test+0x408684) (BuildId: bbf3099c9a66a548c6da234b17ad1b631e9ed649)

0x0000006114ff is located 33 bytes before global variable '.str.135' defined in '/velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:126' (0x000000611520) of size 46
  '.str.135' is ascii string 'isHostExcludedFromProxy(hostname, pair.first)'
0x0000006114ff is located 1 bytes before global variable '.str.133' defined in '/velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:122' (0x000000611500) of size 1
  '.str.133' is ascii string ''
0x0000006114ff is located 42 bytes after global variable '.str.132' defined in '/velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:121' (0x0000006114c0) of size 21
  '.str.132' is ascii string 'localhost,foobar.com'
AddressSanitizer: global-buffer-overflow /velox/velox/connectors/hive/storage_adapters/s3fs/S3Util.cpp:160:16 in facebook::velox::filesystems::parseAWSStandardRegionName[abi:cxx11](std::basic_string_view<char, std::char_traits<char>>)
Shadow bytes around the buggy address:
```

Reviewed By: pedroerp

Differential Revision: D91278230

fbshipit-source-id: 05283bc8408069fa3f5ab8a7840b2bd0835fa7d6
dan13bauer pushed a commit that referenced this pull request Mar 6, 2026
The CUDA stream used to produce data was lost during intra-node
transfer. The source would allocate a new stream from the pool instead
of using the producer's stream, breaking stream ordering guarantees.

Add IntraNodeTransferResult struct to bundle data + stream + atEnd.
Add stream parameter to publish() and stream field to
IntraNodeTransferEntry. Update poll()/waitFor() to return
IntraNodeTransferResult. Add [[nodiscard]] to publish/poll/waitFor.

In CudfExchangeServer, pass rmm::cuda_stream_default to publish()
since data is already synchronized from the producer's stream.sync().

In CudfExchangeSource, use result->stream (the producer's stream)
in onIntraNodeData instead of getting a new stream from the pool.

Review: @wence- comments #3 + #4, @pentschev #12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant