feat: pxf fdw support parallel scan #61
- FDW: support PostgreSQL parallel scan
- Add parallel scan correctness tests for PXF
```java
// Parallel mode: only process the specified fragment
Fragment specificFragment = fragmenterService.getFragmentByIndex(
        context, context.getSpecificFragmentIndex());
fragments = java.util.Collections.singletonList(specificFragment);
```
NIT: import java.util.Collections?
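The selection rule in the Java snippet (process every fragment, or only the one at the requested index) can be sketched in C as follows. `Fragment`, `FragmentSlice`, and `select_fragments` are hypothetical names, and the out-of-range check is an assumption: the snippet does not show how `getFragmentByIndex` handles an invalid index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fragment descriptor; a real PXF fragment carries more metadata. */
typedef struct {
    const char *source_name;
    int index;
} Fragment;

/* Hypothetical view over a contiguous run of fragments. */
typedef struct {
    const Fragment *items;
    size_t count;
} FragmentSlice;

/*
 * Returns the fragments this request should process.
 * specific_index < 0 means non-parallel mode: process every fragment.
 * Otherwise only the fragment at specific_index is returned as a
 * one-element slice (the C analogue of Collections.singletonList).
 * An out-of-range index yields an empty slice instead of reading past the array.
 */
static FragmentSlice select_fragments(const Fragment *all, size_t total,
                                      long specific_index)
{
    assert(all != NULL || total == 0);

    FragmentSlice slice = { all, total };
    if (specific_index < 0)
        return slice;

    if ((size_t) specific_index >= total) {
        slice.items = NULL;
        slice.count = 0;
        return slice;
    }

    slice.items = &all[specific_index];
    slice.count = 1;
    return slice;
}
```

Returning a slice rather than copying the fragment keeps both modes on one code path: the caller always iterates over `items[0..count)`.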
```c
slock_t mutex;           /* mutex for accessing shared state */
int     total_fragments; /* total number of fragments */
int     next_fragment;   /* next fragment index to be assigned */
bool    finished;        /* true if all fragments have been processed */
```
Is this true only when all fragments have been processed, or also when the scan is cancelled?
Also, what is the purpose of this write-only variable?
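A minimal sketch of how workers could claim fragments from this shared state, one index at a time. It assumes a pthread mutex as a stand-in for PostgreSQL's `slock_t` spinlock so that it runs outside the server; `ParallelScanState`, `pscan_init`, `pscan_next_fragment`, and `pscan_destroy` are hypothetical names. In this sketch `finished` means "every fragment has been handed out" (not "processed" and not "cancelled"), and it is read as a fast path, so it is not write-only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the shared parallel-scan state (hypothetical). */
typedef struct {
    pthread_mutex_t mutex;   /* protects the fields below */
    int total_fragments;     /* total number of fragments */
    int next_fragment;       /* next fragment index to hand out */
    bool finished;           /* set once every fragment has been handed out */
} ParallelScanState;

static void pscan_init(ParallelScanState *ps, int total_fragments)
{
    assert(total_fragments >= 0);
    pthread_mutex_init(&ps->mutex, NULL);
    ps->total_fragments = total_fragments;
    ps->next_fragment = 0;
    ps->finished = (total_fragments == 0);
}

/*
 * Claims the next unassigned fragment index, or returns -1 when none remain.
 * Each index is returned exactly once across all callers.
 */
static int pscan_next_fragment(ParallelScanState *ps)
{
    int idx = -1;

    pthread_mutex_lock(&ps->mutex);
    if (!ps->finished) {
        idx = ps->next_fragment++;
        if (ps->next_fragment >= ps->total_fragments)
            ps->finished = true;
    }
    pthread_mutex_unlock(&ps->mutex);
    return idx;
}

static void pscan_destroy(ParallelScanState *ps)
{
    pthread_mutex_destroy(&ps->mutex);
}
```

Because fragments are claimed dynamically, a worker that finishes a small fragment simply claims the next one, which balances load when fragment sizes differ. Signalling cancellation would need a separate field or the executor's own interrupt handling; it is not modelled here.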
@ostinru My approach to parallel processing in the kernel is still simplistic. I may change or refactor it later.
All deployments were local.
Exploring parallelization showed that it does improve efficiency. For small data volumes the improvement is negligible, and parallel execution can even be slower than non-parallel execution; a noticeable improvement appears only with large data volumes. Even then, the speedup falls short of expectations: in theory it should be close to the number of workers. The gap may be caused by I/O or CPU bottlenecks, which will be investigated further.
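One standard way to reason about why the speedup stays below the worker count is Amdahl's law. It is not part of the original analysis and is offered only as a framing: if a fraction $p$ of the scan time parallelizes across $N$ workers and the remaining $1 - p$ stays serial (for example, coordination or a shared I/O bottleneck), the speedup is bounded by

```math
S(N) = \frac{1}{(1 - p) + \frac{p}{N}} \le \frac{1}{1 - p}
```

With $p = 0.9$ and $N = 4$, $S = 1 / (0.1 + 0.225) \approx 3.08$ rather than 4; $S(N) = N$ only when $p = 1$. A fixed per-worker startup cost, which this model ignores, could also explain why small scans see little or negative benefit.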
- Introduced virtual segment ID handling for parallel execution in Cloudberry.
- Added the `PxfBridgeImportStartVirtual` function to manage imports with virtual segment IDs.
- Updated the `PxfFdwScanState` structure to include fields for gang-parallel execution.
- Enhanced the foreign scan functions to support gang-parallel mode, ensuring that each fragment is assigned to exactly one worker.
- Implemented initialization and cleanup routines for gang-parallel state management.
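The list above says gang-parallel mode distributes fragments uniquely among workers but does not say how. One simple scheme, assumed here purely for illustration, is static round-robin by virtual segment ID: fragment `i` belongs to the worker whose ID equals `i % nworkers`. `fragment_owned_by` and `fragments_for_worker` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical static assignment: fragment i is scanned by the worker whose
 * virtual segment ID equals i modulo the number of workers. Every fragment
 * maps to exactly one worker, so nothing is read twice or skipped.
 */
static bool fragment_owned_by(int fragment_index, int vseg_id, int nworkers)
{
    assert(nworkers > 0);
    assert(vseg_id >= 0 && vseg_id < nworkers);
    assert(fragment_index >= 0);
    return fragment_index % nworkers == vseg_id;
}

/* Number of fragments a given worker scans under this scheme. */
static int fragments_for_worker(int total_fragments, int vseg_id, int nworkers)
{
    int count = 0;

    for (int i = 0; i < total_fragments; i++)
        if (fragment_owned_by(i, vseg_id, nworkers))
            count++;
    return count;
}
```

A static scheme like this needs no shared lock, but if fragment sizes vary, one worker can end up with far more data than the others; dynamic claiming from shared state trades a little synchronization for better load balance.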
#58
Change logs
This PR adds parallel scan support to the FDW. The implementation depends on a kernel change: apache/cloudberry#1571.
The code is not yet ready for review; this commit is only an exploratory submission for FDW parallelization. More importantly, I first need to make sure the core part in the kernel is solid.
Contributor's checklist
Here are some reminders before you submit your pull request: