feat: pxf fdw support parallel scan by MisterRaindrop · Pull Request #61 · apache/cloudberry-pxf

MisterRaindrop · 2026-02-10T10:56:40Z

Change logs

This PR adds parallel scan support to the PXF FDW. The implementation depends on a corresponding commit in the Cloudberry kernel.

The code is not yet ready for review. This commit is only an exploratory submission for FDW parallelization; I first need to make sure that the core kernel part is solid.

apache/cloudberry#1571

Contributor's checklist

Here are some reminders before you submit your pull request:

Make sure that your Pull Request has a clear title and commit message. You can take the Git commit template as a reference.
Learn the code contribution and doc contribution guides for better collaboration.
Make sure that the CI/CD workflow is successful.
List your communications in GitHub Issues or Discussions (if any).
Feel free to ask the Cloudberry committers or other people to help review and approve.

- fdw support pg parallel scan
- add parallel scan correctness tests for PXF

ostinru · 2026-02-11T19:31:23Z

.../pxf-service/src/main/java/org/apache/cloudberry/pxf/service/controller/ReadServiceImpl.java

+                // Parallel mode: only process the specified fragment
+                Fragment specificFragment = fragmenterService.getFragmentByIndex(
+                        context, context.getSpecificFragmentIndex());
+                fragments = java.util.Collections.singletonList(specificFragment);


NIT: import java.util.Collections?

ostinru · 2026-02-11T19:44:22Z

fdw/pxf_bridge.h

+	slock_t		mutex;				/* mutex for accessing shared state */
+	int			total_fragments;	/* total number of fragments */
+	int			next_fragment;		/* next fragment index to be assigned */
+	bool		finished;			/* true if all fragments have been processed */


True if all fragments have been processed... or cancelled?

Also, what is the purpose of a write-only variable?
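A minimal sketch of how shared state like this is typically consumed, assuming each worker claims the next fragment index under the lock. It is not taken from the PR: `ParallelFragmentState`, `fragment_state_init`, `claim_next_fragment` and `all_fragments_assigned` are hypothetical names, and a C11 `atomic_flag` spin lock stands in for PostgreSQL's `slock_t` so that the block compiles on its own.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the shared state shown in the diff.
 * An atomic_flag spin lock replaces slock_t (assumption, for portability).
 */
typedef struct ParallelFragmentState
{
	atomic_flag	mutex;				/* protects next_fragment */
	int			total_fragments;	/* total number of fragments */
	int			next_fragment;		/* next fragment index to be assigned */
} ParallelFragmentState;

static void
spin_acquire(atomic_flag *lock)
{
	/* Busy-wait until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;
}

static void
spin_release(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

static void
fragment_state_init(ParallelFragmentState *state, int total_fragments)
{
	assert(total_fragments >= 0);
	atomic_flag_clear(&state->mutex);
	state->total_fragments = total_fragments;
	state->next_fragment = 0;
}

/* Claim the next unassigned fragment; returns -1 once all are handed out. */
static int
claim_next_fragment(ParallelFragmentState *state)
{
	int			fragment = -1;

	spin_acquire(&state->mutex);
	if (state->next_fragment < state->total_fragments)
		fragment = state->next_fragment++;
	spin_release(&state->mutex);

	return fragment;
}

/* Exhaustion is derived from the counters instead of a separate flag. */
static bool
all_fragments_assigned(ParallelFragmentState *state)
{
	bool		done;

	spin_acquire(&state->mutex);
	done = state->next_fragment >= state->total_fragments;
	spin_release(&state->mutex);

	return done;
}
```

Under these assumptions every fragment index is handed to exactly one worker, and "all assigned" can be read from `next_fragment >= total_fragments` without a stored flag. Note that "assigned" is not the same as "processed", and cancellation would still need separate handling; whether the PR's `finished` field is redundant depends on code not shown here.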

MisterRaindrop · 2026-02-12T01:22:56Z

@ostinru
Thanks for the review. The current code is not yet ready for the review stage.

My approach to kernel parallel processing is still too simplistic, so I may change or refactor it later.

MisterRaindrop · 2026-02-12T02:35:17Z

All deployments are local.
Sizes: 100MB, 1GB, 10GB
Workers: 4
Format: csv

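| Size | Rows | Workers | COUNT seq (ms) | COUNT par (ms) | COUNT speedup | SUM seq (ms) | SUM par (ms) | SUM speedup |
|---|---|---|---|---|---|---|---|---|
| 100MB x 20 files | 487,700 | 4 | 282 | 311 | 0.91x | 290 | 188 | 1.54x |
| 1GB x 20 files | 4,994,140 | 4 | 2352 | 1514 | 1.55x | 2448 | 1314 | 1.86x |
| 10GB x 20 files | 49,941,480 | 4 | 21524 | 11589 | 1.86x | 21954 | 11547 | 1.90x |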

The good news from this exploration is that parallelization does improve performance. For small data volumes the improvement is not obvious, and the parallel scan can even be slower than the sequential one (COUNT on the 100MB data set: 0.91x). Only with large data volumes is the improvement noticeable.

However, the improvement still falls short of expectations. In theory, the speedup should be close to the number of workers (4x here), but the best measured result is 1.90x. The gap may be due to I/O or CPU bottlenecks; I will investigate further.
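To make the gap concrete, here is a small sketch of the arithmetic behind the table: speedup is sequential time divided by parallel time, and dividing that by the worker count gives the parallel efficiency. The helper names `speedup` and `parallel_efficiency` are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Speedup = sequential time / parallel time (both in the same unit). */
static double
speedup(double seq_ms, double par_ms)
{
	assert(par_ms > 0.0);
	return seq_ms / par_ms;
}

/*
 * Parallel efficiency = speedup / number of workers.
 * 1.0 would mean the ideal case where speedup equals the worker count.
 */
static double
parallel_efficiency(double seq_ms, double par_ms, int workers)
{
	assert(workers > 0);
	return speedup(seq_ms, par_ms) / workers;
}
```

For the 10GB COUNT row, `speedup(21524, 11589)` is about 1.86, and `parallel_efficiency(21524, 11589, 4)` is about 0.46, i.e. the four workers together deliver a bit less than half of the ideal 4x.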

- Introduced virtual segment ID handling for parallel execution in Cloudberry.
- Added PxfBridgeImportStartVirtual function to manage imports with virtual segment IDs.
- Updated PxfFdwScanState structure to include fields for gang-parallel execution.
- Enhanced foreign scan functions to support gang-parallel mode, ensuring unique fragment distribution among workers.
- Implemented initialization and cleanup routines for gang-parallel state management.

MisterRaindrop and others added 2 commits February 10, 2026 17:41

feat: pxf fdw support parallel scan

2b609a4

- fdw support pg parallel scan
- add parallel scan correctness tests for PXF

Merge branch 'apache:main' into liuxiaoyu/paralle_fdw_2

72ed3e1

MisterRaindrop self-assigned this Feb 10, 2026

ostinru self-requested a review February 10, 2026 13:49

ostinru reviewed Feb 11, 2026

MisterRaindrop wants to merge 3 commits into apache:main from MisterRaindrop:liuxiaoyu/paralle_fdw_2
