Fix initial load completion tracking race condition #6711

Open
lawofcycles wants to merge 2 commits into opensearch-project:main from lawofcycles:fix/iceberg-initial-load-completion-race
@lawofcycles
Contributor

Description

Move the completion tracking GlobalState creation to before the task partition loop in performInitialLoad(). Previously it was created after all partitions, so workers that finished before the key existed would silently lose their completion increments, preventing the initial load from completing.
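The ordering problem described above can be illustrated with a small, deterministic sketch. It is not the Data Prepper code: `Store`, `createKey`, `increment`, and `runLeader` are hypothetical names, and the sketch assumes that an increment against a missing key is simply dropped, which is how the description characterizes the lost completions. Each "worker" is simulated as finishing immediately after its partition is created, which is the worst-case interleaving.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CompletionRaceSketch {

    // Hypothetical stand-in for the coordination store that holds the completion counter.
    static final class Store {
        private final Map<String, Integer> counters = new ConcurrentHashMap<>();

        void createKey(String key) {
            counters.putIfAbsent(key, 0);
        }

        // Assumption: an increment on a key that does not exist yet is dropped.
        // Returns false when that happens.
        boolean increment(String key) {
            return counters.computeIfPresent(key, (k, v) -> v + 1) != null;
        }

        int get(String key) {
            return counters.getOrDefault(key, 0);
        }
    }

    // Simulates the leader creating `partitions` task partitions, with a worker
    // finishing each one immediately. Returns the completion count the leader sees.
    static int runLeader(boolean createKeyFirst, int partitions) {
        Store store = new Store();
        String key = "snapshot-completion"; // hypothetical key name

        if (createKeyFirst) {
            store.createKey(key); // ordering after the fix
        }
        for (int i = 0; i < partitions; i++) {
            // The partition is created here; a fast worker completes it right away.
            store.increment(key);
        }
        if (!createKeyFirst) {
            store.createKey(key); // ordering before the fix
        }
        return store.get(key);
    }

    public static void main(String[] args) {
        System.out.println("key created after partitions:  " + runLeader(false, 3));
        System.out.println("key created before partitions: " + runLeader(true, 3));
    }
}
```

With the key created last, all three increments are dropped and the count stays at 0, so a leader waiting for 3 would wait forever. With the key created first, the count reaches 3.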

Issues Resolved

Resolves #6686

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Member

@dlvenable dlvenable left a comment

@lawofcycles Is there a way to write a unit test that is fixed by this change?

@lawofcycles
Contributor Author

@dlvenable Yes, I added a unit test. It uses Mockito's InOrder to verify that the GlobalState (completion tracking) partition is created before any InitialLoadTaskPartition.
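For readers without the diff open, the idea behind such an ordering check can be sketched without any test library. This is not the test added in this PR: it replaces Mockito's InOrder with a hand-written recording fake, and `Coordinator`, `RecordingCoordinator`, `createPartition`, and the simplified `performInitialLoad` are hypothetical stand-ins for the real classes.

```java
import java.util.ArrayList;
import java.util.List;

public class CreationOrderSketch {

    // Hypothetical, minimal view of the coordinator used by the leader.
    interface Coordinator {
        void createPartition(String kind);
    }

    // Recording fake: remembers the order in which partitions were created.
    static final class RecordingCoordinator implements Coordinator {
        final List<String> created = new ArrayList<>();

        @Override
        public void createPartition(String kind) {
            created.add(kind);
        }
    }

    // Simplified leader step that mirrors the fixed ordering:
    // completion tracking state first, then the task partitions.
    static void performInitialLoad(Coordinator coordinator, int taskCount) {
        coordinator.createPartition("GlobalState");
        for (int i = 0; i < taskCount; i++) {
            coordinator.createPartition("InitialLoadTaskPartition");
        }
    }

    // True when GlobalState exists and precedes the first task partition.
    static boolean globalStateCreatedFirst(List<String> created) {
        int globalState = created.indexOf("GlobalState");
        int firstTask = created.indexOf("InitialLoadTaskPartition");
        return globalState >= 0 && (firstTask < 0 || globalState < firstTask);
    }

    public static void main(String[] args) {
        RecordingCoordinator coordinator = new RecordingCoordinator();
        performInitialLoad(coordinator, 2);
        System.out.println("GlobalState created first: "
                + globalStateCreatedFirst(coordinator.created));
    }
}
```

A check of this shape fails against the old ordering, where the GlobalState partition was created after the loop, and passes once the creation is moved ahead of it.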

@lawofcycles lawofcycles force-pushed the fix/iceberg-initial-load-completion-race branch from ab8ad83 to 4aa9493 on May 6, 2026 22:24
creation loop in performInitialLoad(). Previously, the completion key
was created after all partitions, allowing workers to finish and call
incrementSnapshotCompletionCount() before the key existed. Those
increments were silently lost, causing waitForSnapshotComplete() to
never reach the expected total.

Signed-off-by: Sotaro Hikita <bering1814@gmail.com>
Verify that GlobalState (completion tracking) is created before
InitialLoadTaskPartition, ensuring workers can report completion
as soon as they acquire a partition.

Signed-off-by: Sotaro Hikita <bering1814@gmail.com>
@lawofcycles lawofcycles force-pushed the fix/iceberg-initial-load-completion-race branch from 4aa9493 to 540a1c0 on May 7, 2026 06:27
Development

Successfully merging this pull request may close these issues.

[BUG] Iceberg source initial load completion detection fails due to race condition between Leader and Worker

2 participants