Feature/357 add missing sedona node counts 2 12 16 to benchmark matrix #358
Merged
jathavaan merged 9 commits on May 26, 2026
Pull request overview
Expands the national-scale spatial join (RQ2) benchmark matrix to add missing Sedona scalability points, especially 12-node configurations, and updates orchestration and documentation so these experiments can be scheduled and dispatched consistently.
Changes:
- Added new Databricks/Sedona 12-worker entrypoints (broadcast + partitioned) and wired them into `benchmark_runner.py` dispatch.
- Expanded `benchmarks.yml` to include additional 2-/12-/16-node experiment IDs across `small`/`medium`/`large`, with updated `related_script_ids` batch groupings.
- Updated `docker-compose.yml` and `README.md` to reflect the expanded experiment matrix and batch counts.
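The dispatch wiring described above can be sketched as a lookup table from script ID to entrypoint. This is a minimal sketch under stated assumptions, not the contents of `benchmark_runner.py`: the PR only says that dispatch cases were added, so the script IDs (borrowed here from the new module names) and the `run_*` functions are hypothetical.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for the new 12-node entrypoints; the real functions
# live under src/presentation/entrypoints/ and their names are not shown here.
def run_broadcast_12_nodes() -> str:
    return "broadcast, 12 workers"


def run_partitioned_12_nodes() -> str:
    return "partitioned, 12 workers"


# Hypothetical script IDs mapped to entrypoints: a table-driven alternative
# to a long if/elif (or match/case) chain.
DISPATCH: Dict[str, Callable[[], str]] = {
    "national_scale_spatial_join_databricks_broadcast_12_nodes": run_broadcast_12_nodes,
    "national_scale_spatial_join_databricks_partitioned_12_nodes": run_partitioned_12_nodes,
}


def dispatch(script_id: str) -> str:
    """Run the entrypoint registered for script_id, failing loudly on unknown IDs."""
    try:
        entrypoint = DISPATCH[script_id]
    except KeyError:
        raise ValueError(f"Unknown script ID: {script_id!r}") from None
    return entrypoint()


print(dispatch("national_scale_spatial_join_databricks_partitioned_12_nodes"))
```

With a table, adding a new node count means adding one entry per strategy, and an unregistered ID raises a `ValueError` instead of silently doing nothing.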
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/presentation/entrypoints/national_scale_spatial_join_databricks_partitioned_12_nodes.py` | New 12-worker partitioned Databricks entrypoint for the national-scale spatial join benchmark. |
| `src/presentation/entrypoints/national_scale_spatial_join_databricks_broadcast_12_nodes.py` | New 12-worker broadcast Databricks entrypoint for the national-scale spatial join benchmark. |
| `src/presentation/entrypoints/__init__.py` | Exposes the new 12-node entrypoints for import/dispatch. |
| `benchmark_runner.py` | Adds dispatch cases for the new 12-node broadcast/partitioned script IDs. |
| `benchmarks.yml` | Expands the RQ2 experiment definitions and updates batching via `related_script_ids`. |
| `docker-compose.yml` | Adds local build/run service stanzas for the 12-node Databricks benchmark images. |
| `README.md` | Updates experiment/batch counts and the documented RQ2 scaling matrix. |
This pull request significantly expands the national-scale spatial join experiment matrix to improve the coverage of scalability tests, especially for the Sedona engine's broadcast and partitioned strategies. The main changes are the addition of new experiment configurations at 2-node and 12-node cluster sizes, updates to batch groupings, and corresponding documentation updates.
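One way to find the missing points in a scaling matrix is to take the Cartesian product of strategies, dataset sizes, and target node counts and subtract the experiments that already exist. The sketch below assumes that model; the tuple layout and the "existing" node counts (4 and 8) are hypothetical, and the real matrix in `benchmarks.yml` need not be a full product.

```python
from itertools import product
from typing import Iterable, Set, Tuple

# (strategy, dataset size, node count); a hypothetical layout for illustration.
Experiment = Tuple[str, str, int]

STRATEGIES = ("broadcast", "partitioned")
SIZES = ("small", "medium", "large")


def missing_experiments(existing: Iterable[Experiment],
                        node_counts: Iterable[int]) -> Set[Experiment]:
    """Return every (strategy, size, nodes) combination not yet in `existing`."""
    wanted = set(product(STRATEGIES, SIZES, node_counts))
    return wanted - set(existing)


# Hypothetical starting point: only 4- and 8-node runs exist.
existing = set(product(STRATEGIES, SIZES, (4, 8)))
gaps = missing_experiments(existing, node_counts=(2, 4, 8, 12, 16))

# 2 strategies x 3 sizes x 3 new node counts (2, 12, 16) = 18 gaps.
print(len(gaps))  # 18
```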
Experiment matrix expansion:
- Added new 2-node and 12-node experiment configurations for the Sedona broadcast and partitioned strategies across all dataset sizes (`small`, `medium`, `large`) in `benchmarks.yml`. This increases the RQ2 experiment count from 25 to 36 and provides a more granular scaling curve. [1] [2] [3] [4]
- Updated `benchmark_runner.py` to include and dispatch the new 12-node experiment scripts for both broadcast and partitioned strategies. [1] [2] [3]

Batch and related script updates:

- Updated `related_script_ids` in `benchmarks.yml` to include the new 2-node and 12-node experiments, ensuring correct parallel execution and result grouping. [1] [2] [3] [4] [5] [6] [7] [8] [9]

Documentation updates:

- Updated `README.md` to reflect the new total number of experiments and batches, the expanded node counts for Sedona strategies, and the revised test matrix. The documentation now accurately describes the new experiment configurations and their grouping. [1] [2] [3] [4]

These changes collectively improve the experiment coverage and scalability analysis for the national-scale spatial join workload, especially for larger cluster sizes and finer scaling increments.
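Batch grouping via `related_script_ids` can be sketched with a union-find over experiment IDs. This assumes, without confirmation from the PR, that each experiment lists the IDs it should run alongside and that those links are symmetric; the experiment IDs below are hypothetical.

```python
from typing import Dict, List, Set


def group_batches(experiments: Dict[str, List[str]]) -> List[Set[str]]:
    """Merge experiments linked through related_script_ids into batches.

    `experiments` maps an experiment ID to its (hypothetical) related_script_ids.
    Links are treated as undirected, so A -> B and B -> C put A, B and C together.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for exp_id, related in experiments.items():
        find(exp_id)  # register experiments that have no links at all
        for other in related:
            union(exp_id, other)

    # Collect every ID under its root to form the batches.
    batches: Dict[str, Set[str]] = {}
    for node in parent:
        batches.setdefault(find(node), set()).add(node)
    return list(batches.values())


# Hypothetical IDs: each strategy pair at a given node count runs as one batch.
experiments = {
    "sedona_broadcast_small_2_nodes": ["sedona_partitioned_small_2_nodes"],
    "sedona_partitioned_small_2_nodes": [],
    "sedona_broadcast_small_12_nodes": ["sedona_partitioned_small_12_nodes"],
    "sedona_partitioned_small_12_nodes": [],
}
batches = group_batches(experiments)
print(len(batches))  # 2
```

Treating links as undirected means an experiment only has to name one partner for the whole batch to stay together, which keeps edits small when a new node count is added to the matrix.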