Feature/357 add missing sedona node counts 2 12 16 to benchmark matrix #358

Merged
jathavaan merged 9 commits into main from feature/357-add-missing-sedona-node-counts-2-12-16-to-benchmark-matrix
May 26, 2026

@jathavaan
Collaborator

This pull request expands the national-scale spatial join experiment matrix to broaden scalability-test coverage, particularly for the Sedona engine's broadcast and partitioned strategies. The main changes are new experiment configurations at 2-node and 12-node cluster sizes, updated batch groupings, and the corresponding documentation updates.

Experiment matrix expansion:

  • Added Sedona broadcast and partitioned experiments at 2-node and 12-node cluster sizes across all dataset sizes (small, medium, large) in benchmarks.yml. This raises the RQ2 experiment count from 25 to 36 and gives a more granular scaling curve.
  • Updated benchmark_runner.py to include and dispatch the new 12-node experiment scripts for both the broadcast and partitioned strategies.
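
As a rough illustration of what dispatching the new 12-node scripts could look like, here is a minimal sketch. It assumes a table-driven dispatcher; the script ID strings, the handler functions, and the `dispatch` helper are all hypothetical and are not taken from benchmark_runner.py.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for the real entrypoints added in this PR.
def run_broadcast_12_nodes() -> str:
    return "broadcast on 12 nodes"


def run_partitioned_12_nodes() -> str:
    return "partitioned on 12 nodes"


# Hypothetical script IDs mapped to their handlers (assumed naming).
DISPATCH: Dict[str, Callable[[], str]] = {
    "national-scale-spatial-join-databricks-broadcast-12-nodes": run_broadcast_12_nodes,
    "national-scale-spatial-join-databricks-partitioned-12-nodes": run_partitioned_12_nodes,
}


def dispatch(script_id: str) -> str:
    """Run the handler registered for script_id, failing loudly on unknown IDs."""
    try:
        handler = DISPATCH[script_id]
    except KeyError:
        raise ValueError(f"Unknown script id: {script_id!r}") from None
    return handler()
```

Failing on an unknown ID, rather than silently skipping it, makes a missing dispatch case visible as soon as a new experiment is added to the matrix.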

Batch and related script updates:

  • Updated the batch definitions and related_script_ids in benchmarks.yml to include the new 2-node and 12-node experiments, so that they run in parallel with the right peers and their results are grouped correctly.
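
One way to picture how related_script_ids could define batches is sketched below. It assumes that any two scripts linked through related_script_ids belong to the same batch; that interpretation, the `group_batches` helper, and the example IDs are assumptions for illustration, not the project's actual schema or logic.

```python
from typing import Dict, List, Set


def group_batches(related: Dict[str, List[str]]) -> List[List[str]]:
    """Group script IDs so that IDs linked via related_script_ids share a batch."""
    # Build a symmetric adjacency map so one-directional links still group both IDs.
    neighbours: Dict[str, Set[str]] = {sid: set() for sid in related}
    for sid, others in related.items():
        for other in others:
            neighbours[sid].add(other)
            neighbours.setdefault(other, set()).add(sid)

    seen: Set[str] = set()
    batches: List[List[str]] = []
    for sid in sorted(neighbours):
        if sid in seen:
            continue
        # Depth-first walk collects every ID reachable from sid.
        stack, component = [sid], []
        seen.add(sid)
        while stack:
            current = stack.pop()
            component.append(current)
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        batches.append(sorted(component))
    return batches


# Hypothetical IDs: each maps to its related_script_ids list.
example = {
    "sedona-broadcast-small-2": ["sedona-partitioned-small-2"],
    "sedona-partitioned-small-2": ["sedona-broadcast-small-2"],
    "sedona-broadcast-small-12": ["sedona-partitioned-small-12"],
    "sedona-partitioned-small-12": [],
}
batches = group_batches(example)
```

Here `batches` contains two groups, one for the 12-node pair and one for the 2-node pair, even though the 12-node link is declared in only one direction.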

Documentation updates:

  • Updated README.md to reflect the new totals for experiments and batches, the expanded node counts for the Sedona strategies, and the revised test matrix, so the documentation matches the new experiment configurations and their grouping.

These changes collectively improve the experiment coverage and scalability analysis for the national-scale spatial join workload, especially for larger cluster sizes and finer scaling increments.

@jathavaan jathavaan self-assigned this May 26, 2026
Copilot AI review requested due to automatic review settings May 26, 2026 14:09
@jathavaan jathavaan linked an issue May 26, 2026 that may be closed by this pull request
@jathavaan jathavaan enabled auto-merge May 26, 2026 14:10
Contributor

Copilot AI left a comment

Pull request overview

Expands the national-scale spatial join (RQ2) benchmark matrix to add missing Sedona scalability points—especially 12-node configurations—and updates orchestration/documentation so these experiments can be scheduled and dispatched consistently.

Changes:

  • Added new Databricks/Sedona 12-worker entrypoints (broadcast + partitioned) and wired them into benchmark_runner.py dispatch.
  • Expanded benchmarks.yml to include additional 2-/12-/16-node experiment IDs across small/medium/large, with updated related_script_ids batch groupings.
  • Updated docker-compose.yml and README.md to reflect the expanded experiment matrix and batch counts.
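
Because the experiment IDs in benchmarks.yml and the dispatch cases in benchmark_runner.py have to stay in step, a small consistency check can help. The sketch below assumes both sides can be reduced to plain collections of script ID strings; the `missing_dispatch_cases` helper and the example IDs are hypothetical.

```python
from typing import Iterable, List, Set


def missing_dispatch_cases(
    configured_ids: Iterable[str], dispatchable_ids: Iterable[str]
) -> List[str]:
    """Return configured script IDs that have no dispatch case, sorted for stable output."""
    known: Set[str] = set(dispatchable_ids)
    return sorted(sid for sid in set(configured_ids) if sid not in known)


# Hypothetical IDs: the 12-node partitioned script is configured but not dispatchable.
configured = ["broadcast-12-nodes", "partitioned-12-nodes", "broadcast-2-nodes"]
dispatchable = ["broadcast-12-nodes", "broadcast-2-nodes"]
missing = missing_dispatch_cases(configured, dispatchable)
```

An empty result means every configured experiment can be dispatched; here `missing` holds the single ID that lacks a dispatch case.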

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/presentation/entrypoints/national_scale_spatial_join_databricks_partitioned_12_nodes.py | New 12-worker partitioned Databricks entrypoint for the national-scale spatial join benchmark. |
| src/presentation/entrypoints/national_scale_spatial_join_databricks_broadcast_12_nodes.py | New 12-worker broadcast Databricks entrypoint for the national-scale spatial join benchmark. |
| src/presentation/entrypoints/__init__.py | Exposes the new 12-node entrypoints for import/dispatch. |
| benchmark_runner.py | Adds dispatch cases for the new 12-node broadcast/partitioned script IDs. |
| benchmarks.yml | Expands the RQ2 experiment definitions and updates batching via related_script_ids. |
| docker-compose.yml | Adds local build/run service stanzas for the 12-node Databricks benchmark images. |
| README.md | Updates experiment/batch counts and the documented RQ2 scaling matrix. |

Comment thread docker-compose.yml
Comment thread benchmarks.yml
@jathavaan jathavaan disabled auto-merge May 26, 2026 14:20
@jathavaan jathavaan merged commit 4091739 into main May 26, 2026
2 checks passed
@jathavaan jathavaan deleted the feature/357-add-missing-sedona-node-counts-2-12-16-to-benchmark-matrix branch May 26, 2026 14:20

Development

Successfully merging this pull request may close these issues.

Add missing Sedona node counts (2, 12, 16) to benchmark matrix

2 participants