#341 Reduce Databricks runs #344

Merged
jathavaan merged 4 commits into main from feature/341-reduce-databricks-runs on May 24, 2026

@jathavaan
Collaborator

Summary

  • Remove all 12-node Databricks configs (broadcast + partitioned at large tier) — entrypoints, dispatch arms, benchmarks.yml, docker-compose, CI matrices
  • Rebatch orphaned 4-node-large peers into existing {2, 16}-node batches (3-way, vCPU sum = 100 ≤ 200)
  • Add 75-minute cumulative wall-clock ceiling (BENCHMARK_MAX_FIXED_WINDOW_SECONDS) for fixed-iteration benchmarks to cap worst-case cost
  • Clean up stale default-strategy __pycache__ and historical comments in benchmarks.yml
  • Post thesis reconciliation findings to Ch. 4 (Ch. 4: Research Design and Methodology — Notes #314) and Ch. 5 (Ch. 5: System Architecture and Implementation — Notes #315) discussions
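
The rebatching constraint in the summary (three configs sharing one batch, vCPU sum = 100 ≤ 200) can be illustrated with a small budget check. This is a minimal sketch, not code from this PR: `VCPU_LIMIT`, `batch_fits`, the config names, and the per-config vCPU counts are all hypothetical. The PR states only the sum and the limit.

```python
# Hypothetical limit; the PR states only that the batch sum (100) must be <= 200.
VCPU_LIMIT = 200


def batch_fits(batch_vcpus: dict[str, int], limit: int = VCPU_LIMIT) -> bool:
    """Return True when all configs in one batch fit under the vCPU limit together."""
    return sum(batch_vcpus.values()) <= limit


# Invented per-config numbers, chosen only so that they add up to 100.
example_batch = {
    "2-nodes-large": 10,
    "4-nodes-large": 20,
    "16-nodes-large": 70,
}
```

Running a check like this before scheduling makes it clear whether adding a third peer to an existing batch would push the batch over the limit.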

Closes #341

Test plan

  • python -m compileall src main.py benchmark_runner.py passes
  • All related_script_ids resolve bidirectionally
  • No dangling 12-node or default references in key files
  • CI docker builds pass for remaining benchmark images
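
The "resolve bidirectionally" item in the test plan could be automated with a check along the following lines. This is a sketch under assumptions: it takes `related_script_ids` to be a mapping from each script id to the ids it lists as peers, which may not match the actual layout of benchmarks.yml. The function name `unresolved_links` is hypothetical.

```python
def unresolved_links(related: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (script_id, peer) pairs where peer does not list script_id back.

    A peer that is missing from the mapping entirely is reported as well,
    which also catches references to configs that have been removed.
    """
    problems = []
    for script_id, peers in related.items():
        for peer in peers:
            if script_id not in related.get(peer, []):
                problems.append((script_id, peer))
    return sorted(problems)
```

An empty result means every link has a matching back-link. Any reported pair points at a one-way or dangling reference.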

jathavaan added 3 commits May 24, 2026 12:32
Drop broadcast-12-nodes-large and partitioned-12-nodes-large from
benchmarks.yml, their entrypoint files, dispatch arms, docker-compose
services, and CI matrix rows. Fold the orphaned 4-node-large entries
into the existing {2, 16}-node batches (3-way, vCPU sum = 100).
Introduce BENCHMARK_MAX_FIXED_WINDOW_SECONDS (4500s) and check it in
the use_sequential_stopping=False branch of @monitor so a pathological
iteration cannot convert directly into unbounded billed node-hours.
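
The ceiling described in this commit message might look roughly like the sketch below. Only the constant name, its value (4500 s, i.e. 75 minutes), and the `use_sequential_stopping` flag come from the PR. The decorator signature, the `iterations`, `max_window_seconds`, and `clock` parameters, and the returned list of durations are assumptions made for illustration; the real @monitor may differ.

```python
import time
from functools import wraps

# Value stated in the commit message: 4500 s = 75 minutes.
BENCHMARK_MAX_FIXED_WINDOW_SECONDS = 4500.0


def monitor(iterations, use_sequential_stopping=False,
            max_window_seconds=BENCHMARK_MAX_FIXED_WINDOW_SECONDS,
            clock=time.monotonic):
    """Hypothetical stand-in for @monitor: time repeated runs of a benchmark."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            durations = []
            elapsed = 0.0
            for _ in range(iterations):
                start = clock()
                func(*args, **kwargs)
                duration = clock() - start
                durations.append(duration)
                elapsed += duration
                # Fixed-iteration branch: stop once the cumulative
                # wall-clock time reaches the ceiling.
                if not use_sequential_stopping and elapsed >= max_window_seconds:
                    break
            return durations
        return wrapper
    return decorator
```

In this sketch the check runs between iterations, so it bounds the number of further iterations but does not interrupt an iteration that is already in flight.
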
Copilot AI review requested due to automatic review settings May 24, 2026 10:33
@jathavaan jathavaan linked an issue May 24, 2026 that may be closed by this pull request
6 tasks
@jathavaan jathavaan enabled auto-merge May 24, 2026 10:35
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@jathavaan jathavaan merged commit 00fdefd into main May 24, 2026
23 checks passed
@jathavaan jathavaan deleted the feature/341-reduce-databricks-runs branch May 24, 2026 10:42
Development

Successfully merging this pull request may close these issues.

Reduce databricks runs

2 participants