Add GHA coordinator for performance evaluation task scatter/gather #791

Open
cjonas9 wants to merge 99 commits into main from gha-coordinator

@cjonas9 cjonas9 commented Jun 17, 2026

What

Adds a GitHub Actions coordinator (load-test-coordinator.yml) that, on a push to a release/** branch, launches the release performance-evaluation leg(s) as callable workflows and posts their consolidated results as a sticky comment on the release PR. The apply-load ingest load test (#741) is adapted into a callable leg that reports back (outputs + an S3 result object) instead of commenting itself.

This PR also decomposes much of the existing infrastructure into more modular components for future coordinator-driven load tests. Of the diff, +870/-721 is purely a refactor of existing code into a new harness package, so it does not need as thorough a review as the coordinator-heavy sections. The breakdown:

  • Generic code lifted out of runner/orchestrate.go and runner/instantiate.go into an importable harness package, mechanical changes only
    • package consists of gather.go, s3.go, exec.go, harness.go, and harness_test.go
  • Generic code lifted out of run-load-test.sh and placed into bootstrap-common.sh mostly as helper functions
    • run-load-test.sh is now extremely slim and only serves to actually run the ingestion load test
  • Generic code lifted out of load-test.yml and put in ec2-leg.yml, which handles the EC2 lifecycle with no test-specific hardcoding

Load test results are posted as a sticky comment on the release PR. If more than one push is made or multiple runs are requested, previous runs' results are folded into dropdown menus like this:


🧪 Performance Evaluation Test #N...
[recent results for run N]

Performance Evaluation Test #N-1 `[results for run N-1]`
Performance Evaluation Test #... `[results for run ...]`
Performance Evaluation Test #1 `[results for run 1]`

Or, you can just look in the comments below to see how this looks!
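
A minimal sketch of the folding scheme above, in shell for illustration only. The renderer in this PR is a Go tool; `fold_run` and `render_comment` are hypothetical names, and using `<details>`/`<summary>` for the dropdowns is an assumption about how the comment body is built.

```shell
#!/usr/bin/env bash
set -euo pipefail

# fold_run NUMBER BODY: wrap one earlier run's results in a collapsed
# <details> block, which GitHub renders as a dropdown. (Hypothetical helper.)
fold_run() {
  printf '<details>\n<summary>Performance Evaluation Test #%s</summary>\n\n%s\n\n</details>\n' "$1" "$2"
}

# render_comment LATEST_BODY [OLDER_BODY...]: older bodies are passed newest
# first. The latest run is shown expanded; the rest are folded beneath it.
render_comment() {
  local latest="$1"; shift
  local n=$(( $# + 1 ))   # run numbers count up from 1, so the latest is N
  printf '🧪 Performance Evaluation Test #%s\n\n%s\n\n' "$n" "$latest"
  local body
  for body in "$@"; do
    n=$(( n - 1 ))
    fold_run "$n" "$body"
  done
}
```

For example, `render_comment "run 3 results" "run 2 results" "run 1 results"` would print run 3 expanded, followed by collapsed blocks for runs 2 and 1.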

The high-level flow of the coordinator is below:

flowchart TB
  trigger["push to release/**"] --> plan["load-test-coordinator.yml<br/>- plan<br/>- resolve ref + PR"]
  plan --> lt["load-test.yml<br/>runs synthetic ledger ingestion test"]
  plan -.-> future["future legs"]
  lt --> report["report<br/>aggregate → sticky PR comment"]
  future -.-> report

Why

This is so that the remaining perf-eval tests slot in cleanly: each new test becomes another callable leg the coordinator launches and folds into the same report, with no per-leg reporting or permissions plumbing to duplicate.

Known limitations

N/A. The remaining work for the epic this task belongs to is mainly finishing the other perf-eval tasks (only the ingest load test is complete at this point).

cjonas9 added 30 commits May 8, 2026 21:32
Base automatically changed from apply-load to main June 26, 2026 17:37
cjonas9 added 2 commits June 26, 2026 14:20
# Conflicts:
#	.github/workflows/load-test.yml
#	cmd/stellar-rpc/internal/ingest/service.go
#	cmd/stellar-rpc/internal/integrationtest/ingest_loadtest_test.go
#	go.mod
#	go.sum
@cjonas9 cjonas9 marked this pull request as ready for review June 26, 2026 18:37
Copilot AI review requested due to automatic review settings June 26, 2026 18:37

Copilot AI left a comment

Pull request overview

Introduces a GitHub Actions “scatter/gather” coordinator for release performance-evaluation runs, converting the existing ingestion load test into a callable workflow “leg” that reports its results via outputs + an S3 result object, then consolidating leg results into a single sticky PR comment.

Changes:

  • Adds a new load-test-coordinator.yml workflow that resolves the release PR context, fans out to callable perf-eval legs, and posts an aggregated sticky PR comment.
  • Refactors the ingestion load test workflow (load-test.yml) into a workflow_call-only leg with structured outputs (bucket/key/verdict/etc.).
  • Adds a Go-based coordinator comment renderer (coordinator-runner) plus new apply-load scenario configs and supporting runner tweaks.

Reviewed changes

Copilot reviewed 8 out of 13 changed files in this pull request and generated 2 comments.

Summary per file:

| File | Description |
|---|---|
| cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/testdata/apply-load-v27-soroswap.cfg | Adds v27 Soroswap apply-load profile config for generating ingestible meta corpus. |
| cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/testdata/apply-load-v27-sac.cfg | Adds v27 SAC apply-load profile config (with disjoint classic payment window notes). |
| cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/testdata/apply-load-v27-oz.cfg | Adds v27 OZ (custom token) apply-load profile config. |
| cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/runner/runner_test.go | Adds unit tests for result encoding/decoding, S3 “not found” detection, and tail buffer behavior. |
| cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/runner/orchestrate.go | Extends leg outputs to include verdict/bucket/key; improves timeout reporting metadata. |
| cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/runner/instantiate.go | Clarifies result-object contract and scenario naming; improves failure-path explanation. |
| cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/run-load-test.sh | Updates bootstrap/runner handoff and adds a self-terminate ceiling; adds (currently always-on) S3 log upload hook. |
| cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/coordinator-runner.go | New tool to render the sticky “Performance Evaluation Test #N” comment by fetching leg results from S3 and folding history. |
| cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/coordinator-runner_test.go | Unit tests for numbering/history-folding and leg rendering behavior. |
| .gitignore | Ignores generated .xdr.zstd corpora and a refresh tool build artifact. |
| .github/workflows/load-test.yml | Converts the ingest load test to a callable workflow leg with outputs and artifacts; removes direct PR commenting. |
| .github/workflows/load-test-coordinator.yml | New coordinator workflow: plan → leg fan-out → aggregate/report sticky PR comment. |
| .github/workflows/e2e.yml | Pins the reusable system-test workflow reference to a specific commit SHA. |

Comments suppressed due to low confidence (1)

cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/run-load-test.sh:78

  • This block is labeled “temporary scaffolding” but is currently always enabled and uploads the full user-data log to S3 on every run. If it’s intended only for debugging, it should be gated behind an opt-in env var (or removed) to avoid unexpected S3 writes and potential log retention concerns.

Comment thread .github/workflows/load-test.yml Outdated
Comment thread .github/workflows/load-test-coordinator.yml

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f1b4473315

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread .github/workflows/ec2-leg.yml
  push:
    # temporary scaffolding: before merge to main, replace with line
    # branches: [release/**]
    branches: [release/**, gha-coordinator]

P2: Remove the temporary gha-coordinator trigger

With this branch filter left in place, any push to gha-coordinator will run the full release load-test coordinator, including assuming the AWS role and launching the c5.2xlarge leg, even though the workflow is meant to run only for release branches. The adjacent comment says this is temporary before merge; please drop the extra branch before shipping so non-release pushes cannot start expensive perf runs or post PR comments.

github-actions Bot commented Jun 26, 2026

🧪 Performance Evaluation Test #5

Commit: 4ced1a8834be (gha-coordinator)
Run: https://github.com/stellar/stellar-rpc/actions/runs/28541434157

✅ Apply-load ingestion — verdict: ok

📈 Ingest load test — 4ced1a8

| Profile | Ledgers | ms/ledger | p50 / p95 / p99 ms | max ms |
|---|---|---|---|---|
| load-test-ledgers-v27-oz | 1000 | 1197.087 | 1116.440 / 1631.880 / 1908.876 | 2787.808 |
| load-test-ledgers-v27-sac | 1000 | 1098.122 | 1108.570 / 1180.084 / 1235.298 | 1306.027 |
| load-test-ledgers-v27-soroswap | 1000 | 792.490 | 804.127 / 868.934 / 930.758 | 1062.309 |

| Metric | Value |
|---|---|
| Ledgers replayed | 3000 |
| Initial DB ledger count | 120960 |
| Throughput | 0.94 ledgers/sec |
| Elapsed wall-clock | 3188.469s |
| Ingest busy-time | 3087.699s (96.8% utilization) |
| Per-ledger p50 / p95 / p99 | 1051.843 / 1415.404 / 1742.786 ms |
| Golden DB fetch+decompress | 2435s |
| stellar-core | v27.0.0 |
| Workflow run | #28541434157-1 |

Performance Evaluation Test #4

Commit: c91c61f6e113 (gha-coordinator)
Run: https://github.com/stellar/stellar-rpc/actions/runs/28408420248

✅ Apply-load ingestion — verdict: ok

📈 Ingest load test — c91c61f

| Profile | Ledgers | ms/ledger | p50 / p95 / p99 ms | max ms |
|---|---|---|---|---|
| load-test-ledgers-v27-oz | 1000 | 1194.497 | 1110.690 / 1625.262 / 1905.351 | 3036.549 |
| load-test-ledgers-v27-sac | 1000 | 1094.507 | 1103.723 / 1177.603 / 1239.945 | 1300.516 |
| load-test-ledgers-v27-soroswap | 1000 | 791.392 | 802.718 / 866.760 / 923.818 | 1037.579 |

| Metric | Value |
|---|---|
| Ledgers replayed | 3000 |
| Initial DB ledger count | 120960 |
| Throughput | 0.94 ledgers/sec |
| Elapsed wall-clock | 3189.754s |
| Ingest busy-time | 3080.396s (96.6% utilization) |
| Per-ledger p50 / p95 / p99 | 1047.597 / 1415.372 / 1741.242 ms |
| Golden DB fetch+decompress | 2456s |
| stellar-core | v27.0.0 |
| Workflow run | #28408420248-1 |

Performance Evaluation Test #3

Commit: d846b52d17c1 (gha-coordinator)
Run: https://github.com/stellar/stellar-rpc/actions/runs/28395013091

✅ Apply-load ingestion — verdict: ok

📈 Ingest load test — d846b52

| Profile | Ledgers | ms/ledger | p50 / p95 / p99 ms | max ms |
|---|---|---|---|---|
| load-test-ledgers-v27-oz | 1000 | 1198.571 | 1117.194 / 1624.044 / 1914.508 | 3391.985 |
| load-test-ledgers-v27-sac | 1000 | 1097.772 | 1107.969 / 1180.118 / 1237.781 | 1300.999 |
| load-test-ledgers-v27-soroswap | 1000 | 792.332 | 802.249 / 866.830 / 923.025 | 1033.783 |

| Metric | Value |
|---|---|
| Ledgers replayed | 3000 |
| Initial DB ledger count | 120960 |
| Throughput | 0.94 ledgers/sec |
| Elapsed wall-clock | 3188.566s |
| Ingest busy-time | 3088.675s (96.9% utilization) |
| Per-ledger p50 / p95 / p99 | 1051.098 / 1419.916 / 1730.795 ms |
| Golden DB fetch+decompress | 2441s |
| stellar-core | v27.0.0 |
| Workflow run | #28395013091-1 |

Performance Evaluation Test #2

Commit: c601be50f256 (gha-coordinator)
Run: https://github.com/stellar/stellar-rpc/actions/runs/28265430398

✅ Apply-load ingestion — verdict: ok

📈 Ingest load test — c601be5

| Profile | Ledgers | ms/ledger | p50 / p95 / p99 ms | max ms |
|---|---|---|---|---|
| load-test-ledgers-v27-oz | 1000 | 1197.610 | 1114.931 / 1630.206 / 1907.536 | 3407.476 |
| load-test-ledgers-v27-sac | 1000 | 1097.030 | 1106.600 / 1179.565 / 1235.090 | 1304.292 |
| load-test-ledgers-v27-soroswap | 1000 | 791.518 | 802.001 / 867.143 / 922.698 | 1039.465 |

| Metric | Value |
|---|---|
| Ledgers replayed | 3000 |
| Initial DB ledger count | 120960 |
| Throughput | 0.94 ledgers/sec |
| Elapsed wall-clock | 3189.277s |
| Ingest busy-time | 3086.158s (96.8% utilization) |
| Per-ledger p50 / p95 / p99 | 1050.350 / 1413.260 / 1733.772 ms |
| Golden DB fetch+decompress | 2410s |
| stellar-core | v27.0.0 |
| Workflow run | #28265430398-1 |

Performance Evaluation Test #1

Commit: f1b4473315a8 (gha-coordinator)
Run: https://github.com/stellar/stellar-rpc/actions/runs/28257646240

✅ Apply-load ingestion — verdict: ok

📈 Ingest load test — f1b4473

| Profile | Ledgers | ms/ledger | p50 / p95 / p99 ms | max ms |
|---|---|---|---|---|
| load-test-ledgers-v27-oz | 1000 | 1198.111 | 1116.909 / 1624.761 / 1918.004 | 2807.759 |
| load-test-ledgers-v27-sac | 1000 | 1098.417 | 1108.963 / 1180.559 / 1237.718 | 1301.012 |
| load-test-ledgers-v27-soroswap | 1000 | 792.573 | 803.499 / 868.011 / 923.528 | 1036.343 |

| Metric | Value |
|---|---|
| Ledgers replayed | 3000 |
| Initial DB ledger count | 120960 |
| Throughput | 0.94 ledgers/sec |
| Elapsed wall-clock | 3188.317s |
| Ingest busy-time | 3089.101s (96.9% utilization) |
| Per-ledger p50 / p95 / p99 | 1052.390 / 1416.426 / 1738.711 ms |
| Golden DB fetch+decompress | 2436s |
| stellar-core | v27.0.0 |
| Workflow run | #28257646240-1 |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c601be50f2

  report:
    name: Aggregate + report
    needs: [plan, load-test]
    if: always()

P2: Prevent canceled coordinator runs from posting stale reports

I checked the top-level concurrency.cancel-in-progress: true block in this workflow. When a newer push supersedes an in-flight release run while this aggregation job is running, the always() condition lets the canceled run keep executing: GitHub's workflow cancellation reference explicitly notes that jobs whose condition still evaluates to true, such as always(), are not canceled. The stale report can then edit the sticky PR comment for the superseded SHA after a newer run has started or even completed. Keep reporting on failed legs, but exclude canceled runs before posting (for example by adding !cancelled()).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d846b52d17

fi

# Render the new body (fetches each leg's S3 result, folds prior runs).
printf '%s' "$PREV" | go run "$PERF_EVAL" > /tmp/comment.md

P2: Stop when report rendering fails

If this go run exits non-zero (for example due to a transient Go setup/compile failure or a renderer error), the script does not stop, because the shell sets only -uo pipefail and not -e. Execution continues with /tmp/comment.md (created by the redirection, possibly empty or partial), so the workflow can warn or post bad report content and still let the final gate pass when the leg outputs say success. Keep the gh api calls tolerant if desired, but make renderer failures abort before posting.
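
One way the suggestion could look, as a hedged sketch rather than the workflow's actual step. It assumes the step keeps `set -uo pipefail` without `-e`; `render_report` and `post_comment` are hypothetical stand-ins for the real `go run "$PERF_EVAL"` and `gh api` calls.

```shell
#!/usr/bin/env bash
set -uo pipefail   # deliberately no -e, matching the step described above

# Hypothetical stand-in for `go run "$PERF_EVAL"`: reads the previous comment
# body on stdin and writes the new body on stdout.
render_report() { cat; }

# Hypothetical stand-in for the tolerant `gh api` call that edits the comment.
post_comment() { echo "posting $(wc -c < "$1") bytes"; }

# Render into a temp file and check the renderer's exit status explicitly.
# Returns non-zero, without posting, if rendering fails or yields an empty body.
render_then_post() {
  local prev="$1" out
  out="$(mktemp)"
  if ! printf '%s' "$prev" | render_report > "$out"; then
    echo "renderer failed; not posting" >&2
    rm -f "$out"
    return 1
  fi
  if [ ! -s "$out" ]; then
    echo "renderer produced an empty body; not posting" >&2
    rm -f "$out"
    return 1
  fi
  # Posting stays tolerant: a failure here only warns.
  post_comment "$out" || echo "warning: posting failed" >&2
  rm -f "$out"
}
```

Checking the renderer's status explicitly keeps the rest of the step tolerant while still refusing to post an empty or partial body. The caller would still need to propagate the non-zero return to the final gate.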

@cjonas9 cjonas9 linked an issue Jun 30, 2026 that may be closed by this pull request
@cjonas9 cjonas9 added this to the platform sprint 73 milestone Jun 30, 2026
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Development

Successfully merging this pull request may close these issues.

Release Eval: Add the coordinator GitHub Action