Add GHA coordinator for performance evaluation task scatter/gather #791
cjonas9 wants to merge 99 commits
# Conflicts: # go.mod # go.sum
# Conflicts: # .github/workflows/load-test.yml # cmd/stellar-rpc/internal/integrationtest/infrastructure/load-test/run-load-test.sh # cmd/stellar-rpc/internal/integrationtest/infrastructure/load-test/runner/orchestrate.go # cmd/stellar-rpc/internal/integrationtest/ingest_loadtest_test.go
# Conflicts: # .github/workflows/load-test.yml # cmd/stellar-rpc/internal/ingest/service.go # cmd/stellar-rpc/internal/integrationtest/ingest_loadtest_test.go # go.mod # go.sum
Pull request overview
Introduces a GitHub Actions “scatter/gather” coordinator for release performance-evaluation runs, converting the existing ingestion load test into a callable workflow “leg” that reports its results via outputs + an S3 result object, then consolidating leg results into a single sticky PR comment.
Changes:
- Adds a new `load-test-coordinator.yml` workflow that resolves the release PR context, fans out to callable perf-eval legs, and posts an aggregated sticky PR comment.
- Refactors the ingestion load test workflow (`load-test.yml`) into a `workflow_call`-only leg with structured outputs (bucket/key/verdict/etc.).
- Adds a Go-based coordinator comment renderer (`coordinator-runner`) plus new apply-load scenario configs and supporting runner tweaks.
Reviewed changes
Copilot reviewed 8 out of 13 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/testdata/apply-load-v27-soroswap.cfg` | Adds v27 Soroswap apply-load profile config for generating ingestible meta corpus. |
| `cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/testdata/apply-load-v27-sac.cfg` | Adds v27 SAC apply-load profile config (with disjoint classic payment window notes). |
| `cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/testdata/apply-load-v27-oz.cfg` | Adds v27 OZ (custom token) apply-load profile config. |
| `cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/runner/runner_test.go` | Adds unit tests for result encoding/decoding, S3 “not found” detection, and tail buffer behavior. |
| `cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/runner/orchestrate.go` | Extends leg outputs to include verdict/bucket/key; improves timeout reporting metadata. |
| `cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/runner/instantiate.go` | Clarifies result-object contract and scenario naming; improves failure-path explanation. |
| `cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/run-load-test.sh` | Updates bootstrap/runner handoff and adds a self-terminate ceiling; adds (currently always-on) S3 log upload hook. |
| `cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/coordinator-runner.go` | New tool to render the sticky “Performance Evaluation Test #N” comment by fetching leg results from S3 and folding history. |
| `cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/coordinator-runner_test.go` | Unit tests for numbering/history-folding and leg rendering behavior. |
| `.gitignore` | Ignores generated .xdr.zstd corpora and a refresh tool build artifact. |
| `.github/workflows/load-test.yml` | Converts the ingest load test to a callable workflow leg with outputs and artifacts; removes direct PR commenting. |
| `.github/workflows/load-test-coordinator.yml` | New coordinator workflow: plan → leg fan-out → aggregate/report sticky PR comment. |
| `.github/workflows/e2e.yml` | Pins the reusable system-test workflow reference to a specific commit SHA. |
Comments suppressed due to low confidence (1)
cmd/stellar-rpc/internal/integrationtest/infrastructure/perf-eval/ingest-load-test/run-load-test.sh:78
- This block is labeled “temporary scaffolding” but is currently always enabled and uploads the full user-data log to S3 on every run. If it’s intended only for debugging, it should be gated behind an opt-in env var (or removed) to avoid unexpected S3 writes and potential log retention concerns.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f1b4473315
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
    push:
      # temporary scaffolding: before merge to main, replace with line
      #   branches: [release/**]
      branches: [release/**, gha-coordinator]
Remove the temporary gha-coordinator trigger
With this branch filter left in place, any push to gha-coordinator will run the full release load-test coordinator, including assuming the AWS role and launching the c5.2xlarge leg, even though the workflow is meant to run only for release branches. The adjacent comment says this is temporary before merge; please drop the extra branch before shipping so non-release pushes cannot start expensive perf runs or post PR comments.
🧪 Performance Evaluation Test #5
Commit:
✅ Apply-load ingestion — verdict: ok
📈 Ingest load test —
| Profile | Ledgers | ms/ledger | p50 / p95 / p99 ms | max ms |
|---|---|---|---|---|
| load-test-ledgers-v27-oz | 1000 | 1197.087 | 1116.440 / 1631.880 / 1908.876 | 2787.808 |
| load-test-ledgers-v27-sac | 1000 | 1098.122 | 1108.570 / 1180.084 / 1235.298 | 1306.027 |
| load-test-ledgers-v27-soroswap | 1000 | 792.490 | 804.127 / 868.934 / 930.758 | 1062.309 |
| Metric | Value |
|---|---|
| Ledgers replayed | 3000 |
| Initial DB ledger count | 120960 |
| Throughput | 0.94 ledgers/sec |
| Elapsed wall-clock | 3188.469s |
| Ingest busy-time | 3087.699s (96.8% utilization) |
| Per-ledger p50 / p95 / p99 | 1051.843 / 1415.404 / 1742.786 ms |
| Golden DB fetch+decompress | 2435s |
| stellar-core | v27.0.0 |
| Workflow run | #28541434157-1 |
Performance Evaluation Test #4
Commit: c91c61f6e113 (gha-coordinator)
Run: https://github.com/stellar/stellar-rpc/actions/runs/28408420248
✅ Apply-load ingestion — verdict: ok
📈 Ingest load test — c91c61f
| Profile | Ledgers | ms/ledger | p50 / p95 / p99 ms | max ms |
|---|---|---|---|---|
| load-test-ledgers-v27-oz | 1000 | 1194.497 | 1110.690 / 1625.262 / 1905.351 | 3036.549 |
| load-test-ledgers-v27-sac | 1000 | 1094.507 | 1103.723 / 1177.603 / 1239.945 | 1300.516 |
| load-test-ledgers-v27-soroswap | 1000 | 791.392 | 802.718 / 866.760 / 923.818 | 1037.579 |
| Metric | Value |
|---|---|
| Ledgers replayed | 3000 |
| Initial DB ledger count | 120960 |
| Throughput | 0.94 ledgers/sec |
| Elapsed wall-clock | 3189.754s |
| Ingest busy-time | 3080.396s (96.6% utilization) |
| Per-ledger p50 / p95 / p99 | 1047.597 / 1415.372 / 1741.242 ms |
| Golden DB fetch+decompress | 2456s |
| stellar-core | v27.0.0 |
| Workflow run | #28408420248-1 |
Performance Evaluation Test #3
Commit: d846b52d17c1 (gha-coordinator)
Run: https://github.com/stellar/stellar-rpc/actions/runs/28395013091
✅ Apply-load ingestion — verdict: ok
📈 Ingest load test — d846b52
| Profile | Ledgers | ms/ledger | p50 / p95 / p99 ms | max ms |
|---|---|---|---|---|
| load-test-ledgers-v27-oz | 1000 | 1198.571 | 1117.194 / 1624.044 / 1914.508 | 3391.985 |
| load-test-ledgers-v27-sac | 1000 | 1097.772 | 1107.969 / 1180.118 / 1237.781 | 1300.999 |
| load-test-ledgers-v27-soroswap | 1000 | 792.332 | 802.249 / 866.830 / 923.025 | 1033.783 |
| Metric | Value |
|---|---|
| Ledgers replayed | 3000 |
| Initial DB ledger count | 120960 |
| Throughput | 0.94 ledgers/sec |
| Elapsed wall-clock | 3188.566s |
| Ingest busy-time | 3088.675s (96.9% utilization) |
| Per-ledger p50 / p95 / p99 | 1051.098 / 1419.916 / 1730.795 ms |
| Golden DB fetch+decompress | 2441s |
| stellar-core | v27.0.0 |
| Workflow run | #28395013091-1 |
Performance Evaluation Test #2
Commit: c601be50f256 (gha-coordinator)
Run: https://github.com/stellar/stellar-rpc/actions/runs/28265430398
✅ Apply-load ingestion — verdict: ok
📈 Ingest load test — c601be5
| Profile | Ledgers | ms/ledger | p50 / p95 / p99 ms | max ms |
|---|---|---|---|---|
| load-test-ledgers-v27-oz | 1000 | 1197.610 | 1114.931 / 1630.206 / 1907.536 | 3407.476 |
| load-test-ledgers-v27-sac | 1000 | 1097.030 | 1106.600 / 1179.565 / 1235.090 | 1304.292 |
| load-test-ledgers-v27-soroswap | 1000 | 791.518 | 802.001 / 867.143 / 922.698 | 1039.465 |
| Metric | Value |
|---|---|
| Ledgers replayed | 3000 |
| Initial DB ledger count | 120960 |
| Throughput | 0.94 ledgers/sec |
| Elapsed wall-clock | 3189.277s |
| Ingest busy-time | 3086.158s (96.8% utilization) |
| Per-ledger p50 / p95 / p99 | 1050.350 / 1413.260 / 1733.772 ms |
| Golden DB fetch+decompress | 2410s |
| stellar-core | v27.0.0 |
| Workflow run | #28265430398-1 |
Performance Evaluation Test #1
Commit: f1b4473315a8 (gha-coordinator)
Run: https://github.com/stellar/stellar-rpc/actions/runs/28257646240
✅ Apply-load ingestion — verdict: ok
📈 Ingest load test — f1b4473
| Profile | Ledgers | ms/ledger | p50 / p95 / p99 ms | max ms |
|---|---|---|---|---|
| load-test-ledgers-v27-oz | 1000 | 1198.111 | 1116.909 / 1624.761 / 1918.004 | 2807.759 |
| load-test-ledgers-v27-sac | 1000 | 1098.417 | 1108.963 / 1180.559 / 1237.718 | 1301.012 |
| load-test-ledgers-v27-soroswap | 1000 | 792.573 | 803.499 / 868.011 / 923.528 | 1036.343 |
| Metric | Value |
|---|---|
| Ledgers replayed | 3000 |
| Initial DB ledger count | 120960 |
| Throughput | 0.94 ledgers/sec |
| Elapsed wall-clock | 3188.317s |
| Ingest busy-time | 3089.101s (96.9% utilization) |
| Per-ledger p50 / p95 / p99 | 1052.390 / 1416.426 / 1738.711 ms |
| Golden DB fetch+decompress | 2436s |
| stellar-core | v27.0.0 |
| Workflow run | #28257646240-1 |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c601be50f2
    report:
      name: Aggregate + report
      needs: [plan, load-test]
      if: always()
Prevent canceled coordinator runs from posting stale reports
In this workflow I checked the top-level concurrency.cancel-in-progress: true block: when a newer push supersedes an in-flight release run while this aggregation job is running, this always() condition lets the canceled run keep executing; GitHub's workflow cancellation reference explicitly notes that jobs whose condition still evaluates true, such as always(), are not canceled. That stale report can still edit the sticky PR comment for the superseded SHA after a newer run has started or even completed, so keep reporting on failed legs but exclude canceled runs before posting (for example by adding !cancelled()).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d846b52d17
    fi

    # Render the new body (fetches each leg's S3 result, folds prior runs).
    printf '%s' "$PREV" | go run "$PERF_EVAL" > /tmp/comment.md
Stop when report rendering fails
If this go run exits non-zero (for example due to a transient Go setup/compile failure or renderer error), the script does not stop because the shell only set -uo pipefail and not -e; execution continues with /tmp/comment.md (created by the redirection, possibly empty or partial), so the workflow can warn/post bad report content and still let the final gate pass when the leg outputs say success. Keep the gh api calls tolerant if desired, but make renderer failures abort before posting.
What

Adds a GitHub Actions coordinator (`load-test-coordinator.yml`) that, on a push to a `release/**` branch, launches the release performance-evaluation leg(s) as callable workflows and posts their consolidated results as a sticky comment on the release PR. The apply-load ingest load test (#741) is adapted into a callable leg that reports back (outputs + an S3 result object) instead of commenting itself.

Decomposes a great deal of the existing infrastructure into more modular components for future coordinator-driven load tests. Of the diff, +870/-721 is purely refactoring existing code into a new `harness` package, so that part does not need as thorough a review as the coordinator-heavy sections. Here is the breakdown:

- Code from `runner/orchestrate.go` and `runner/instantiate.go` moved into an importable `harness` package, mechanical changes only: `gather.go`, `s3.go`, `exec.go`, `harness.go`, and `harness_test.go`.
- Logic pulled out of `run-load-test.sh` and placed into `bootstrap-common.sh`, mostly as helper functions. `run-load-test.sh` is now extremely slim and only serves to actually run the ingestion load test.
- Logic pulled out of `load-test.yml` and put in `ec2-leg.yml`, which handles the EC2 lifecycle with no test-specific hardcoding.

Results of the load tests are posted in a sticky comment on the `release` PR. If more than one push is made or multiple runs are requested, the previous runs' results are folded into dropdown menus like this:

🧪 Performance Evaluation Test #N
...
`[recent results for run N]`
Performance Evaluation Test #N-1
`[results for run N-1]`
Performance Evaluation Test #...
`[results for run ...]`
Performance Evaluation Test #1
`[results for run 1]`

Or, you can just look at the comments on this PR to see how this looks!
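The folding behavior described above can be sketched in Go roughly as follows. This is a minimal illustration, not the actual `coordinator-runner` code: the `Run` type, the `RenderSticky` function, and the use of HTML `<details>` blocks for the dropdowns are all assumptions made for the example, and the real renderer reads the previous comment body from stdin and fetches leg results from S3 rather than taking a slice.

```go
package main

import (
	"fmt"
	"strings"
)

// Run is a hypothetical record of one coordinator run.
// The real renderer's types may look different.
type Run struct {
	Number int    // 1-based test number shown in the heading
	Body   string // already-rendered Markdown results for this run
}

// RenderSticky renders the newest run expanded and folds every older run
// into a collapsed <details> block, newest first. runs must be ordered
// oldest to newest.
func RenderSticky(runs []Run) string {
	if len(runs) == 0 {
		return ""
	}
	var b strings.Builder

	// The most recent run stays fully visible at the top of the comment.
	latest := runs[len(runs)-1]
	fmt.Fprintf(&b, "🧪 Performance Evaluation Test #%d\n\n%s\n", latest.Number, latest.Body)

	// Older runs are folded, walking backwards so run N-1 comes first.
	for i := len(runs) - 2; i >= 0; i-- {
		r := runs[i]
		fmt.Fprintf(&b,
			"\n<details>\n<summary>Performance Evaluation Test #%d</summary>\n\n%s\n\n</details>\n",
			r.Number, r.Body)
	}
	return b.String()
}

func main() {
	// Placeholder bodies stand in for the per-run result tables.
	runs := []Run{
		{Number: 1, Body: "[results for run 1]"},
		{Number: 2, Body: "[results for run 2]"},
		{Number: 3, Body: "[recent results for run 3]"},
	}
	fmt.Print(RenderSticky(runs))
}
```

Keeping the rendering a pure function of the run history makes the numbering and folding easy to unit test without touching GitHub or S3.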
The high-level flow of the coordinator is below:
Why
This is so that the remaining perf-eval tests slot in cleanly: each new test becomes another callable leg the coordinator launches and folds into the same report, with no per-leg reporting or permissions plumbing to duplicate.
Known limitations
N/A. The remaining work in the epic this task belongs to is mainly finishing the other perf-eval tasks; only the ingest load test is complete at this point.