Add RPC ingestion load test driven by synthetic apply-load ledger bundles #741
Merged
86 commits
3392f09 pull initial work from branch load-testing (cjonas9)
ffe27bc add ledger generation test adapted for RPC (cjonas9)
65da3b5 add apply load config (cjonas9)
34f086d add generated ledger output to infrastructure/testdata/ (cjonas9)
80c982d add basic ingestion of synthetic ledgers phase (cjonas9)
94c69b7 disable debug logs for load test for timeout reasons (cjonas9)
f4a16f9 add functions for snapshotting + restoring test DB (cjonas9)
1d53b96 improve and restructure db restoration helpers/API (cjonas9)
baf2255 finish DB restoration logic flow and wiring (cjonas9)
1647464 skip migrations/fee-stats in load test mode (cjonas9)
2f14765 ingest test: refactor, minor semantic fixes (cjonas9)
d7c90a9 test.go: add retention window to config, fix fake history archive for… (cjonas9)
0390757 minor db restore/trim helper fixes (cjonas9)
e0a86e7 rename restore backed-up ledgers function for accuracy (cjonas9)
f151a35 refactor, add env vars, change DB helpers to take sequences (cjonas9)
bffb101 remove db restoration functionality (cjonas9)
e04d51d add performance metrics json emission functionality (cjonas9)
bd8c784 migrate to polling getHealth, change ingest test limits to 1000 ledgers (cjonas9)
c7bc001 remove ledger fixtures (cjonas9)
786423d add workflow and script (cjonas9)
1606829 fix yaml referencing wrong path for script (cjonas9)
7d41b1a fix yml parsing indentation bug (cjonas9)
b701108 use head-object for metadata rather than tags (cjonas9)
b1cec1d refine workflow + instance script (cjonas9)
b9ef27e add apply load cfg (cjonas9)
73df1e7 testing: on-push runs (cjonas9)
241bdf8 minor yml syntax fixes (cjonas9)
1161e3f set test e2e.yml + add debugging info from instance to ssm (cjonas9)
47437f4 skip e2e.yml for testing, add retry loop for root volume lookup (cjonas9)
008f327 build-libs over build-stellar-rpc to cut time back (cjonas9)
9441749 further slim build phase with no-install-recommends (cjonas9)
36c0a82 make instance script best-effort if head-object or stellar-core versi… (cjonas9)
4535977 temporarily modify script to work on scratch dev box; increase timeout (cjonas9)
54f0b41 fix cfg path error and run ID regression (cjonas9)
44e093c improve error logging (cjonas9)
1ce515e fix error logging wrapper (cjonas9)
22a8e6e patch premature exit due to err trap bug (cjonas9)
e925d3c fix empty GOPATH/GOMODCACHE (cjonas9)
ffb3299 updated apply-load config for specific core on runner (cjonas9)
bae2fdc fix version check if warning prints (cjonas9)
70eb9dc bump all timeouts to >= 2 hours (cjonas9)
705edec increase ingest phase timeout (cjonas9)
c697510 extend aws role lifetime (cjonas9)
caec3f4 undo session time limit increase, use pre-generated synthetic ledgers (cjonas9)
b32181d require confirmed gp3 throttling, extend GHA AWS session to 4 hours (cjonas9)
3a0571f patch logic for throttling behavior (cjonas9)
2c4a004 slim needless instance bootstrapping work (cjonas9)
d620450 refactor ephemeral load test runner (cjonas9)
41529f4 fix minor refactor false sha-verify failure (cjonas9)
2a21c73 add support for multiple ledger profiles/files (cjonas9)
04f0fe1 change log level to warn, decrease each soroban scenario meta to 1000… (cjonas9)
02556cc refactored duplicate/messy code (cjonas9)
807ca42 stop grepping to determine success status, make GHA->ec2 parameter pa… (cjonas9)
af9f7b9 split offline ledger generation out of the ingest benchmark (cjonas9)
e4180dd refactor and remove old apply load cfg (cjonas9)
4c3394c Merge remote-tracking branch 'origin/main' into apply-load (cjonas9)
bc2e490 Merge branch 'main' into apply-load (cjonas9)
555483e Merge branch 'apply-load' of https://github.com/stellar/stellar-rpc i… (cjonas9)
a620cdc fix linter errors (cjonas9)
65cb6ff update go version (cjonas9)
4aca1ef update default ledger bundle/config to existent ones for test (cjonas9)
3d61ceb decompose ec2 script into go programs (cjonas9)
9bef115 reduce comment verbosity, minor clean up (cjonas9)
8b39ed5 install jq on load-test box for build-libs; surface build-libs errors (cjonas9)
714b1fa throttle load-test benchmark via cgroup io.max instead of EBS ModifyV… (cjonas9)
d9987f6 drop accidentally-committed refresh tooling and orphaned apply-load.cfg (cjonas9)
5058ad3 fix linter errors in load-test runner (cjonas9)
8cf9f6b make load-test ingest frequency and ledger count configurable via env (cjonas9)
3b33626 EXPERIMENT: run load-test benchmark un-throttled at volume-provisione… (cjonas9)
7e60661 use SDK's maxLedgersPerFile ceiling and multiple-bundle functionality (cjonas9)
fdb1926 simplify verification walk (cjonas9)
1f974a4 simplify and refactor, add ledger ingest stall timeout (cjonas9)
19a3115 simplify correctness check, delegate apply-load config work to SDK (cjonas9)
103b648 remove ingest phase dependence on unnecessary supplied apply load con… (cjonas9)
7e13bf3 bump go sdk to v0.6.x (cjonas9)
8333974 bump go sdk again, merge SDK main into loadtest-patch (cjonas9)
f3f3d58 Merge remote-tracking branch 'origin/main' into apply-load (cjonas9)
a6241f6 update comments in light of changes (cjonas9)
8d5f523 rework handshake into instance->s3 results push, fix minor leaks + OO… (cjonas9)
21b8045 replace polling-based ledger timing computation with daemon hook (cjonas9)
a8931fe remove trigger-on-push behavior (cjonas9)
6666e63 bump SDK to latest commit to main (cjonas9)
53557d4 Merge branch 'main' into apply-load (cjonas9)
c9ce090 e2e: restore system-test workflow ref to @master (cjonas9)
ff3bfef cleaned up comments for readability (cjonas9)
c29fcde Merge branch 'main' into apply-load (cjonas9)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,4 +2,4 @@ target/ | |
| storage/ | ||
| .soroban/ | ||
| .cargo/ | ||
| .cargo-husky/ | ||
| .cargo-husky/ | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,216 @@ | ||
| name: Load test (ephemeral) | ||
| # Launches a c5.2xlarge in Horizon (203618453975), polls S3 for the result object | ||
| # the box publishes, posts results to the PR, terminates. Box bootstrap lives in | ||
| # run-load-test.sh; runner-side polling in runner/orchestrate.go. | ||
|
|
||
| on: | ||
| workflow_call: | ||
|
|
||
| permissions: | ||
| id-token: write # for OIDC AssumeRole into the GHA role | ||
| contents: read | ||
| pull-requests: write | ||
|
|
||
| jobs: | ||
| load-test: | ||
| name: Launch + await ephemeral load-test box | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 225 # 210min results wait + buffer for boot/SSM/poll latency and cleanup (role lasts 240min) | ||
| env: | ||
| AWS_REGION: us-east-1 | ||
| INSTANCE_TYPE: c5.2xlarge | ||
| ROOT_VOLUME_GB: 500 | ||
| BOOTSTRAP_VOLUME_IOPS: 3000 | ||
| # 3000 IOPS is the gp3 floor; 125 MiB/s alone would need only 500. | ||
| BOOTSTRAP_VOLUME_THROUGHPUT: 125 | ||
| INSTANCE_PROFILE: stellar-rpc-ci-load-test | ||
| TEST_TAG_KEY: test | ||
| TEST_TAG_VAL: stellar-rpc-ci-load-test | ||
| SSM_REGISTRATION_TIMEOUT: 240 # SSM agent registers ~30-90s after boot | ||
| RESULTS_TIMEOUT: 12600 # 210 min wait for /tmp/done: ~55m bootstrap+build + ~90m benchmark, under the 170m go-test budget. | ||
| POLL_INTERVAL: 30 | ||
| DEBUG_LOG_LINES: 40 | ||
| DEBUG_LOG_EVERY_POLLS: 5 | ||
| LOAD_TEST_DIR: cmd/stellar-rpc/internal/integrationtest/infrastructure/load-test | ||
|
|
||
| steps: | ||
| - name: Resolve target context | ||
| id: target | ||
| env: | ||
| GH_TOKEN: ${{ github.token }} | ||
| run: | | ||
| PR_NUMBER=$(gh pr list \ | ||
| --repo "${{ github.repository }}" \ | ||
| --state open \ | ||
| --base main \ | ||
| --head "${{ github.ref_name }}" \ | ||
| --json number \ | ||
| --jq '.[0].number // ""' 2>/dev/null || true) | ||
|
|
||
| RUN_LABEL="${PR_NUMBER:+pr$PR_NUMBER}" | ||
| { | ||
| echo "pr_number=$PR_NUMBER" | ||
| echo "pr_tag_value=${PR_NUMBER:-none}" | ||
| echo "run_label=${RUN_LABEL:-${{ github.ref_name }}}" | ||
| } >> "$GITHUB_OUTPUT" | ||
|
|
||
| - name: Checkout target ref | ||
| uses: actions/checkout@v4 | ||
| with: | ||
| ref: ${{ github.sha }} | ||
|
|
||
| # The runner-side half is `go run ... runner orchestrate`. | ||
| - uses: ./.github/actions/setup-go | ||
|
|
||
| - name: Configure AWS via OIDC | ||
| uses: aws-actions/configure-aws-credentials@v4 | ||
| with: | ||
| role-to-assume: ${{ secrets.AWS_GHA_ROLE_ARN }} | ||
| aws-region: ${{ env.AWS_REGION }} | ||
| role-duration-seconds: 14400 | ||
|
|
||
| - name: Resolve latest Ubuntu 22.04 AMI | ||
| id: ami | ||
| run: | | ||
| AMI=$(aws ec2 describe-images \ | ||
| --owners 099720109477 \ | ||
| --filters \ | ||
| "Name=name,Values=ubuntu/images/hvm-ssd*/ubuntu-jammy-22.04-amd64-server-*" \ | ||
| "Name=architecture,Values=x86_64" \ | ||
| "Name=state,Values=available" \ | ||
| --query 'sort_by(Images, &CreationDate)[-1].ImageId' \ | ||
| --output text) | ||
| echo "ami=$AMI" >> "$GITHUB_OUTPUT" | ||
|
|
||
| - name: Render user-data | ||
| # The script ships verbatim; parameters travel in a two-line preamble | ||
| # so the bytes that run on the box match the bytes in git. | ||
| run: | | ||
| { | ||
| echo '#!/usr/bin/env bash' | ||
| echo 'export TARGET_SHA=${{ github.sha }} RUN_ID=${{ github.run_id }}-${{ github.run_attempt }}' | ||
| echo 'export BUCKET=stellar-rpc-ci-load-test RESULT_KEY=runs/${{ github.run_id }}-${{ github.run_attempt }}/result.json' | ||
| cat "$LOAD_TEST_DIR/run-load-test.sh" | ||
| } > /tmp/user-data.sh | ||
|
|
||
| - name: Launch EC2 instance | ||
| id: launch | ||
| run: | | ||
| COMMON_TAGS="{Key=$TEST_TAG_KEY,Value=$TEST_TAG_VAL}, | ||
| {Key=pr,Value=${{ steps.target.outputs.pr_tag_value }}}, | ||
| {Key=ref,Value=${{ github.ref_name }}}, | ||
| {Key=sha,Value=${{ github.sha }}}, | ||
| {Key=run-id,Value=${{ github.run_id }}}" | ||
| RUN_INSTANCES_JSON=$(aws ec2 run-instances \ | ||
|
|
||
| --image-id "${{ steps.ami.outputs.ami }}" \ | ||
| --instance-type "$INSTANCE_TYPE" \ | ||
| --iam-instance-profile "Name=$INSTANCE_PROFILE" \ | ||
| --user-data file:///tmp/user-data.sh \ | ||
| --instance-initiated-shutdown-behavior terminate \ | ||
| --block-device-mappings "[{ | ||
| \"DeviceName\":\"/dev/sda1\", | ||
| \"Ebs\":{\"VolumeSize\":$ROOT_VOLUME_GB,\"VolumeType\":\"gp3\",\"Iops\":$BOOTSTRAP_VOLUME_IOPS,\"Throughput\":$BOOTSTRAP_VOLUME_THROUGHPUT,\"DeleteOnTermination\":true} | ||
| }]" \ | ||
| --tag-specifications \ | ||
| "ResourceType=instance,Tags=[ | ||
| {Key=Name,Value=load-test-${{ steps.target.outputs.run_label }}}, | ||
| $COMMON_TAGS | ||
| ]" \ | ||
| "ResourceType=volume,Tags=[ | ||
| {Key=Name,Value=load-test-${{ steps.target.outputs.run_label }}-root}, | ||
| $COMMON_TAGS | ||
| ]" \ | ||
| --count 1 \ | ||
| --output json) | ||
|
|
||
| INSTANCE_ID=$(printf '%s' "$RUN_INSTANCES_JSON" | jq -r '.Instances[0].InstanceId') | ||
| echo "instance_id=$INSTANCE_ID" >> "$GITHUB_OUTPUT" | ||
|
|
||
| - name: Acknowledge launch in PR | ||
| if: steps.target.outputs.pr_number != '' | ||
| env: | ||
| GH_TOKEN: ${{ github.token }} | ||
| run: | | ||
| if ! gh pr comment ${{ steps.target.outputs.pr_number }} \ | ||
| --repo ${{ github.repository }} \ | ||
| --body "⏳ Load test launching on \`${{ steps.launch.outputs.instance_id }}\` (commit \`${{ github.sha }}\`). | ||
| Workflow run: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }} | ||
| Posting results when the run finishes."; then | ||
| echo "::warning::Failed to post launch comment to PR #${{ steps.target.outputs.pr_number }}" | ||
| fi | ||
|
|
||
| - name: Wait for SSM agent to register | ||
| env: | ||
| INSTANCE_ID: ${{ steps.launch.outputs.instance_id }} | ||
| run: | | ||
| DEADLINE=$(( $(date +%s) + SSM_REGISTRATION_TIMEOUT )) | ||
| while [ $(date +%s) -lt $DEADLINE ]; do | ||
| PING=$(aws ssm describe-instance-information \ | ||
| --filters "Key=InstanceIds,Values=$INSTANCE_ID" \ | ||
| --query 'InstanceInformationList[0].PingStatus' \ | ||
| --output text 2>/dev/null || echo "") | ||
| echo "[$(date -u +%FT%TZ)] ssm ping=$PING" | ||
| if [ "$PING" = "Online" ]; then | ||
| exit 0 | ||
| fi | ||
| sleep 10 | ||
| done | ||
| echo "::error::SSM agent never registered for $INSTANCE_ID — verify AmazonSSMManagedInstanceCore is attached to the stellar-rpc-ci-load-test role" | ||
| exit 1 | ||
|
|
||
| - name: Poll for results | ||
| id: results | ||
| env: | ||
| INSTANCE_ID: ${{ steps.launch.outputs.instance_id }} | ||
| BUCKET: stellar-rpc-ci-load-test | ||
| RESULT_KEY: runs/${{ github.run_id }}-${{ github.run_attempt }}/result.json | ||
| run: go run "./$LOAD_TEST_DIR/runner" orchestrate | ||
|
|
||
| - name: Write results summary | ||
| if: always() | ||
| run: | | ||
| if [ -f /tmp/results.md ]; then | ||
| cat /tmp/results.md >> "$GITHUB_STEP_SUMMARY" | ||
| elif [ -f /tmp/timeout-comment.md ]; then | ||
| cat /tmp/timeout-comment.md >> "$GITHUB_STEP_SUMMARY" | ||
| fi | ||
|
|
||
| - name: Post results to PR | ||
| if: steps.target.outputs.pr_number != '' | ||
| env: | ||
| GH_TOKEN: ${{ github.token }} | ||
| run: | | ||
| if [ "${{ steps.results.outputs.found }}" = "true" ]; then | ||
| BODY=/tmp/results.md | ||
| else | ||
| BODY=/tmp/timeout-comment.md | ||
| fi | ||
| if [ ! -s "$BODY" ]; then | ||
| echo "::warning::No body to post to PR #${{ steps.target.outputs.pr_number }} ($BODY missing or empty)" | ||
| exit 0 | ||
| fi | ||
| if ! gh pr comment ${{ steps.target.outputs.pr_number }} \ | ||
| --repo ${{ github.repository }} \ | ||
| --body-file "$BODY"; then | ||
| echo "::warning::Failed to post comment to PR #${{ steps.target.outputs.pr_number }}" | ||
| fi | ||
|
|
||
| - name: Fail workflow on timeout or load-test failure | ||
| if: always() | ||
| run: | | ||
| if [ "${{ steps.results.outputs.found }}" != "true" ]; then | ||
| echo "Load test timed out before producing instance results" | ||
| exit 1 | ||
| fi | ||
|
|
||
| if [ "${{ steps.results.outputs.passed }}" != "true" ]; then | ||
| echo "Instance reported a failing verdict" | ||
| cat /tmp/results.md 2>/dev/null || true | ||
| exit 1 | ||
| fi | ||
|
|
||
| - name: Terminate instance | ||
| if: always() && steps.launch.outputs.instance_id != '' | ||
| run: | | ||
| aws ec2 terminate-instances \ | ||
| --instance-ids ${{ steps.launch.outputs.instance_id }} || true | ||
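The `Resolve target context` step above relies on two Bash parameter expansions to decide how a run is labelled: `${VAR:+word}` yields `word` only when `VAR` is non-empty, and `${VAR:-word}` falls back to `word` when `VAR` is empty or unset. The following is a minimal sketch of that logic in isolation, not code from the PR. The helper name `resolve_labels` is hypothetical, and the inputs `741` and `apply-load` are simply this PR's number and branch used as sample values.

```shell
#!/usr/bin/env bash
# Sketch of the label logic from the "Resolve target context" step.
# resolve_labels is a hypothetical helper for illustration only.
resolve_labels() {
  local pr_number="$1" ref_name="$2"

  # ${var:+word}: expands to "pr<number>" only if pr_number is non-empty,
  # otherwise to the empty string.
  local run_label="${pr_number:+pr$pr_number}"

  # ${var:-word}: substitute a default when the variable is empty or unset.
  echo "pr_tag_value=${pr_number:-none}"
  echo "run_label=${run_label:-$ref_name}"
}

# Branch with an open PR: the label is derived from the PR number.
resolve_labels 741 apply-load
# prints:
#   pr_tag_value=741
#   run_label=pr741

# Branch without an open PR: both values fall back to defaults.
resolve_labels "" apply-load
# prints:
#   pr_tag_value=none
#   run_label=apply-load
```

In the workflow itself these values are appended to `$GITHUB_OUTPUT` rather than printed, so later steps can read them as `steps.target.outputs.*`. The fallback presumably keeps the `pr` tag and instance `Name` well-formed even when no open PR matches the branch.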