CI: Replace nightly image build with S3 aiter wheel installation #303
gyohuangxin wants to merge 28 commits into main
- Remove build_atom_image job that pulled nightly image and rebuilt
- Use rocm/pytorch:latest as base image directly
- Install latest amd-aiter wheel from S3 bucket at runtime
- Install ATOM and dependencies inside the container
- Remove fork-specific Dockerfile build logic and Docker Login step
Pull request overview
This PR updates the ATOM CI workflow to stop building/pushing a custom nightly Docker image and instead run tests from a base ROCm PyTorch image while installing amd-aiter, ATOM, and Python dependencies at container runtime.
Changes:
- Removes the build_atom_image job and related fork/non-fork image handling.
- Switches the test container base image to rocm/pytorch:latest.
- Adds steps to download/install the latest amd-aiter wheel from S3 and to install ATOM + dependencies inside the running container.
.github/workflows/atom-test.yaml
Outdated
| pip install awscli 2>/dev/null || true | ||
|
|
pip install awscli 2>/dev/null || true can leave the container without the aws CLI (e.g., transient pip failure) and then the next aws s3 ... will fail with a less actionable command not found. Consider failing the step if awscli installation fails and/or add an explicit command -v aws check with a clear error message.
| pip install awscli 2>/dev/null || true | |
| if ! command -v aws >/dev/null 2>&1; then | |
| echo '=== Installing awscli ===' | |
| if ! pip install awscli; then | |
| echo 'ERROR: Failed to install awscli; cannot download amd-aiter wheel from S3' | |
| exit 1 | |
| fi | |
| fi | |
| if ! command -v aws >/dev/null 2>&1; then | |
| echo 'ERROR: aws CLI is not available after installation attempt; cannot download amd-aiter wheel from S3' | |
| exit 1 | |
| fi |
.github/workflows/atom-test.yaml
Outdated
| echo '=== Finding latest amd-aiter wheel from S3 ===' | ||
| LATEST_WHL=\$(aws s3 ls ${{ env.AITER_S3_BUCKET }}/ --no-sign-request \ | ||
| | grep 'amd_aiter.*\.whl' \ | ||
| | sort -k1,2 \ | ||
| | tail -1 \ | ||
| | awk '{print \$4}') | ||
|
|
||
| if [ -z \"\$LATEST_WHL\" ]; then | ||
| echo 'ERROR: No amd-aiter wheel found in S3 bucket' | ||
| exit 1 | ||
| fi | ||
|
|
||
| echo \"Latest wheel: \$LATEST_WHL\" | ||
| echo '=== Downloading wheel ===' | ||
| aws s3 cp ${{ env.AITER_S3_BUCKET }}/\$LATEST_WHL /tmp/\$LATEST_WHL --no-sign-request | ||
|
|
||
| echo '=== Uninstalling existing amd-aiter ===' | ||
| pip uninstall -y amd-aiter || true | ||
|
|
||
| echo '=== Installing amd-aiter from wheel ===' | ||
| pip install /tmp/\$LATEST_WHL |
Selecting and installing the "latest" wheel from S3 makes CI runs non-reproducible and can introduce flakiness (a newer wheel could land between reruns of the same commit). Consider pinning the wheel (e.g., via a version/commit marker file in S3, an explicit env override, or a deterministic naming convention) and logging the resolved wheel’s checksum/version so failures can be reproduced.
| echo '=== Finding latest amd-aiter wheel from S3 ===' | |
| LATEST_WHL=\$(aws s3 ls ${{ env.AITER_S3_BUCKET }}/ --no-sign-request \ | |
| | grep 'amd_aiter.*\.whl' \ | |
| | sort -k1,2 \ | |
| | tail -1 \ | |
| | awk '{print \$4}') | |
| if [ -z \"\$LATEST_WHL\" ]; then | |
| echo 'ERROR: No amd-aiter wheel found in S3 bucket' | |
| exit 1 | |
| fi | |
| echo \"Latest wheel: \$LATEST_WHL\" | |
| echo '=== Downloading wheel ===' | |
| aws s3 cp ${{ env.AITER_S3_BUCKET }}/\$LATEST_WHL /tmp/\$LATEST_WHL --no-sign-request | |
| echo '=== Uninstalling existing amd-aiter ===' | |
| pip uninstall -y amd-aiter || true | |
| echo '=== Installing amd-aiter from wheel ===' | |
| pip install /tmp/\$LATEST_WHL | |
| if [ -n \"\${AITER_WHEEL_NAME:-}\" ]; then | |
| echo '=== Using pinned amd-aiter wheel from AITER_WHEEL_NAME ===' | |
| SELECTED_WHL=\"\$AITER_WHEEL_NAME\" | |
| else | |
| echo '=== Finding latest amd-aiter wheel from S3 ===' | |
| SELECTED_WHL=\$(aws s3 ls ${{ env.AITER_S3_BUCKET }}/ --no-sign-request \ | |
| | grep 'amd_aiter.*\.whl' \ | |
| | sort -k1,2 \ | |
| | tail -1 \ | |
| | awk '{print \$4}') | |
| fi | |
| if [ -z \"\$SELECTED_WHL\" ]; then | |
| echo 'ERROR: No amd-aiter wheel found in S3 bucket' | |
| exit 1 | |
| fi | |
| echo \"Selected wheel: \$SELECTED_WHL\" | |
| echo '=== Downloading wheel ===' | |
| aws s3 cp ${{ env.AITER_S3_BUCKET }}/\$SELECTED_WHL /tmp/\$SELECTED_WHL --no-sign-request | |
| echo '=== Wheel SHA256 checksum ===' | |
| sha256sum /tmp/\$SELECTED_WHL || echo 'WARNING: sha256sum command failed' | |
| echo '=== Uninstalling existing amd-aiter ===' | |
| pip uninstall -y amd-aiter || true | |
| echo '=== Installing amd-aiter from wheel ===' | |
| pip install /tmp/\$SELECTED_WHL |
.github/workflows/atom-test.yaml
Outdated
| echo \"Latest wheel: \$LATEST_WHL\" | ||
| echo '=== Downloading wheel ===' | ||
| aws s3 cp ${{ env.AITER_S3_BUCKET }}/\$LATEST_WHL /tmp/\$LATEST_WHL --no-sign-request | ||
|
|
||
| echo '=== Uninstalling existing amd-aiter ===' | ||
| pip uninstall -y amd-aiter || true | ||
|
|
||
| echo '=== Installing amd-aiter from wheel ===' | ||
| pip install /tmp/\$LATEST_WHL | ||
|
|
The workflow downloads and installs a wheel from a public S3 location with --no-sign-request but does not perform any integrity verification before pip install. To reduce supply-chain risk, consider fetching a corresponding checksum/signature (e.g., .sha256) and verifying it before installation, or using a signed URL / authenticated access.
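One way to act on this comment is to compare the downloaded wheel against a digest obtained from a trusted source before running pip install. The following is a minimal sketch, not the workflow's actual code: the function name verify_sha256 and the variable EXPECTED_SHA256 are hypothetical, and it assumes sha256sum and awk are available in the container.

```shell
# Hypothetical helper: verify a downloaded file against an expected SHA-256 digest.
# Arguments: $1 = path to the file, $2 = expected hex digest (from a trusted source).
verify_sha256() {
  file="$1"
  expected="$2"
  # sha256sum prints "<digest>  <path>"; keep only the digest.
  actual=$(sha256sum "$file" 2>/dev/null | awk '{print $1}')
  if [ "$actual" != "$expected" ]; then
    echo "ERROR: checksum mismatch for $file" >&2
    echo "  expected: $expected" >&2
    echo "  actual:   $actual" >&2
    return 1
  fi
  echo "Checksum OK: $file"
}
```

A possible use in the install step would be `verify_sha256 "/tmp/$LATEST_WHL" "$EXPECTED_SHA256" || exit 1`, placed before `pip install`. Where the expected digest comes from (a `.sha256` object next to the wheel, a pinned value in the workflow) is left open here.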
…runner
- Add aws-actions/configure-aws-credentials step with IAM role
- Download aiter wheel on the runner (not inside container)
- Copy wheel into container via docker cp for installation

- Remove AWS credentials and S3 bucket configuration
- Download latest aiter wheel from ROCm/aiter CI artifacts via GitHub API
- Use GITHUB_TOKEN for cross-repo artifact access
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
.github/workflows/atom-test.yaml
Outdated
| env: | ||
| GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
| run: | | ||
| cat <<EOF > Dockerfile.mod | ||
| FROM ${{ env.ATOM_BASE_NIGTHLY_IMAGE }} | ||
| RUN pip install -U lm-eval[api] | ||
| RUN pip show lm-eval || true | ||
| RUN pip install hf_transfer | ||
| RUN pip show hf_transfer || true | ||
| RUN echo "=== Aiter version BEFORE uninstall ===" && pip show amd-aiter || true | ||
| RUN pip uninstall -y amd-aiter | ||
| RUN pip install --upgrade "pybind11>=3.0.1" | ||
| RUN pip show pybind11 | ||
| RUN rm -rf /app/aiter-test | ||
| RUN git clone https://github.com/ROCm/aiter.git /app/aiter-test && \\ | ||
| cd /app/aiter-test && \\ | ||
| git checkout HEAD && \\ | ||
| git submodule sync && git submodule update --init --recursive && \\ | ||
| MAX_JOBS=64 PREBUILD_KERNELS=0 GPU_ARCHS=gfx950 python3 setup.py develop | ||
| RUN echo "=== Aiter version AFTER installation ===" && pip show amd-aiter || true | ||
|
|
||
| RUN echo "=== ATOM version BEFORE uninstall ===" && pip show atom || true | ||
| RUN pip uninstall -y atom | ||
| RUN rm -rf /app/ATOM | ||
| RUN git clone ${{ env.GITHUB_REPO_URL }} /app/ATOM && \\ | ||
| cd /app/ATOM && \\ | ||
| git checkout ${{ env.GITHUB_COMMIT_SHA }} && \\ | ||
| pip install -e . | ||
|
|
||
| RUN echo "=== ATOM version AFTER installation ===" && pip show atom || true | ||
| EOF | ||
| set -euo pipefail | ||
| echo "=== Finding latest aiter-whl-main artifact from ROCm/aiter ===" | ||
|
|
||
| - name: Build Docker image for forked repo | ||
| if: (matrix.run_on_pr == true || github.event_name != 'pull_request') && github.event.pull_request.head.repo.fork | ||
| run: | | ||
| docker build --pull --network=host \ | ||
| --no-cache \ | ||
| -t atom_test:ci \ | ||
| -f Dockerfile.mod . | ||
| ARTIFACT_JSON=$(gh api "repos/ROCm/aiter/actions/artifacts?per_page=100" \ | ||
| --jq '[.artifacts[] | select(.name | startswith("aiter-whl-main")) | select(.expired == false)] | sort_by(.created_at) | last') |
The workflow uses ${{ secrets.GITHUB_TOKEN }} (scoped to this repo) to call repos/ROCm/aiter/actions/artifacts/.... GITHUB_TOKEN cannot access private resources (including Actions artifacts) in a different repository, so this step will fail on push/schedule and on PRs (especially forks). Consider switching to the S3 wheel source described in the PR, or minting a token with actions:read on ROCm/aiter (GitHub App/PAT) and adding a fork-safe fallback/skip path.
.github/workflows/atom-test.yaml
Outdated
| - name: Download latest aiter wheel from CI artifacts | ||
| if: matrix.run_on_pr == true || github.event_name != 'pull_request' | ||
| env: | ||
| GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
| run: | | ||
| cat <<EOF > Dockerfile.mod | ||
| FROM ${{ env.ATOM_BASE_NIGTHLY_IMAGE }} | ||
| RUN pip install -U lm-eval[api] | ||
| RUN pip show lm-eval || true | ||
| RUN pip install hf_transfer | ||
| RUN pip show hf_transfer || true | ||
| RUN echo "=== Aiter version BEFORE uninstall ===" && pip show amd-aiter || true | ||
| RUN pip uninstall -y amd-aiter | ||
| RUN pip install --upgrade "pybind11>=3.0.1" | ||
| RUN pip show pybind11 | ||
| RUN rm -rf /app/aiter-test | ||
| RUN git clone https://github.com/ROCm/aiter.git /app/aiter-test && \\ | ||
| cd /app/aiter-test && \\ | ||
| git checkout HEAD && \\ | ||
| git submodule sync && git submodule update --init --recursive && \\ | ||
| MAX_JOBS=64 PREBUILD_KERNELS=0 GPU_ARCHS=gfx950 python3 setup.py develop | ||
| RUN echo "=== Aiter version AFTER installation ===" && pip show amd-aiter || true | ||
|
|
||
| RUN echo "=== ATOM version BEFORE uninstall ===" && pip show atom || true | ||
| RUN pip uninstall -y atom | ||
| RUN rm -rf /app/ATOM | ||
| RUN git clone ${{ env.GITHUB_REPO_URL }} /app/ATOM && \\ | ||
| cd /app/ATOM && \\ | ||
| git checkout ${{ env.GITHUB_COMMIT_SHA }} && \\ | ||
| pip install -e . | ||
|
|
||
| RUN echo "=== ATOM version AFTER installation ===" && pip show atom || true | ||
| EOF | ||
| set -euo pipefail | ||
| echo "=== Finding latest aiter-whl-main artifact from ROCm/aiter ===" |
PR description says the latest amd-aiter wheel is installed from S3 (s3://framework-whls-nightlies/...), but the workflow now downloads a GitHub Actions artifact from ROCm/aiter. If S3 is the intended source of truth, this step should be updated to match (or the PR description updated) to avoid confusion about provenance and required credentials.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
.github/workflows/atom-test.yaml
Outdated
| | jq '[.artifacts[] | select(.name | startswith("aiter-whl-main")) | select(.expired == false)] | sort_by(.created_at) | last') | ||
|
|
||
| ARTIFACT_NAME=$(echo "$ARTIFACT_JSON" | jq -r '.name') | ||
| ARTIFACT_ID=$(echo "$ARTIFACT_JSON" | jq -r '.id') |
The workflow selects the most recently created aiter-whl-main artifact and installs it without any version/commit pinning or integrity verification. This makes CI non-reproducible and increases supply-chain risk if an unexpected artifact is published. Consider pinning to a specific aiter commit/SHA (or a versioned wheel path), and/or verifying a checksum/signature before installing.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 8 comments.
.github/workflows/atom-test.yaml
Outdated
| API_URL="https://api.github.com" | ||
| AUTH_HEADER="Authorization: token ${{ secrets.GITHUB_TOKEN }}" | ||
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
||
| # Search Aiter Test workflow runs on main branch for one that has an aiter-whl artifact | ||
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") |
This uses the workflow run’s GITHUB_TOKEN to download artifacts from a different repository (ROCm/aiter). The token is scoped to the current repo and typically cannot access cross-repo Actions artifacts, which will commonly fail with 403/404. Use a dedicated secret (PAT or GitHub App token) that has actions:read on ROCm/aiter, or switch to the PR-described S3 wheel source to avoid cross-repo artifact auth entirely.
.github/workflows/atom-test.yaml
Outdated
| ARTIFACT_JSON=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/runs/$RUN_ID/artifacts" \ | ||
| | jq '[.artifacts[] | select(.name | startswith("aiter-whl-main")) | select(.expired == false)] | first') |
This uses the workflow run’s GITHUB_TOKEN to download artifacts from a different repository (ROCm/aiter). The token is scoped to the current repo and typically cannot access cross-repo Actions artifacts, which will commonly fail with 403/404. Use a dedicated secret (PAT or GitHub App token) that has actions:read on ROCm/aiter, or switch to the PR-described S3 wheel source to avoid cross-repo artifact auth entirely.
.github/workflows/atom-test.yaml
Outdated
|
|
||
| ARTIFACT_ID="" | ||
| ARTIFACT_NAME="" | ||
| for RUN_ID in $(echo "$RUNS" | jq -r '.workflow_runs[].id'); do |
The script selects the first run (most recent by API ordering) that contains a non-expired artifact, but does not require the run to be status=completed and conclusion=success. This can pick artifacts from failed or in-progress runs and lead to installing a broken wheel. Filter workflow runs to completed+successful before searching artifacts (or validate run conclusion before accepting the artifact).
| for RUN_ID in $(echo "$RUNS" | jq -r '.workflow_runs[].id'); do | |
| for RUN_ID in $(echo "$RUNS" | jq -r '.workflow_runs[] | select(.status=="completed" and .conclusion=="success") | .id'); do |
.github/workflows/atom-test.yaml
Outdated
| echo "Found artifact in run $RUN_ID: $ARTIFACT_NAME (ID: $ARTIFACT_ID)" | ||
| break | ||
| fi | ||
| done |
The script selects the first run (most recent by API ordering) that contains a non-expired artifact, but does not require the run to be status=completed and conclusion=success. This can pick artifacts from failed or in-progress runs and lead to installing a broken wheel. Filter workflow runs to completed+successful before searching artifacts (or validate run conclusion before accepting the artifact).
.github/workflows/atom-test.yaml
Outdated
|
|
||
| ARTIFACT_ID="" | ||
| ARTIFACT_NAME="" | ||
| for RUN_ID in $(echo "$RUNS" | jq -r '.workflow_runs[].id'); do |
This step assumes jq and unzip are present on the runner host. Previously, jq was installed inside the Docker image, but now parsing/downloading happens before the container is started. Add an explicit dependency-install step for the runner (or rewrite parsing to avoid jq, e.g., using python -c), otherwise this will fail on self-hosted runners that don’t preinstall these tools.
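A lightweight alternative to installing packages is to fail early with a clear message when a tool is missing. The sketch below assumes a POSIX shell on the runner; the function name require_tools is hypothetical and not part of the workflow.

```shell
# Hypothetical helper: fail early and name every missing tool at once.
require_tools() {
  missing=""
  for tool in "$@"; do
    # command -v succeeds only if the tool resolves on PATH (or is a builtin).
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "ERROR: required tools not found on runner:$missing" >&2
    return 1
  fi
}
```

Calling `require_tools curl jq unzip || exit 1` at the top of the download step would surface a missing dependency before any API call is made.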
.github/workflows/atom-test.yaml
Outdated
| curl -s -L -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/artifacts/$ARTIFACT_ID/zip" \ | ||
| -o /tmp/aiter-whl.zip | ||
| unzip -o /tmp/aiter-whl.zip -d /tmp/aiter-whl |
This step assumes jq and unzip are present on the runner host. Previously, jq was installed inside the Docker image, but now parsing/downloading happens before the container is started. Add an explicit dependency-install step for the runner (or rewrite parsing to avoid jq, e.g., using python -c), otherwise this will fail on self-hosted runners that don’t preinstall these tools.
.github/workflows/atom-test.yaml
Outdated
| AITER_WHL=$(ls /tmp/aiter-whl/amd_aiter*.whl 2>/dev/null | head -1) | ||
| if [ -z "$AITER_WHL" ]; then | ||
| echo "ERROR: No amd_aiter wheel found in artifact" | ||
| ls -la /tmp/aiter-whl/ | ||
| exit 1 | ||
| fi | ||
|
|
If the artifact ever contains multiple matching wheels, head -1 is not guaranteed to pick the newest/desired build. Prefer selecting deterministically (e.g., version-sort then take the highest) or enforce that exactly one wheel is present and fail otherwise.
| AITER_WHL=$(ls /tmp/aiter-whl/amd_aiter*.whl 2>/dev/null | head -1) | |
| if [ -z "$AITER_WHL" ]; then | |
| echo "ERROR: No amd_aiter wheel found in artifact" | |
| ls -la /tmp/aiter-whl/ | |
| exit 1 | |
| fi | |
| AITER_WHL_CANDIDATES=$(ls -1 /tmp/aiter-whl/amd_aiter*.whl 2>/dev/null | sort -V || true) | |
| if [ -z "$AITER_WHL_CANDIDATES" ]; then | |
| echo "ERROR: No amd_aiter wheel found in artifact" | |
| ls -la /tmp/aiter-whl/ | |
| exit 1 | |
| fi | |
| AITER_WHL=$(echo "$AITER_WHL_CANDIDATES" | tail -n 1) |
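The suggestion above takes the version-sorted route. The other option the comment mentions, enforcing that exactly one wheel is present, could look like the following sketch. The function name pick_single_wheel is hypothetical; the directory and the amd_aiter*.whl pattern are taken from the snippet under review.

```shell
# Hypothetical helper: print the single amd_aiter wheel in a directory,
# or fail if there are zero or several matches.
pick_single_wheel() {
  dir="$1"
  # Expand the glob into the function's positional parameters; an unmatched
  # glob stays as the literal pattern, which the -e test below rejects.
  set -- "$dir"/amd_aiter*.whl
  if [ ! -e "$1" ]; then
    echo "ERROR: no amd_aiter wheel found in $dir" >&2
    return 1
  fi
  if [ "$#" -ne 1 ]; then
    echo "ERROR: expected exactly one amd_aiter wheel in $dir, found $#:" >&2
    printf '  %s\n' "$@" >&2
    return 1
  fi
  printf '%s\n' "$1"
}
```

A call such as `AITER_WHL=$(pick_single_wheel /tmp/aiter-whl) || exit 1` would then replace the `ls | head -1` selection. Failing on ambiguity trades convenience for determinism: an artifact that unexpectedly ships two wheels stops the job instead of silently picking one.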
.github/workflows/atom-test.yaml
Outdated
| fi | ||
|
|
||
| echo "Downloaded wheel: $AITER_WHL" | ||
| echo "AITER_WHL_PATH=$AITER_WHL" >> $GITHUB_ENV |
$GITHUB_ENV should be quoted when redirecting to avoid issues with unexpected whitespace/shell expansion. Use a quoted redirect target (and consider using printf for robustness).
| echo "AITER_WHL_PATH=$AITER_WHL" >> $GITHUB_ENV | |
| echo "AITER_WHL_PATH=$AITER_WHL" >> "$GITHUB_ENV" |
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
.github/workflows/atom-test.yaml
Outdated
| WHL_NAME=$(basename "${{ env.AITER_WHL_PATH }}") | ||
| docker cp "${{ env.AITER_WHL_PATH }}" "$CONTAINER_NAME:/tmp/$WHL_NAME" |
AITER_WHL_PATH is set via $GITHUB_ENV, but later referenced using the ${{ env.AITER_WHL_PATH }} expression context. Values written to $GITHUB_ENV are available as shell environment variables in subsequent steps (e.g., $AITER_WHL_PATH), but are not reliably available through the env expression context, which can make basename/docker cp run with an empty path. Use the runtime shell env var (or step outputs) instead of ${{ env.* }} here.
| WHL_NAME=$(basename "${{ env.AITER_WHL_PATH }}") | |
| docker cp "${{ env.AITER_WHL_PATH }}" "$CONTAINER_NAME:/tmp/$WHL_NAME" | |
| WHL_NAME=$(basename "$AITER_WHL_PATH") | |
| docker cp "$AITER_WHL_PATH" "$CONTAINER_NAME:/tmp/$WHL_NAME" |
.github/workflows/atom-test.yaml
Outdated
| API_URL="https://api.github.com" | ||
| AUTH_HEADER="Authorization: token ${{ secrets.GITHUB_TOKEN }}" | ||
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
||
| # Search Aiter Test workflow runs on main branch for one that has an aiter-whl artifact | ||
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") | ||
|
|
||
| ARTIFACT_ID="" | ||
| ARTIFACT_NAME="" | ||
| for RUN_ID in $(echo "$RUNS" | jq -r '.workflow_runs[].id'); do | ||
| ARTIFACT_JSON=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/runs/$RUN_ID/artifacts" \ | ||
| | jq '[.artifacts[] | select(.name | startswith("aiter-whl-main")) | select(.expired == false)] | first') | ||
|
|
||
| if [ "$ARTIFACT_JSON" != "null" ] && [ -n "$ARTIFACT_JSON" ]; then | ||
| ARTIFACT_ID=$(echo "$ARTIFACT_JSON" | jq -r '.id') | ||
| ARTIFACT_NAME=$(echo "$ARTIFACT_JSON" | jq -r '.name') | ||
| echo "Found artifact in run $RUN_ID: $ARTIFACT_NAME (ID: $ARTIFACT_ID)" | ||
| break | ||
| fi | ||
| done | ||
|
|
||
| - name: Build Docker image for forked repo | ||
| if: (matrix.run_on_pr == true || github.event_name != 'pull_request') && github.event.pull_request.head.repo.fork | ||
| run: | | ||
| docker build --pull --network=host \ | ||
| --no-cache \ | ||
| -t atom_test:ci \ | ||
| -f Dockerfile.mod . | ||
| if [ -z "$ARTIFACT_ID" ] || [ "$ARTIFACT_ID" = "null" ]; then | ||
| echo "ERROR: No aiter-whl-main artifact found in recent Aiter Test runs" | ||
| exit 1 | ||
| fi | ||
|
|
||
| echo "=== Downloading artifact ===" | ||
| mkdir -p /tmp/aiter-whl | ||
| curl -s -L -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/artifacts/$ARTIFACT_ID/zip" \ | ||
| -o /tmp/aiter-whl.zip | ||
| unzip -o /tmp/aiter-whl.zip -d /tmp/aiter-whl |
This step attempts to download a workflow artifact from the ROCm/aiter repository using ${{ secrets.GITHUB_TOKEN }}. The workflow token is scoped to the current repository and typically cannot access private/collaborator-only resources (including Actions artifacts) in other repositories; this is likely to fail with 403s, especially on PRs from forks where secrets/PATs aren’t available. Consider switching to the PR-described S3 wheel source (or another publicly readable location), or use a dedicated PAT with explicit access and a fork-safe fallback behavior.
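Because the script under review calls curl -s without checking the response, an authorization failure would only surface later as a confusing jq error. A minimal sketch of an explicit status check follows; the function name check_api_status and the variable RUNS_URL are hypothetical, and the mapping of 401/403/404 to "token lacks access" is an assumption about the likely cause, not a guarantee.

```shell
# Hypothetical helper: turn an HTTP status code from the GitHub API into a
# clear pass/fail decision instead of letting jq fail on an error body.
check_api_status() {
  status="$1"
  context="$2"
  case "$status" in
    200) return 0 ;;
    401|403|404)
      echo "ERROR: $context returned HTTP $status; the token likely lacks access to this repository" >&2
      return 1 ;;
    *)
      echo "ERROR: $context returned unexpected HTTP $status" >&2
      return 1 ;;
  esac
}

# Example wiring (not executed here): capture the status with curl's -w option.
# STATUS=$(curl -sS -o /tmp/runs.json -w '%{http_code}' -H "$AUTH_HEADER" "$RUNS_URL")
# check_api_status "$STATUS" "workflow runs lookup" || exit 1
```

A fork-safe variant could treat the failure branch as a signal to skip the job or fall back to another wheel source rather than exiting, depending on what the maintainers prefer.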
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.
.github/workflows/atom-test.yaml
Outdated
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") | ||
|
|
||
| ARTIFACT_ID="" | ||
| ARTIFACT_NAME="" | ||
| for RUN_ID in $(echo "$RUNS" | jq -r '.workflow_runs[].id'); do |
.github/workflows/atom-test.yaml
Outdated
| - name: Download latest aiter wheel from CI artifacts | ||
| if: matrix.run_on_pr == true || github.event_name != 'pull_request' | ||
| run: | | ||
| cat <<EOF > Dockerfile.mod | ||
| FROM ${{ env.ATOM_BASE_NIGTHLY_IMAGE }} | ||
| RUN pip install -U lm-eval[api] | ||
| RUN pip show lm-eval || true | ||
| RUN pip install hf_transfer | ||
| RUN pip show hf_transfer || true | ||
| RUN echo "=== Aiter version BEFORE uninstall ===" && pip show amd-aiter || true | ||
| RUN pip uninstall -y amd-aiter | ||
| RUN pip install --upgrade "pybind11>=3.0.1" | ||
| RUN pip show pybind11 | ||
| RUN rm -rf /app/aiter-test | ||
| RUN git clone https://github.com/ROCm/aiter.git /app/aiter-test && \\ | ||
| cd /app/aiter-test && \\ | ||
| git checkout HEAD && \\ | ||
| git submodule sync && git submodule update --init --recursive && \\ | ||
| MAX_JOBS=64 PREBUILD_KERNELS=0 GPU_ARCHS=gfx950 python3 setup.py develop | ||
| RUN echo "=== Aiter version AFTER installation ===" && pip show amd-aiter || true | ||
| RUN echo "=== ATOM version BEFORE uninstall ===" && pip show atom || true | ||
| RUN pip uninstall -y atom | ||
| RUN rm -rf /app/ATOM | ||
| RUN git clone ${{ env.GITHUB_REPO_URL }} /app/ATOM && \\ | ||
| cd /app/ATOM && \\ | ||
| git checkout ${{ env.GITHUB_COMMIT_SHA }} && \\ | ||
| pip install -e . | ||
| set -euo pipefail | ||
| echo "=== Finding latest aiter-whl-main artifact from ROCm/aiter ===" | ||
|
|
|
|
||
| env: | ||
| ATOM_BASE_NIGTHLY_IMAGE: rocm/atom-dev:latest | ||
| ATOM_BASE_IMAGE: rocm/pytorch:latest |
| atom: | ||
| needs: [pre-checks, build_atom_image] | ||
| needs: [pre-checks] | ||
| name: ATOM Test |
.github/workflows/atom-test.yaml
Outdated
| echo "=== Finding latest aiter-whl-main artifact from ROCm/aiter ===" | ||
|
|
||
| API_URL="https://api.github.com" | ||
| AUTH_HEADER="Authorization: token ${{ secrets.GITHUB_TOKEN }}" |
.github/workflows/atom-test.yaml
Outdated
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
The build_atom_image job is no longer needed since we now install aiter from CI artifact wheels and ATOM/dependencies at runtime inside the container.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.
| API_URL="https://api.github.com" | ||
| AUTH_HEADER="Authorization: token ${{ secrets.GITHUB_TOKEN }}" | ||
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
||
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") | ||
|
|
|
|
||
| ARTIFACT_ID="" | ||
| ARTIFACT_NAME="" | ||
| for RUN_ID in $(echo "$RUNS" | jq -r '.workflow_runs[].id'); do |
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
| - name: Find and download latest aiter wheel | ||
| run: | | ||
| cat <<EOF > Dockerfile.mod | ||
| FROM ${{ env.ATOM_BASE_NIGTHLY_IMAGE }} | ||
| RUN pip install -U lm-eval[api] | ||
| RUN pip show lm-eval || true | ||
| RUN pip install hf_transfer | ||
| RUN pip show hf_transfer || true | ||
| RUN echo "=== Aiter version BEFORE uninstall ===" && pip show amd-aiter || true | ||
| RUN pip uninstall -y amd-aiter | ||
| RUN pip install --upgrade "pybind11>=3.0.1" | ||
| RUN pip show pybind11 | ||
| RUN wget https://github.com/stedolan/jq/releases/download/jq-1.7/jq-linux64 -O jq | ||
| RUN chmod +x jq | ||
| RUN mv jq /usr/local/bin/jq | ||
| RUN rm -rf /app/aiter-test | ||
| RUN git clone --depth 1 https://github.com/ROCm/aiter.git /app/aiter-test && \\ | ||
| cd /app/aiter-test && \\ | ||
| git checkout HEAD && \\ | ||
| git submodule sync && git submodule update --init --recursive && \\ | ||
| MAX_JOBS=64 PREBUILD_KERNELS=0 GPU_ARCHS=gfx950 python3 setup.py develop | ||
| RUN echo "=== Aiter version AFTER installation ===" && pip show amd-aiter || true | ||
|
|
||
| RUN echo "=== ATOM version BEFORE uninstall ===" && pip show atom || true | ||
| RUN pip uninstall -y atom | ||
| RUN rm -rf /app/ATOM | ||
| RUN git clone ${{ env.GITHUB_REPO_URL }} /app/ATOM && \\ | ||
| cd /app/ATOM && \\ | ||
| git checkout ${{ env.GITHUB_COMMIT_SHA }} && \\ | ||
| pip install -e . | ||
|
|
||
| RUN echo "=== ATOM version AFTER installation ===" && pip show atom || true | ||
| EOF | ||
| set -euo pipefail | ||
| echo "=== Finding latest aiter-whl-main artifact from ROCm/aiter ===" |
| GITHUB_REPO_URL: ${{ github.event.pull_request.head.repo.clone_url || 'https://github.com/ROCm/ATOM.git' }} | ||
| GITHUB_COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.event.head_commit.id }} |
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 7 comments.
|
|
||
|
|
||
| docker run -dt --device=/dev/kfd $DEVICE_FLAG \ | ||
| docker run -dt --pull always --device=/dev/kfd $DEVICE_FLAG \ |
| AUTH_HEADER="Authorization: token ${{ secrets.GITHUB_TOKEN }}" | ||
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
||
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") |
| - name: Find and download latest aiter wheel | ||
| run: | | ||
| cat <<EOF > Dockerfile.mod | ||
| FROM ${{ env.ATOM_BASE_NIGTHLY_IMAGE }} | ||
| RUN pip install -U lm-eval[api] | ||
| RUN pip show lm-eval || true | ||
| RUN pip install hf_transfer | ||
| RUN pip show hf_transfer || true | ||
| RUN echo "=== Aiter version BEFORE uninstall ===" && pip show amd-aiter || true | ||
| RUN pip uninstall -y amd-aiter | ||
| RUN pip install --upgrade "pybind11>=3.0.1" | ||
| RUN pip show pybind11 | ||
| RUN wget https://github.com/stedolan/jq/releases/download/jq-1.7/jq-linux64 -O jq | ||
| RUN chmod +x jq | ||
| RUN mv jq /usr/local/bin/jq | ||
| RUN rm -rf /app/aiter-test | ||
| RUN git clone --depth 1 https://github.com/ROCm/aiter.git /app/aiter-test && \\ | ||
| cd /app/aiter-test && \\ | ||
| git checkout HEAD && \\ | ||
| git submodule sync && git submodule update --init --recursive && \\ | ||
| MAX_JOBS=64 PREBUILD_KERNELS=0 GPU_ARCHS=gfx950 python3 setup.py develop | ||
| RUN echo "=== Aiter version AFTER installation ===" && pip show amd-aiter || true | ||
|
|
||
| RUN echo "=== ATOM version BEFORE uninstall ===" && pip show atom || true | ||
| RUN pip uninstall -y atom | ||
| RUN rm -rf /app/ATOM | ||
| RUN git clone ${{ env.GITHUB_REPO_URL }} /app/ATOM && \\ | ||
| cd /app/ATOM && \\ | ||
| git checkout ${{ env.GITHUB_COMMIT_SHA }} && \\ | ||
| pip install -e . | ||
|
|
||
| RUN echo "=== ATOM version AFTER installation ===" && pip show atom || true | ||
| EOF | ||
| set -euo pipefail | ||
| echo "=== Finding latest aiter-whl-main artifact from ROCm/aiter ===" | ||
|
|
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") | ||
|
|
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
| atom: | ||
| needs: [pre-checks, build_atom_image] | ||
| needs: [pre-checks, download_aiter_wheel] | ||
| name: ATOM Test | ||
| strategy: |
|
|
||
| env: | ||
| ATOM_BASE_NIGTHLY_IMAGE: rocm/atom-dev:latest | ||
| ATOM_BASE_IMAGE: rocm/pytorch:latest |
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.
|
|
||
|
|
||
| docker run -dt --device=/dev/kfd $DEVICE_FLAG \ | ||
| docker run -dt --pull always --device=/dev/kfd $DEVICE_FLAG \ |
| GITHUB_REPO_URL: ${{ github.event.pull_request.head.repo.clone_url || 'https://github.com/ROCm/ATOM.git' }} | ||
| GITHUB_COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.event.head_commit.id }} |
| API_URL="https://api.github.com" | ||
| AUTH_HEADER="Authorization: token ${{ secrets.GITHUB_TOKEN }}" | ||
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
||
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") | ||
|
|
| - name: Find and download latest aiter wheel | ||
| run: | | ||
| cat <<EOF > Dockerfile.mod | ||
| FROM ${{ env.ATOM_BASE_NIGTHLY_IMAGE }} | ||
| RUN pip install -U lm-eval[api] | ||
| RUN pip show lm-eval || true | ||
| RUN pip install hf_transfer | ||
| RUN pip show hf_transfer || true | ||
| RUN echo "=== Aiter version BEFORE uninstall ===" && pip show amd-aiter || true | ||
| RUN pip uninstall -y amd-aiter | ||
| RUN pip install --upgrade "pybind11>=3.0.1" | ||
| RUN pip show pybind11 | ||
| RUN wget https://github.com/stedolan/jq/releases/download/jq-1.7/jq-linux64 -O jq | ||
| RUN chmod +x jq | ||
| RUN mv jq /usr/local/bin/jq | ||
| RUN rm -rf /app/aiter-test | ||
| RUN git clone --depth 1 https://github.com/ROCm/aiter.git /app/aiter-test && \\ | ||
| cd /app/aiter-test && \\ | ||
| git checkout HEAD && \\ | ||
| git submodule sync && git submodule update --init --recursive && \\ | ||
| MAX_JOBS=64 PREBUILD_KERNELS=0 GPU_ARCHS=gfx950 python3 setup.py develop | ||
| RUN echo "=== Aiter version AFTER installation ===" && pip show amd-aiter || true | ||
|
|
||
| RUN echo "=== ATOM version BEFORE uninstall ===" && pip show atom || true | ||
| RUN pip uninstall -y atom | ||
| RUN rm -rf /app/ATOM | ||
| RUN git clone ${{ env.GITHUB_REPO_URL }} /app/ATOM && \\ | ||
| cd /app/ATOM && \\ | ||
| git checkout ${{ env.GITHUB_COMMIT_SHA }} && \\ | ||
| pip install -e . | ||
|
|
||
| RUN echo "=== ATOM version AFTER installation ===" && pip show atom || true | ||
| EOF | ||
| set -euo pipefail | ||
| echo "=== Finding latest aiter-whl-main artifact from ROCm/aiter ===" | ||
|
|
||
| API_URL="https://api.github.com" | ||
| AUTH_HEADER="Authorization: token ${{ secrets.GITHUB_TOKEN }}" | ||
| AITER_TEST_WORKFLOW_ID=179476100 |
| API_URL="https://api.github.com" | ||
| AUTH_HEADER="Authorization: token ${{ secrets.GITHUB_TOKEN }}" | ||
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
|
|
||
| env: | ||
| ATOM_BASE_NIGTHLY_IMAGE: rocm/atom-dev:latest | ||
| ATOM_BASE_IMAGE: rocm/pytorch:latest |
a5eaede to a366f00
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
.github/workflows/atom-test.yaml:273
`docker run` mounts `/workspace` twice (both `${GITHUB_WORKSPACE:-$PWD}` and `${{ github.workspace }}`) and also sets `-w /workspace` twice. This duplication is easy to miss when editing the command and can lead to confusion about which path is authoritative; consider removing the duplicate `-v`/`-w` entries.
docker run -dt --pull always --device=/dev/kfd $DEVICE_FLAG \
-v "${GITHUB_WORKSPACE:-$PWD}":/workspace \
$MODEL_MOUNT \
-w /workspace \
--ipc=host --group-add video \
--shm-size=16G \
--privileged \
--cap-add=SYS_PTRACE \
-e HF_TOKEN="${HF_TOKEN:-}" \
--env-file /tmp/env_file.txt \
--security-opt seccomp=unconfined \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-e ATOM_DISABLE_MMAP=true \
-v "${{ github.workspace }}:/workspace" \
-w /workspace \
--name "$CONTAINER_NAME" \
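One way to act on the comment above is to assemble the argument list in a single place so that the workspace mount and the working directory each appear exactly once. The following is a minimal dry-run sketch, not the workflow's actual step: the helper name `workspace_run_args` and the fallback container name `atom_test` are hypothetical, and `$DEVICE_FLAG`, `$MODEL_MOUNT` and the remaining flags are left out for brevity.

```sh
#!/bin/sh
set -eu

# Hypothetical helper: prints one `docker run` argument per line, with a
# single -v mount and a single -w flag for /workspace.
workspace_run_args() {
  # Same fallback the original command uses for the host-side path.
  workspace="${GITHUB_WORKSPACE:-$PWD}"
  printf '%s\n' \
    -dt --pull always \
    --device=/dev/kfd \
    -v "${workspace}:/workspace" \
    -w /workspace \
    --ipc=host \
    --group-add video \
    --shm-size=16G \
    --name "${CONTAINER_NAME:-atom_test}"  # atom_test is a placeholder name
}

# Dry run: show the arguments instead of starting a container.
workspace_run_args
```

The sketch keeps the `${GITHUB_WORKSPACE:-$PWD}` form of the mount; keeping `${{ github.workspace }}` instead would work equally well, as long as only one of the two remains.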
| - name: Find and download latest aiter wheel | ||
| run: | | ||
| cat <<EOF > Dockerfile.mod | ||
| FROM ${{ env.ATOM_BASE_NIGTHLY_IMAGE }} | ||
| RUN pip install -U lm-eval[api] | ||
| RUN pip show lm-eval || true | ||
| RUN pip install hf_transfer | ||
| RUN pip show hf_transfer || true | ||
| RUN echo "=== Aiter version BEFORE uninstall ===" && pip show amd-aiter || true | ||
| RUN pip uninstall -y amd-aiter | ||
| RUN pip install --upgrade "pybind11>=3.0.1" | ||
| RUN pip show pybind11 | ||
| RUN wget https://github.com/stedolan/jq/releases/download/jq-1.7/jq-linux64 -O jq | ||
| RUN chmod +x jq | ||
| RUN mv jq /usr/local/bin/jq | ||
| RUN rm -rf /app/aiter-test | ||
| RUN git clone --depth 1 https://github.com/ROCm/aiter.git /app/aiter-test && \\ | ||
| cd /app/aiter-test && \\ | ||
| git checkout HEAD && \\ | ||
| git submodule sync && git submodule update --init --recursive && \\ | ||
| MAX_JOBS=64 PREBUILD_KERNELS=0 GPU_ARCHS=gfx950 python3 setup.py develop | ||
| RUN echo "=== Aiter version AFTER installation ===" && pip show amd-aiter || true | ||
|
|
||
| RUN echo "=== ATOM version BEFORE uninstall ===" && pip show atom || true | ||
| RUN pip uninstall -y atom | ||
| RUN rm -rf /app/ATOM | ||
| RUN git clone ${{ env.GITHUB_REPO_URL }} /app/ATOM && \\ | ||
| cd /app/ATOM && \\ | ||
| git checkout ${{ env.GITHUB_COMMIT_SHA }} && \\ | ||
| pip install -e . | ||
|
|
||
| RUN echo "=== ATOM version AFTER installation ===" && pip show atom || true | ||
| EOF | ||
| set -euo pipefail | ||
| echo "=== Finding latest aiter-whl-main artifact from ROCm/aiter ===" | ||
|
|
||
| API_URL="https://api.github.com" | ||
| AUTH_HEADER="Authorization: token ${{ secrets.GITHUB_TOKEN }}" | ||
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
||
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") | ||
|
|
| API_URL="https://api.github.com" | ||
| AUTH_HEADER="Authorization: token ${{ secrets.GITHUB_TOKEN }}" | ||
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
||
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") | ||
|
|
||
| ARTIFACT_ID="" | ||
| ARTIFACT_NAME="" | ||
| for RUN_ID in $(echo "$RUNS" | jq -r '.workflow_runs[].id'); do | ||
| ARTIFACT_JSON=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/runs/$RUN_ID/artifacts" \ | ||
| | jq '[.artifacts[] | select(.name | startswith("aiter-whl-main")) | select(.expired == false)] | first') | ||
|
|
||
| if [ "$ARTIFACT_JSON" != "null" ] && [ -n "$ARTIFACT_JSON" ]; then | ||
| ARTIFACT_ID=$(echo "$ARTIFACT_JSON" | jq -r '.id') | ||
| ARTIFACT_NAME=$(echo "$ARTIFACT_JSON" | jq -r '.name') | ||
| echo "Found artifact in run $RUN_ID: $ARTIFACT_NAME (ID: $ARTIFACT_ID)" | ||
| break | ||
| fi | ||
| done | ||
|
|
||
| - name: Build Docker image | ||
| if: ${{ !github.event.pull_request.head.repo.fork }} | ||
| run: | | ||
| docker build --pull --network=host \ | ||
| --no-cache \ | ||
| -t atom_test:ci \ | ||
| -f Dockerfile.mod . | ||
| if [ -z "$ARTIFACT_ID" ] || [ "$ARTIFACT_ID" = "null" ]; then | ||
| echo "ERROR: No aiter-whl-main artifact found in recent Aiter Test runs" | ||
| exit 1 | ||
| fi | ||
|
|
||
| - name: Push Docker image | ||
| if: ${{ !github.event.pull_request.head.repo.fork }} | ||
| run: | | ||
| IMAGE_TAG=rocm/atom-dev:pre-build-${{ env.GITHUB_COMMIT_SHA }} | ||
| docker tag atom_test:ci $IMAGE_TAG | ||
| echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u ${{ secrets.DOCKER_USERNAME }} --password-stdin | ||
| docker push $IMAGE_TAG | ||
| echo "=== Downloading artifact ===" | ||
| mkdir -p aiter-whl | ||
| curl -s -L -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/artifacts/$ARTIFACT_ID/zip" \ | ||
| -o aiter-whl.zip |
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
||
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") |
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") | ||
|
|
||
| ARTIFACT_ID="" | ||
| ARTIFACT_NAME="" | ||
| for RUN_ID in $(echo "$RUNS" | jq -r '.workflow_runs[].id'); do | ||
| ARTIFACT_JSON=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/runs/$RUN_ID/artifacts" \ | ||
| | jq '[.artifacts[] | select(.name | startswith("aiter-whl-main")) | select(.expired == false)] | first') | ||
|
|
||
| if [ "$ARTIFACT_JSON" != "null" ] && [ -n "$ARTIFACT_JSON" ]; then | ||
| ARTIFACT_ID=$(echo "$ARTIFACT_JSON" | jq -r '.id') | ||
| ARTIFACT_NAME=$(echo "$ARTIFACT_JSON" | jq -r '.name') | ||
| echo "Found artifact in run $RUN_ID: $ARTIFACT_NAME (ID: $ARTIFACT_ID)" | ||
| break | ||
| fi | ||
| done |
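The artifact lookup in the snippet above compares the `jq` result against the string `null` in several places. A possible simplification, sketched here under the assumption that the response has the `artifacts[].id`, `.name` and `.expired` fields the original filter already relies on, is to let `jq` print either the id or nothing at all. The helper name `pick_artifact_id` is hypothetical.

```sh
#!/bin/sh
set -eu

# Hypothetical helper: reads an artifacts-list JSON document on stdin and
# prints the id of the first non-expired artifact whose name starts with $1.
# Prints nothing when no artifact matches.
pick_artifact_id() {
  prefix="$1"
  jq -r --arg prefix "$prefix" '
    [ .artifacts[]
      | select(.name | startswith($prefix))
      | select(.expired == false) ]
    | first
    | if . == null then empty else .id end
  '
}
```

With this shape the caller only needs an emptiness check, for example `ARTIFACT_ID=$(curl ... | pick_artifact_id aiter-whl-main)` followed by `[ -n "$ARTIFACT_ID" ]`.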
| GITHUB_REPO_URL: ${{ github.event.pull_request.head.repo.clone_url || 'https://github.com/ROCm/ATOM.git' }} | ||
| GITHUB_COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.event.head_commit.id }} |
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
| API_URL="https://api.github.com" | ||
| AUTH_HEADER="Authorization: token ${{ secrets.GITHUB_TOKEN }}" | ||
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
||
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") | ||
|
|
| echo "=== Finding latest aiter-whl-main artifact from ROCm/aiter ===" | ||
|
|
||
| API_URL="https://api.github.com" | ||
| AUTH_HEADER="Authorization: token ${{ secrets.GITHUB_TOKEN }}" | ||
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
||
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") | ||
|
|
||
| ARTIFACT_ID="" | ||
| ARTIFACT_NAME="" | ||
| for RUN_ID in $(echo "$RUNS" | jq -r '.workflow_runs[].id'); do | ||
| ARTIFACT_JSON=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/runs/$RUN_ID/artifacts" \ | ||
| | jq '[.artifacts[] | select(.name | startswith("aiter-whl-main")) | select(.expired == false)] | first') | ||
|
|
|
|
||
| env: | ||
| ATOM_BASE_NIGHTLY_IMAGE: rocm/atom-dev:latest | ||
| ATOM_BASE_IMAGE: rocm/pytorch:latest |
|
|
||
|
|
||
| docker run -dt --device=/dev/kfd $DEVICE_FLAG \ | ||
| docker run -dt --pull always --device=/dev/kfd $DEVICE_FLAG \ |
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
| download_aiter_wheel: | ||
| if: ${{ needs.check-signal.result == 'success' && (!github.event.pull_request || github.event.pull_request.draft == false) }} | ||
| needs: [check-signal] | ||
| name: Build ATOM image | ||
| runs-on: build-only-atom | ||
| name: Download aiter wheel |
| API_URL="https://api.github.com" | ||
| AUTH_HEADER="Authorization: token ${{ secrets.GITHUB_TOKEN }}" | ||
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
||
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") | ||
|
|
||
| ARTIFACT_ID="" | ||
| ARTIFACT_NAME="" | ||
| for RUN_ID in $(echo "$RUNS" | jq -r '.workflow_runs[].id'); do | ||
| ARTIFACT_JSON=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/runs/$RUN_ID/artifacts" \ | ||
| | jq '[.artifacts[] | select(.name | startswith("aiter-whl-main")) | select(.expired == false)] | first') |
| echo "=== Finding latest aiter-whl-main artifact from ROCm/aiter ===" | ||
|
|
||
| API_URL="https://api.github.com" | ||
| AUTH_HEADER="Authorization: token ${{ secrets.GITHUB_TOKEN }}" | ||
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
||
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") |
|
|
||
| env: | ||
| ATOM_BASE_NIGHTLY_IMAGE: rocm/atom-dev:latest | ||
| ATOM_BASE_IMAGE: rocm/pytorch:latest |
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 9 comments.
| echo "=== Finding latest aiter-whl-main artifact from ROCm/aiter ===" | ||
|
|
||
| API_URL="https://api.github.com" | ||
| AUTH_HEADER="Authorization: token ${{ secrets.GITHUB_TOKEN }}" | ||
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
||
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") |
| AITER_TEST_WORKFLOW_ID=179476100 | ||
|
|
||
| RUNS=$(curl -s -H "$AUTH_HEADER" \ | ||
| "$API_URL/repos/ROCm/aiter/actions/workflows/$AITER_TEST_WORKFLOW_ID/runs?per_page=100&branch=main&event=push") |
| with: | ||
| name: aiter-whl | ||
| path: /tmp/aiter-whl | ||
|
|
|
|
||
|
|
||
| docker run -dt --device=/dev/kfd $DEVICE_FLAG \ | ||
| docker run -dt --pull always --device=/dev/kfd $DEVICE_FLAG \ |
| echo '=== Installing amd-aiter from wheel ===' | ||
| pip install /tmp/$WHL_NAME | ||
|
|
| name: Build ATOM image | ||
| runs-on: build-only-atom | ||
| name: Download aiter wheel | ||
| runs-on: ubuntu-latest |
|
|
||
| - name: Upload aiter wheel | ||
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: aiter-whl | ||
| path: aiter-whl/amd_aiter*.whl |
| AITER_WHL=$(ls -t aiter-whl/amd_aiter*.whl 2>/dev/null | head -1) | ||
| if [ -z "$AITER_WHL" ]; then | ||
| echo "ERROR: No amd_aiter wheel found in artifact" | ||
| ls -la aiter-whl/ | ||
| exit 1 |
| AITER_WHL=$(ls -t /tmp/aiter-whl/amd_aiter*.whl 2>/dev/null | head -1) | ||
| if [ -z "$AITER_WHL" ]; then | ||
| echo "ERROR: No amd_aiter wheel found" | ||
| ls -la /tmp/aiter-whl/ | ||
| exit 1 | ||
| fi | ||
|
|
||
| echo "=== Copying wheel into container ===" | ||
| WHL_NAME=$(basename "$AITER_WHL") | ||
| docker cp "$AITER_WHL" "$CONTAINER_NAME:/tmp/$WHL_NAME" | ||
|
|
||
| docker exec "$CONTAINER_NAME" bash -lc " | ||
| set -euo pipefail | ||
| echo '=== Uninstalling existing amd-aiter ===' | ||
| pip uninstall -y amd-aiter || true | ||
|
|
||
| echo '=== Installing amd-aiter from wheel ===' | ||
| pip install /tmp/$WHL_NAME |
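The wheel-selection step in the snippet above picks the newest `amd_aiter*.whl` with `ls -t ... | head -1` and then checks for an empty result. A minimal sketch of the same idea as a reusable function follows; the name `newest_aiter_wheel` is hypothetical. The trailing `|| true` is an assumption about the shell options in effect: if the step runs with `set -e` and `pipefail`, a failing `ls` could abort the script before the error message is printed.

```sh
#!/bin/sh
set -eu

# Hypothetical helper: prints the newest amd_aiter wheel in directory $1,
# or returns 1 with a message on stderr if the directory has none.
newest_aiter_wheel() {
  dir="$1"
  # ls -t sorts newest first; the unmatched-glob error is silenced, and
  # "|| true" keeps strict shell modes from aborting before the check below.
  whl=$(ls -t "$dir"/amd_aiter*.whl 2>/dev/null | head -n 1) || true
  if [ -z "$whl" ]; then
    echo "ERROR: No amd_aiter wheel found in $dir" >&2
    return 1
  fi
  printf '%s\n' "$whl"
}

# Example call, commented out so the sketch has no side effects:
# AITER_WHL=$(newest_aiter_wheel /tmp/aiter-whl)
```

Selecting by modification time mirrors the original step; if more than one wheel can land in the directory, selecting by version string might be more predictable.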
Summary
- Remove `build_atom_image` job to eliminate nightly image pull and rebuild overhead
- Use `rocm/pytorch:latest` as the base image instead of `rocm/atom-dev:latest`
- Install `amd-aiter` wheel from S3 (`s3://framework-whls-nightlies/whl-staging/gfx942-gfx950/`) at runtime instead of building from source

Test plan