From 3c2d12497019dfa35edc7b2aad35affab6c686c6 Mon Sep 17 00:00:00 2001
From: ppippi
Date: Mon, 11 Aug 2025 01:10:42 +0900
Subject: [PATCH 1/5] =?UTF-8?q?=EB=B2=88=EC=97=AD=20workflow=20=EC=9E=90?=
 =?UTF-8?q?=EB=8F=99=ED=99=94?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 _posts/2025-07-03-actions-runner-controller.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/_posts/2025-07-03-actions-runner-controller.md b/_posts/2025-07-03-actions-runner-controller.md
index 6dec20a..d7a918f 100644
--- a/_posts/2025-07-03-actions-runner-controller.md
+++ b/_posts/2025-07-03-actions-runner-controller.md
@@ -285,3 +285,4 @@ docker run -it \
 ARC를 사용하면 GitHub에서 제공하는 Runner를 사용할 때의 비싼 비용 문제와, 직접 VM을 관리하며 Runner를 운영할 때의 비효율성을 모두 해결할 수 있습니다.
 특히 GPU가 필요하거나, 복잡한 의존성을 가진 MLOps CI/CD 환경을 구축할 때 ARC는 매우 강력한 도구가 됩니다.
 초기 설정 과정이 다소 복잡하게 느껴질 수 있지만, 한번 구축해두면 CI/CD 비용을 크게 절감하고 운영 부담을 덜어주므로 MLOps를 고민하고 있다면 꼭 한번 도입을 검토해보시길 바랍니다.
+

From ccae34fa327bea906e045d660cc8844d2972fee4 Mon Sep 17 00:00:00 2001
From: ppippi-dev <61408680+ppippi-dev@users.noreply.github.com>
Date: Sun, 10 Aug 2025 16:12:36 +0000
Subject: [PATCH 2/5] chore: add English translations for PR #6

---
 .../2025-07-03-actions-runner-controller.md   | 278 ++++++++++++++++++
 1 file changed, 278 insertions(+)
 create mode 100644 _posts_en/2025-07-03-actions-runner-controller.md

diff --git a/_posts_en/2025-07-03-actions-runner-controller.md b/_posts_en/2025-07-03-actions-runner-controller.md
new file mode 100644
index 0000000..09cc27d
--- /dev/null
+++ b/_posts_en/2025-07-03-actions-runner-controller.md
@@ -0,0 +1,278 @@
+---
+feature-img: assets/img/2025-07-03/0.png
+layout: post
+subtitle: Building an MLOps CI Environment
+tags:
+- MLOps
+- Infra
+title: Setting Up Actions Runner Controller
+---
+
+
+### Intro
+
+As I’ve been enjoying building with AI lately, I’ve felt even more strongly how important a solid test environment is.
+ +The most common approach is to set up CI with GitHub Actions, but in MLOps, CI often requires high-spec instances. + +GitHub Actions does offer a [GPU instance (Linux 4 cores)](https://docs.github.com/ko/billing/managing-billing-for-your-products/about-billing-for-github-actions), but at $0.07 per minute as of now, it’s quite expensive to use. + +It’s also limited to an Nvidia T4 GPU, which can be restrictive as model sizes continue to grow. + +As an alternative, you can use a self-hosted runner. + +As the name suggests, you set up the runner yourself and execute GitHub workflows on it. + +You can configure this using GitHub’s [Add a self-hosted runner](https://docs.github.com/ko/actions/how-tos/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners). + +However, this approach requires keeping your CI machine always on (online), which can be inefficient if CI/CD jobs are infrequent. + +That’s where the Actions Runner Controller (ARC) becomes an excellent alternative. + +[Actions Runner Controller](https://github.com/actions/actions-runner-controller) is an open source project that lets you run GitHub Actions runners in a Kubernetes environment. + +With it, you can run CI using your Kubernetes resources only when a GitHub Actions workflow is triggered. + + +### Installing Actions Runner Controller + +The ARC installation consists of two major steps. +1. Create a GitHub Personal Access Token for communication and authentication with GitHub +2. Install ARC with Helm and authenticate using the token + +#### 1. Create a GitHub Personal Access Token + +ARC needs authentication to interact with the GitHub API to register and manage runners. Create a GitHub Personal Access Token (PAT). 
+ +* Path: `Settings` > `Developer settings` > `Personal access tokens` > `Tokens (classic)` > `Generate new token` + +When creating the PAT, select the [appropriate scopes](https://github.com/actions/actions-runner-controller/blob/master/docs/authenticating-to-the-github-api.md#deploying-using-pat-authentication). (For convenience here, grant full permissions.) + +> For security, use least privilege and set an expiration date. + +It’s generally recommended to authenticate via a GitHub App rather than PAT. + +Keep the PAT safe—you’ll need it in the next step when installing ARC. + +#### 2. Install ARC with Helm + +ARC requires cert-manager. If cert-manager isn’t set up in your cluster, install it: + +```bash +kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.2/cert-manager.yaml +``` + +Now use Helm to install ARC into your Kubernetes cluster. + + + + + +Install ARC using the Personal Access Token you created earlier. Replace `YOUR_GITHUB_TOKEN` with your PAT in the command below. + +```bash +helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller + +helm repo update + +helm pull actions-runner-controller/actions-runner-controller + +tar -zxvf actions-runner-controller-*.tgz + +export GITHUB_TOKEN=YOUR_GITHUB_TOKEN + +helm upgrade --install actions-runner-controller ./actions-runner-controller \ + --namespace actions-runner-system \ + --create-namespace \ + --set authSecret.create=true \ + --set authSecret.github_token="${GITHUB_TOKEN}" +``` + +After installation, verify the ARC controller is running: + +```bash +kubectl get pods -n actions-runner-system +``` + +If this succeeds, you’ll see the ARC controller manager pod running in the `actions-runner-system` namespace. + +ARC is now ready to talk to GitHub! Next, define the runner that will actually execute your workflows. + +### 3. 
Configure the Runner + +We’ve installed the ARC controller, but there’s no runner yet to execute workflows. We need to create runner pods based on GitHub Actions jobs. + +We’ll use two resources: +1. RunnerDeployment: Acts as the template for runner pods—defines which container image to use, which GitHub repository to connect to, which labels to apply, etc. +2. HorizontalRunnerAutoscaler (HRA): Watches the RunnerDeployment and automatically adjusts its replicas based on the number of queued jobs in GitHub. + +#### Define RunnerDeployment + +Create a file named `runner-deployment.yml` as below. Change `spec.template.spec.repository` to your GitHub repository. + +> If you have permissions, you can target an organization instead of a single repository. + +```yaml +apiVersion: actions.summerwind.dev/v1alpha1 +kind: RunnerDeployment +metadata: + name: example-runner-deployment + namespace: actions-runner-system +spec: + replicas: 1 + template: + spec: + repository: / + labels: + - self-hosted + - arc-runner +``` + +With this configured, you’ll see the self-hosted runner in your GitHub repo’s Actions. + + + +After it’s deployed, in a moment you’ll find a new runner registered under your repository’s Settings > Actions > Runners tab with the labels `self-hosted` and `arc-runner`. + + +#### Define HorizontalRunnerAutoscaler + +Next, define an HRA to auto-scale the RunnerDeployment. Create a `hra.yml` file. + +```yaml +apiVersion: actions.summerwind.dev/v1alpha1 +kind: HorizontalRunnerAutoscaler +metadata: + name: example-hra + namespace: actions-runner-system +spec: + scaleTargetRef: + name: example-runner-deployment + minReplicas: 0 + maxReplicas: 5 +``` + +Set minReplicas and maxReplicas to scale up and down based on resources. + +You can also specify additional metrics to create pods whenever a workflow is triggered. Many other metrics are available. + +> When you configure a HorizontalRunnerAutoscaler, runners are created only when needed. 
When there are zero runners, you won’t see them in the GitHub UI. + + + +```yaml +apiVersion: actions.summerwind.dev/v1alpha1 +kind: HorizontalRunnerAutoscaler +metadata: + name: example-hra + namespace: actions-runner-system +spec: + scaleTargetRef: + name: example-runner-deployment + minReplicas: 0 + maxReplicas: 5 + metrics: + - type: TotalNumberOfQueuedAndInProgressWorkflowRuns + repositoryNames: ["/"] + +The above is my preferred metric—it scales up when workflow runs are needed (i.e., when jobs are queued). +As shown, you can choose metrics as needed to get great results. + + +### 4. Use it in a GitHub Actions workflow + +We’re all set! Using the new ARC runner is simple: in your workflow file, set the `runs-on` key to the labels specified in the RunnerDeployment. + +Add a simple test workflow (`test-arc.yml`) under your repository’s `.github/workflows/` directory: + +```yaml +name: ARC Runner Test + +on: + push: + branches: + - main + +jobs: + test-job: + runs-on: [self-hosted, arc-runner] + steps: + - name: Checkout code + uses: actions/checkout@v3 + + - name: Test + run: | + echo "Hello from an ARC runner!" + echo "This runner is running inside a Kubernetes pod." + sleep 10 +``` + +The key is `runs-on: [self-hosted, arc-runner]`. When this workflow runs, GitHub assigns the job to a runner with both `self-hosted` and `arc-runner` labels. ARC detects the event and, based on the HRA settings, creates a new runner pod if needed to process the job. + +> With self-hosted runners, unlike GitHub-hosted ones, you may need to install certain packages within the workflow. + +### Troubleshooting notes + +I often use Docker for CI/CD, and one recurring issue is DinD (Docker in Docker). + +With ARC, by default a runner container (scheduling container) and a docker daemon container run as sidecars. + +To handle this, there are Docker images that support DinD. 
+ +In a YAML like the one below, specify the image and set dockerdWithinRunnerContainer to run the Docker daemon inside the runner, and the workflow will run on that runner. + +```yaml +apiVersion: actions.summerwind.dev/v1alpha1 +kind: RunnerDeployment +metadata: + name: example-runner-deployment + namespace: actions-runner-system +spec: + replicas: 1 + template: + spec: + repository: / + labels: + - self-hosted + - arc-runner + image: "summerwind/actions-runner-dind:latest" + dockerdWithinRunnerContainer: true +``` + +For Docker tests that require GPUs, if your cluster has NVIDIA Container Toolkit installed, using the DinD image above can make GPUs visible. + +If you configure your workflow as below, you can confirm GPUs are properly set up even in a DinD scenario. (Be sure to check your NVIDIA Container Toolkit and NVIDIA GPU Driver Plugin versions!) + +```bash +# GPU 디바이스 확인 +ls -la /dev/nvidia* + +# device library setup +smi_path=$(find / -name "nvidia-smi" 2>/dev/null | head -n 1) +lib_path=$(find / -name "libnvidia-ml.so" 2>/dev/null | head -n 1) +lib_dir=$(dirname "$lib_path") +export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(dirname "$lib_path") +export NVIDIA_VISIBLE_DEVICES=all +export NVIDIA_DRIVER_CAPABILITIES=compute,utility + +# nvidia runtime 없이 직접 GPU 디바이스와 라이브러리 마운트 +docker run -it \ + --device=/dev/nvidia0:/dev/nvidia0 \ + --device=/dev/nvidiactl:/dev/nvidiactl \ + --device=/dev/nvidia-uvm:/dev/nvidia-uvm \ + --device=/dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \ + -v "$lib_dir:$lib_dir:ro" \ + -v "$(dirname $smi_path):$(dirname $smi_path):ro" \ + -e LD_LIBRARY_PATH="$LD_LIBRARY_PATH" \ + -e NVIDIA_VISIBLE_DEVICES="$NVIDIA_VISIBLE_DEVICES" \ + -e NVIDIA_DRIVER_CAPABILITIES="$NVIDIA_DRIVER_CAPABILITIES" \ + pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime +``` + +### Wrapping up + +We’ve looked at how to build a dynamically scalable self-hosted runner environment by deploying Actions Runner Controller in Kubernetes. 
+ +ARC helps you avoid the high cost of GitHub-hosted runners and the inefficiency of managing VMs for runners yourself. It’s especially powerful when building MLOps CI/CD environments that need GPUs or have complex dependencies. + +While the initial setup can feel a bit involved, once it’s in place it can significantly cut CI/CD costs and reduce operational burden. If you’re considering MLOps, it’s well worth evaluating. \ No newline at end of file From be831a0f67d7130504117d3d16740ac11c229cc6 Mon Sep 17 00:00:00 2001 From: ppippi Date: Mon, 11 Aug 2025 01:13:25 +0900 Subject: [PATCH 3/5] =?UTF-8?q?=EB=B2=88=EC=97=AD=20workflow=20=EC=9E=90?= =?UTF-8?q?=EB=8F=99=ED=99=94?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- scripts/translate_to_en.py | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/scripts/translate_to_en.py b/scripts/translate_to_en.py index 3ba31b4..90f9f1b 100644 --- a/scripts/translate_to_en.py +++ b/scripts/translate_to_en.py @@ -10,7 +10,7 @@ try: from openai import OpenAI -except Exception: +except ImportError: print("[ERROR] openai package not available. 
Make sure it's installed.") sys.exit(1) @@ -105,7 +105,7 @@ def split_front_matter(translated_markdown: str) -> tuple[dict, str]: fm = yaml.safe_load(fm_text) or {} if not isinstance(fm, dict): fm = {} - except Exception: + except yaml.YAMLError: fm = {} return fm, body @@ -154,10 +154,20 @@ def main() -> int: # Construct English filename by preserving the original filename en_path = to_en_filename(src) + + # If the English file already exists, remove it first to ensure a clean overwrite + if en_path.exists(): + try: + en_path.unlink() + print(f"[translate] Overwriting existing: {en_path.relative_to(REPO_ROOT)}") + except OSError: + # Best-effort unlink; continue with write which will overwrite contents + pass + en_post = frontmatter.Post(body, **fm) with open(en_path, "w", encoding="utf-8") as f: f.write(frontmatter.dumps(en_post)) - print(f"[translate] Created: {en_path.relative_to(REPO_ROOT)}") + print(f"[translate] Created/Updated: {en_path.relative_to(REPO_ROOT)}") created += 1 print(f"[translate] New English posts: {created}") From 9b23476c0a087fc3c6c3508426df7efc037d36e9 Mon Sep 17 00:00:00 2001 From: ppippi Date: Mon, 11 Aug 2025 01:13:52 +0900 Subject: [PATCH 4/5] =?UTF-8?q?=EB=B2=88=EC=97=AD=20workflow=20=EC=9E=90?= =?UTF-8?q?=EB=8F=99=ED=99=94?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- scripts/translate_to_en.py | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/scripts/translate_to_en.py b/scripts/translate_to_en.py index 90f9f1b..8858f9f 100644 --- a/scripts/translate_to_en.py +++ b/scripts/translate_to_en.py @@ -159,7 +159,9 @@ def main() -> int: if en_path.exists(): try: en_path.unlink() - print(f"[translate] Overwriting existing: {en_path.relative_to(REPO_ROOT)}") + print( + f"[translate] Overwriting existing: {en_path.relative_to(REPO_ROOT)}" + ) except OSError: # Best-effort unlink; continue with write which will overwrite contents pass From 
b88b2847ae27e2bc0bd418df9e7376a656c5fb23 Mon Sep 17 00:00:00 2001 From: ppippi-dev <61408680+ppippi-dev@users.noreply.github.com> Date: Sun, 10 Aug 2025 16:15:34 +0000 Subject: [PATCH 5/5] chore: add English translations for PR #6 --- .../2025-07-03-actions-runner-controller.md | 113 +++++++++--------- 1 file changed, 54 insertions(+), 59 deletions(-) diff --git a/_posts_en/2025-07-03-actions-runner-controller.md b/_posts_en/2025-07-03-actions-runner-controller.md index 09cc27d..3a5d39b 100644 --- a/_posts_en/2025-07-03-actions-runner-controller.md +++ b/_posts_en/2025-07-03-actions-runner-controller.md @@ -1,7 +1,7 @@ --- feature-img: assets/img/2025-07-03/0.png layout: post -subtitle: Building an MLOps CI Environment +subtitle: Building an MLOps CI environment tags: - MLOps - Infra @@ -11,48 +11,48 @@ title: Setting Up Actions Runner Controller ### Intro -As I’ve been enjoying building with AI lately, I’ve felt even more strongly how important a solid test environment is. +As I’ve been enjoying AI-driven development lately, the importance of a solid test environment has really hit home. -The most common approach is to set up CI with GitHub Actions, but in MLOps, CI often requires high-spec instances. +A common approach is to build CI with GitHub Actions, but in MLOps you often need high-spec instances for CI. -GitHub Actions does offer a [GPU instance (Linux 4 cores)](https://docs.github.com/ko/billing/managing-billing-for-your-products/about-billing-for-github-actions), but at $0.07 per minute as of now, it’s quite expensive to use. +GitHub Actions does offer [GPU instances (Linux, 4 cores)](https://docs.github.com/ko/billing/managing-billing-for-your-products/about-billing-for-github-actions), but at the time of writing they cost $0.07 per minute, which is quite expensive. -It’s also limited to an Nvidia T4 GPU, which can be restrictive as model sizes continue to grow. +They’re also Nvidia T4 GPUs, which can be limiting performance-wise as models keep growing. 
-As an alternative, you can use a self-hosted runner. +A good alternative in this situation is a self-hosted runner. -As the name suggests, you set up the runner yourself and execute GitHub workflows on it. +As the name suggests, you set up the runner yourself and execute GitHub workflows on that runner. -You can configure this using GitHub’s [Add a self-hosted runner](https://docs.github.com/ko/actions/how-tos/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners). +You can configure it via GitHub’s [Add self-hosted runners](https://docs.github.com/ko/actions/how-tos/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners). -However, this approach requires keeping your CI machine always on (online), which can be inefficient if CI/CD jobs are infrequent. +However, this approach requires the CI machine to always be on (online), which can be inefficient if CI/CD jobs are infrequent. -That’s where the Actions Runner Controller (ARC) becomes an excellent alternative. +That’s where the Actions Runner Controller (ARC) shines as an excellent alternative. -[Actions Runner Controller](https://github.com/actions/actions-runner-controller) is an open source project that lets you run GitHub Actions runners in a Kubernetes environment. +[Actions Runner Controller](https://github.com/actions/actions-runner-controller) is an open-source controller that manages GitHub Actions runners in a Kubernetes environment. -With it, you can run CI using your Kubernetes resources only when a GitHub Actions workflow is triggered. +With it, you can run CI on your own Kubernetes resources only when a GitHub Actions workflow is actually executed. -### Installing Actions Runner Controller +### Install Actions Runner Controller -The ARC installation consists of two major steps. +Installing ARC has two main steps: 1. Create a GitHub Personal Access Token for communication and authentication with GitHub -2. 
Install ARC with Helm and authenticate using the token +2. Install ARC via Helm and authenticate with the token you created #### 1. Create a GitHub Personal Access Token -ARC needs authentication to interact with the GitHub API to register and manage runners. Create a GitHub Personal Access Token (PAT). +ARC needs to authenticate to the GitHub API to register and manage runners. Create a GitHub Personal Access Token (PAT) for this. -* Path: `Settings` > `Developer settings` > `Personal access tokens` > `Tokens (classic)` > `Generate new token` +- Path: Settings > Developer settings > Personal access tokens > Tokens (classic) > Generate new token -When creating the PAT, select the [appropriate scopes](https://github.com/actions/actions-runner-controller/blob/master/docs/authenticating-to-the-github-api.md#deploying-using-pat-authentication). (For convenience here, grant full permissions.) +When creating the token, choose the [appropriate permissions](https://github.com/actions/actions-runner-controller/blob/master/docs/authenticating-to-the-github-api.md#deploying-using-pat-authentication). (For convenience here, grant full permissions.) > For security, use least privilege and set an expiration date. -It’s generally recommended to authenticate via a GitHub App rather than PAT. +It appears that authenticating via a GitHub App is recommended over using a PAT. -Keep the PAT safe—you’ll need it in the next step when installing ARC. +Keep the PAT safe—you’ll need it to install ARC in the next step. #### 2. Install ARC with Helm @@ -62,13 +62,9 @@ ARC requires cert-manager. If cert-manager isn’t set up in your cluster, insta kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.2/cert-manager.yaml ``` -Now use Helm to install ARC into your Kubernetes cluster. +Now install ARC into your Kubernetes cluster with Helm. - - - - -Install ARC using the Personal Access Token you created earlier. 
Replace `YOUR_GITHUB_TOKEN` with your PAT in the command below. +Use the Personal Access Token you created earlier to install ARC. Replace YOUR_GITHUB_TOKEN below with your PAT value. ```bash helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller @@ -94,23 +90,23 @@ After installation, verify the ARC controller is running: kubectl get pods -n actions-runner-system ``` -If this succeeds, you’ll see the ARC controller manager pod running in the `actions-runner-system` namespace. +If the command succeeds, you should see the ARC controller manager pod running in the actions-runner-system namespace. -ARC is now ready to talk to GitHub! Next, define the runner that will actually execute your workflows. +ARC is now ready to talk to GitHub. Next, define the runner that will actually execute your workflows. -### 3. Configure the Runner +### 3. Configure a Runner -We’ve installed the ARC controller, but there’s no runner yet to execute workflows. We need to create runner pods based on GitHub Actions jobs. +The ARC controller is installed, but there’s no runner yet to execute workflows. You need to create runner pods based on GitHub Actions jobs. -We’ll use two resources: -1. RunnerDeployment: Acts as the template for runner pods—defines which container image to use, which GitHub repository to connect to, which labels to apply, etc. +You’ll use two resources: +1. RunnerDeployment: Acts as a template for runner pods. Defines the container image, target GitHub repository, labels, etc. 2. HorizontalRunnerAutoscaler (HRA): Watches the RunnerDeployment and automatically adjusts its replicas based on the number of queued jobs in GitHub. #### Define RunnerDeployment -Create a file named `runner-deployment.yml` as below. Change `spec.template.spec.repository` to your GitHub repository. +Create a file named runner-deployment.yml as below. Change spec.template.spec.repository to your own GitHub repo. 
-> If you have permissions, you can target an organization instead of a single repository. +> If you have permissions, you can also target an organization instead of a single repository. ```yaml apiVersion: actions.summerwind.dev/v1alpha1 @@ -128,16 +124,15 @@ spec: - arc-runner ``` -With this configured, you’ll see the self-hosted runner in your GitHub repo’s Actions. +With this configured, you can check the self-hosted runner in your GitHub repo’s Actions settings. -After it’s deployed, in a moment you’ll find a new runner registered under your repository’s Settings > Actions > Runners tab with the labels `self-hosted` and `arc-runner`. - +Once the deployment is up, after a short while you’ll see a new runner with labels self-hosted and arc-runner under Settings > Actions > Runners in your repository. #### Define HorizontalRunnerAutoscaler -Next, define an HRA to auto-scale the RunnerDeployment. Create a `hra.yml` file. +Next, define an HRA to autoscale the RunnerDeployment you just created. Create hra.yml: ```yaml apiVersion: actions.summerwind.dev/v1alpha1 @@ -152,11 +147,11 @@ spec: maxReplicas: 5 ``` -Set minReplicas and maxReplicas to scale up and down based on resources. +By setting minReplicas and maxReplicas, you can scale up and down based on available resources. -You can also specify additional metrics to create pods whenever a workflow is triggered. Many other metrics are available. +You can also configure additional metrics to create pods whenever there’s a workflow trigger. Many other metrics are supported. -> When you configure a HorizontalRunnerAutoscaler, runners are created only when needed. When there are zero runners, you won’t see them in the GitHub UI. +> When using HorizontalRunnerAutoscaler, runners are created only when needed. During idle periods (when there are zero runners), you won’t see any runners in the GitHub UI. 
@@ -174,16 +169,16 @@ spec: metrics: - type: TotalNumberOfQueuedAndInProgressWorkflowRuns repositoryNames: ["/"] +``` -The above is my preferred metric—it scales up when workflow runs are needed (i.e., when jobs are queued). -As shown, you can choose metrics as needed to get great results. +The above is my preferred metric—it scales up when workflows are queued. As shown, you can choose metrics to fit your needs and get great results. ### 4. Use it in a GitHub Actions workflow -We’re all set! Using the new ARC runner is simple: in your workflow file, set the `runs-on` key to the labels specified in the RunnerDeployment. +All set! Using the new ARC runner is simple: specify the labels you set in the RunnerDeployment under runs-on in your workflow. -Add a simple test workflow (`test-arc.yml`) under your repository’s `.github/workflows/` directory: +Add a simple test workflow (test-arc.yml) under .github/workflows/ in your repo: ```yaml name: ARC Runner Test @@ -207,19 +202,19 @@ jobs: sleep 10 ``` -The key is `runs-on: [self-hosted, arc-runner]`. When this workflow runs, GitHub assigns the job to a runner with both `self-hosted` and `arc-runner` labels. ARC detects the event and, based on the HRA settings, creates a new runner pod if needed to process the job. +The key part is runs-on: [self-hosted, arc-runner]. When this workflow runs, GitHub assigns the job to a runner that has both labels. ARC detects this event and, per your HRA settings, creates a new runner pod if needed to process the job. -> With self-hosted runners, unlike GitHub-hosted ones, you may need to install certain packages within the workflow. +> With self-hosted runners, unlike GitHub-hosted runners, you may need to install some packages within your workflow. ### Troubleshooting notes -I often use Docker for CI/CD, and one recurring issue is DinD (Docker in Docker). +For CI/CD, I often use Docker, and one recurring issue is Docker-in-Docker (DinD). 
-With ARC, by default a runner container (scheduling container) and a docker daemon container run as sidecars. +With ARC, by default the runner (scheduling) container and a docker daemon container run as sidecars. -To handle this, there are Docker images that support DinD. +To handle this, there’s a Docker image that supports DinD. -In a YAML like the one below, specify the image and set dockerdWithinRunnerContainer to run the Docker daemon inside the runner, and the workflow will run on that runner. +If you specify the image and dockerdWithinRunnerContainer as below, the Docker daemon runs inside the runner, and the workflow runs on that runner. ```yaml apiVersion: actions.summerwind.dev/v1alpha1 @@ -239,15 +234,15 @@ spec: dockerdWithinRunnerContainer: true ``` -For Docker tests that require GPUs, if your cluster has NVIDIA Container Toolkit installed, using the DinD image above can make GPUs visible. +For Docker tests that need GPUs, if your cluster has NVIDIA Container Toolkit installed, using the DinD image above allows the GPU to be recognized. -If you configure your workflow as below, you can confirm GPUs are properly set up even in a DinD scenario. (Be sure to check your NVIDIA Container Toolkit and NVIDIA GPU Driver Plugin versions!) +Configure your workflow like this to confirm GPUs work even in a DinD setup. (Make sure your NVIDIA Container Toolkit and NVIDIA GPU Driver Plugin versions are compatible!) 
```bash -# GPU 디바이스 확인 +# Check GPU devices ls -la /dev/nvidia* -# device library setup +# Device/library setup smi_path=$(find / -name "nvidia-smi" 2>/dev/null | head -n 1) lib_path=$(find / -name "libnvidia-ml.so" 2>/dev/null | head -n 1) lib_dir=$(dirname "$lib_path") @@ -255,7 +250,7 @@ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(dirname "$lib_path") export NVIDIA_VISIBLE_DEVICES=all export NVIDIA_DRIVER_CAPABILITIES=compute,utility -# nvidia runtime 없이 직접 GPU 디바이스와 라이브러리 마운트 +# Mount GPU devices and libraries directly without the nvidia runtime docker run -it \ --device=/dev/nvidia0:/dev/nvidia0 \ --device=/dev/nvidiactl:/dev/nvidiactl \ @@ -271,8 +266,8 @@ docker run -it \ ### Wrapping up -We’ve looked at how to build a dynamically scalable self-hosted runner environment by deploying Actions Runner Controller in Kubernetes. +We covered how to build a dynamically scalable self-hosted runner environment by deploying Actions Runner Controller in Kubernetes. -ARC helps you avoid the high cost of GitHub-hosted runners and the inefficiency of managing VMs for runners yourself. It’s especially powerful when building MLOps CI/CD environments that need GPUs or have complex dependencies. +Using ARC solves both the high cost of GitHub-hosted runners and the inefficiency of managing your own VMs for runners. ARC is especially powerful when you need GPUs or have complex dependencies in an MLOps CI/CD setup. -While the initial setup can feel a bit involved, once it’s in place it can significantly cut CI/CD costs and reduce operational burden. If you’re considering MLOps, it’s well worth evaluating. \ No newline at end of file +The initial setup can feel a bit involved, but once in place, it can significantly cut CI/CD costs and reduce operational burden. If you’re working on MLOps, it’s well worth considering. \ No newline at end of file