Skip to content

feat(garm): apply configurator runner options via per-scaleset GARM templates (ISD-5728)#238

Draft
cbartz wants to merge 21 commits into
mainfrom
feat/configurator-app-config-ISD-5728
Draft

feat(garm): apply configurator runner options via per-scaleset GARM templates (ISD-5728)#238
cbartz wants to merge 21 commits into
mainfrom
feat/configurator-app-config-ISD-5728

Conversation

@cbartz

@cbartz cbartz commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

What this PR does

Exposes six runner-behaviour options on the garm-configurator charm —
dockerhub-mirror, runner-http-proxy, aproxy-exclude-addresses,
aproxy-redirect-ports, otel-collector-endpoint, pre-job-script — and applies
them to GARM runners via the GARM template system (per ISD278), not
extra_specs.pre_install_scripts.

  • garm-configurator: a new RunnerConfig model validates the options and shares
    them over the garm-configurator relation databag.
  • garm: reads them into ScalesetSpec.runner_config; the scaleset reconciler copies
    GARM's system github_linux template, injects rendered pre-install and pre-job-hook
    blocks (runner_template.py), and creates/updates a per-scaleset template via the
    templates API, referencing it by template_id. New GarmClient template methods back this.

Stacked on #236 (feat/garm-scaleset-isd-5729) — base it there for a clean diff.
Includes two small GARM-charm enablement fixes needed for the reconcile path (see Review focus).

Why we need it

Operators need to customise runner proxy / mirror / telemetry / pre-job behaviour without
modifying charm code (ISD-5728).

Checklist

  • Changes comply with the project's coding standards and guidelines (see CONTRIBUTING.md and STYLE.md)
  • CONTRIBUTING.md has been updated upon changes to the contribution/development process (e.g. changes to the way tests are run) — N/A, no process changes
  • Technical author has been assigned to review the PR in case of documentation changes (usually *.md files) — N/A, no doc changes
  • I updated docs/changelog.md with user-relevant changes — pending (new config options are user-relevant; will add)
  • I used AI to assist with preparing this PR
  • I added or updated tests as needed (unit and integration)
  • If integration test modules are used: workflow modules list correct — N/A, modified existing test_garm.py module only
  • If this PR involves a Grafana dashboard — N/A
  • If this PR involves Terraform — N/A
  • If this PR involves Rockcraft — N/A, no rock change

Test plan

  • Unit: 74 garm + 45 garm-configurator tests pass; ruff + pyright clean.
  • Integration (local microk8s, reused GARM rock): 11 passed, 4 skipped, 0 failed.
    The scaleset-creation tests (including the new test_runner_options_render_into_scaleset_template)
    skip until the provider-registration chain (feat(configurator): configurator integration #234) lands — the configurator derives
    provider_name="openstack-<project>" while GARM only has the built-in openstack provider, so
    no scaleset is created. Same skip-guard as feat(garm): sync scalesets via GARM REST API without service restart (ISD-5729) #236's own scaleset tests; the template rendering logic
    itself is fully unit-covered.

Review focus

  • runner_template.py bash rendering (aproxy / docker mirror / otel / pre-job hook). Production-only
    static steps (tmate proxy, apt-mirror self-test, post-job metrics) are stubbed with TODO(ISD-278)
    pending the old charm's cloud-init source.
  • Enablement fixes (distinct from the feature): _get_garm_url now targets the in-pod GARM URL
    (the previous config-based path read a non-existent option and always returned ""), and
    _maybe_first_run registers the GARM admin via the existing GarmClient.first_run. These overlap
    with Added generated OpenAPI client to GARM #232 (api-client / first-run) and will need reconciling if it lands first.
  • Per-scaleset template lifecycle in the reconciler (create, update-on-drift, delete-with-scaleset).

Potential breaking changes

None. All new config is optional; behaviour is unchanged when the options are unset.

New dependencies / APIs / modules

  • New module charms/garm/src/runner_template.py.
  • New GarmClient template methods (list/get/create/update/delete_template).
  • No new third-party dependencies.

🤖 Generated with Claude Code

florentianayuwono and others added 11 commits June 16, 2026 12:37
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use 'status 409' substring match to match the specific error format
string rather than bare '409' which could appear in error bodies.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ation databag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…armState

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rator relation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…efer

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…N for pre_install_scripts

- scaleset_reconciler: track all desired names (not just ready ones) so
  existing scalesets are NOT deleted when their provider or credential is
  temporarily unavailable
- charm.py / configurator charm.py: switch pre_install_scripts wire format
  from str()/ast.literal_eval to json.dumps()/json.loads for a well-defined
  serialization; add regression test for the delete-on-skip bug

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…emplates

Expose six runner-behaviour options on the garm-configurator charm
(dockerhub-mirror, runner-http-proxy, aproxy-exclude-addresses,
aproxy-redirect-ports, otel-collector-endpoint, pre-job-script), validated by a
new RunnerConfig model and shared over the garm-configurator relation. The GARM
charm reads them into ScalesetSpec.runner_config and the scaleset reconciler
copies GARM's system github_linux template, injects the rendered pre-install and
pre-job hook blocks (runner_template.py), and creates/updates a per-scaleset
template via the templates API, referencing it by template_id.

Also enable the scaleset reconcile path on this branch: register the GARM admin
via GarmClient.first_run (_maybe_first_run) and target the in-pod GARM URL, and
align the integration test to authenticate with the charm-managed admin
credentials.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR exposes runner-behaviour options on the garm-configurator charm and applies them to GARM runners via per-scaleset GARM templates (copying the system github_linux template and injecting rendered blocks), instead of using extra_specs.pre_install_scripts. It also includes small GARM-charm enablement fixes around first-run initialization and local API URL resolution.

Changes:

  • Add a RunnerConfig model and rendering pipeline to inject proxy/mirror/telemetry/pre-job behavior into per-scaleset GARM templates.
  • Extend the GARM API client and scaleset reconciler to manage template lifecycle (create/update/delete) and reference templates via template_id.
  • Update unit/integration tests to cover template API and template rendering (including an integration assertion against template content when available).

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
charms/tests/integration/test_garm.py Uses charm-managed admin secret for first-run/login; adds integration check that runner options appear in rendered template content.
charms/garm/tests/unit/test_scaleset_reconciler.py Adds unit coverage for template creation/update/delete behavior during reconcile.
charms/garm/tests/unit/test_runner_template.py New unit tests for runner-install template rendering and RunnerConfig parsing.
charms/garm/tests/unit/test_garm_api.py Adds unit tests for template CRUD/list API calls and base64 encoding behavior.
charms/garm/tests/unit/test_charm.py Updates tests for reconcile path to account for _maybe_first_run; adds first-run tests.
charms/garm/src/scaleset_reconciler.py Implements per-scaleset template ensure/update/delete and passes template_id into scaleset create/update payloads.
charms/garm/src/runner_template.py New module rendering the injected bash blocks into a base template.
charms/garm/src/garm_api.py Adds template list/get/create/update/delete methods.
charms/garm/src/charm.py Wires runner config into ScalesetSpec, adds _maybe_first_run, fixes local API URL, triggers first-run retries.
charms/garm-configurator/tests/unit/test_charm.py Adds validation tests for runner config options and asserts relation databag fields are written/absent appropriately.
charms/garm-configurator/tests/unit/test_charm_state.py Updates tests for new RunnerConfig field in charm state.
charms/garm-configurator/src/charm.py Writes runner-option fields into the garm-configurator relation databag.
charms/garm-configurator/src/charm_state.py Adds RunnerConfig parsing/validation for new runner options.
charms/garm-configurator/charmcraft.yaml Adds new runner-behaviour config options and descriptions.

Comment thread charms/garm/src/scaleset_reconciler.py
Comment thread charms/garm/src/scaleset_reconciler.py Outdated
Comment thread charms/garm-configurator/src/charm_state.py
Comment thread charms/garm-configurator/charmcraft.yaml Outdated
Comment thread charms/garm-configurator/charmcraft.yaml Outdated
…eared

Address Copilot review on #238:
- the reconciler now detaches the per-scaleset template (template_id -> 0) and
  deletes the unreferenced custom template when runner options are cleared, so
  operators can revert to the default template by clearing the config
- aproxy-exclude-addresses rejects non-IP/CIDR tokens (they would fail in the
  nft ruleset and the failure would be swallowed) instead of silently accepting
- align the runner-http-proxy and otel-collector-endpoint config descriptions
  with their http(s) URL validation
@cbartz

cbartz commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed all 5 review comments in 01a33bc:

  • 1–2 (reconciler template detach): when runner options are cleared the reconciler now reverts the scaleset to GARM's default template (template_id → 0) and deletes the now-unreferenced custom template; added test_template_detached_when_runner_config_cleared.
  • 3 (aproxy-exclude-addresses): now rejects non-IP/CIDR tokens instead of silently accepting them (they would fail in the nft ruleset); added a unit test.
  • 4–5 (docs): updated the runner-http-proxy and otel-collector-endpoint config descriptions to state the http(s) URL requirement.

AI-generated message.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

Comment thread charms/garm/src/scaleset_reconciler.py
Address Copilot review on #238: when runner options are set but the system
github_linux template is transiently absent from list_templates, _ensure_template
returned 0, which detached and deleted an existing per-scaleset template and lost
the runner config. Now keep an existing custom template untouched in that case
and only fall back to the default template when there is none to keep.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Comment thread charms/garm/src/scaleset_reconciler.py
Comment thread charms/garm-configurator/src/charm_state.py
Address Copilot review on #238:
- _template_bytes now caches data fetched via get_template back onto the template
  dict, avoiding redundant GETs when the system template is reused across specs
- aproxy-exclude-addresses now rejects IPv6 (and hostnames): the rendered nft
  ruleset uses an IPv4-only `table ip`, so IPv6 would pass validation but fail at
  runtime; the config description is updated to say IPv4

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Comment thread charms/garm/src/charm.py
Comment thread charms/garm/src/scaleset_reconciler.py
Address Copilot review on #238:
- _maybe_first_run now logs in first and only calls /first-run when login fails,
  avoiding repeated first-run calls (and the 409s they log) on every
  update-status tick while still self-healing if GARM's database is reset
- the scaleset reconciler lists templates with the github_linux partial-name
  filter instead of pulling every template on the controller

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Comment thread charms/garm-configurator/src/charm_state.py
Comment thread charms/tests/integration/test_garm.py Outdated
… creds helper

Address Copilot review on #238:
- block config that sets aproxy-exclude-addresses / aproxy-redirect-ports without
  runner-http-proxy, since the runner template only renders the aproxy block when
  a proxy is set (the options would otherwise silently no-op)
- centralise the GARM admin-credentials secret lookup in a conftest helper
  (get_garm_admin_creds_from_secret) reused by garm_login_from_secret and the
  first-run integration helper

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Comment thread charms/garm/src/runner_template.py Outdated
Comment thread charms/garm/src/runner_template.py
Address Copilot review on #238: the runner template renders relation databag
values (aproxy redirect ports / exclude addresses) into a root-executed nft
ruleset, so re-validate them at this trust boundary instead of relying on the
configurator's validation alone — malformed and IPv6 tokens are dropped. Also
document the unresolved ubuntu-vs-runner target for the static group-membership
step under the existing ISD-278 TODO.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

Comment thread charms/garm/src/runner_template.py Outdated
Address Copilot review on #238: the consumer-side port sanitisation checked only
token shape, so out-of-range (0, 99999) or inverted (443-80) ports could still
reach the nft ruleset and break it. Now also enforce 1..65535 and N<=M.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Comment thread charms/garm-configurator/src/charm_state.py
Comment thread charms/garm/src/runner_template.py
Comment thread charms/tests/integration/conftest.py Outdated
Address Copilot review on #238:
- _validate_http_url rejects URLs containing whitespace/control characters, which
  are later rendered into scripts and env files
- the otel endpoint has CR/LF stripped before being written to the runner env
  file, so a databag value can't inject extra env entries
- remove the now-unused garm_app_name parameter from garm_login_from_secret and
  update its call sites

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Comment thread charms/garm/src/charm.py Outdated
Comment thread charms/garm/src/scaleset_reconciler.py Outdated
Comment thread charms/garm/src/runner_template.py Outdated
…docs

Address Copilot review on #238:
- guard the update-status and configurator-relation handlers with is_ready() so
  the GARM client's 30s timeout can't hang a hook while the workload is starting
- only call list_templates when a spec has runner options or an existing scaleset
  references a custom template, avoiding an API call per reconcile otherwise
- derive collision-resistant heredoc delimiters so an operator pre-job-script
  containing the delimiter can't truncate the generated script/env file

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

Comment thread charms/garm-configurator/src/charm_state.py
Address Copilot review on #238: _validate_http_url accepted URLs with invalid
ports (e.g. http://host:bad) because urlparse does not validate them. Force
evaluation of parsed.port (which raises on a bad port) and require a hostname.
@florentianayuwono florentianayuwono force-pushed the feat/garm-scaleset-isd-5729 branch from 3adaad7 to 89422a3 Compare June 17, 2026 08:25
Base automatically changed from feat/garm-scaleset-isd-5729 to main June 26, 2026 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants