feat(garm): apply configurator runner options via per-scaleset GARM templates (ISD-5728)#238
Draft
cbartz wants to merge 21 commits into
Draft
feat(garm): apply configurator runner options via per-scaleset GARM templates (ISD-5728)#238cbartz wants to merge 21 commits into
cbartz wants to merge 21 commits into
Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use 'status 409' substring match to match the specific error format string rather than bare '409' which could appear in error bodies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ation databag Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…armState Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rator relation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…efer Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…N for pre_install_scripts - scaleset_reconciler: track all desired names (not just ready ones) so existing scalesets are NOT deleted when their provider or credential is temporarily unavailable - charm.py / configurator charm.py: switch pre_install_scripts wire format from str()/ast.literal_eval to json.dumps()/json.loads for a well-defined serialization; add regression test for the delete-on-skip bug Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…emplates Expose six runner-behaviour options on the garm-configurator charm (dockerhub-mirror, runner-http-proxy, aproxy-exclude-addresses, aproxy-redirect-ports, otel-collector-endpoint, pre-job-script), validated by a new RunnerConfig model and shared over the garm-configurator relation. The GARM charm reads them into ScalesetSpec.runner_config and the scaleset reconciler copies GARM's system github_linux template, injects the rendered pre-install and pre-job hook blocks (runner_template.py), and creates/updates a per-scaleset template via the templates API, referencing it by template_id. Also enable the scaleset reconcile path on this branch: register the GARM admin via GarmClient.first_run (_maybe_first_run) and target the in-pod GARM URL, and align the integration test to authenticate with the charm-managed admin credentials.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR exposes runner-behaviour options on the garm-configurator charm and applies them to GARM runners via per-scaleset GARM templates (copying the system github_linux template and injecting rendered blocks), instead of using extra_specs.pre_install_scripts. It also includes small GARM-charm enablement fixes around first-run initialization and local API URL resolution.
Changes:
- Add a
RunnerConfigmodel and rendering pipeline to inject proxy/mirror/telemetry/pre-job behavior into per-scaleset GARM templates. - Extend the GARM API client and scaleset reconciler to manage template lifecycle (create/update/delete) and reference templates via
template_id. - Update unit/integration tests to cover template API and template rendering (including an integration assertion against template content when available).
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| charms/tests/integration/test_garm.py | Uses charm-managed admin secret for first-run/login; adds integration check that runner options appear in rendered template content. |
| charms/garm/tests/unit/test_scaleset_reconciler.py | Adds unit coverage for template creation/update/delete behavior during reconcile. |
| charms/garm/tests/unit/test_runner_template.py | New unit tests for runner-install template rendering and RunnerConfig parsing. |
| charms/garm/tests/unit/test_garm_api.py | Adds unit tests for template CRUD/list API calls and base64 encoding behavior. |
| charms/garm/tests/unit/test_charm.py | Updates tests for reconcile path to account for _maybe_first_run; adds first-run tests. |
| charms/garm/src/scaleset_reconciler.py | Implements per-scaleset template ensure/update/delete and passes template_id into scaleset create/update payloads. |
| charms/garm/src/runner_template.py | New module rendering the injected bash blocks into a base template. |
| charms/garm/src/garm_api.py | Adds template list/get/create/update/delete methods. |
| charms/garm/src/charm.py | Wires runner config into ScalesetSpec, adds _maybe_first_run, fixes local API URL, triggers first-run retries. |
| charms/garm-configurator/tests/unit/test_charm.py | Adds validation tests for runner config options and asserts relation databag fields are written/absent appropriately. |
| charms/garm-configurator/tests/unit/test_charm_state.py | Updates tests for new RunnerConfig field in charm state. |
| charms/garm-configurator/src/charm.py | Writes runner-option fields into the garm-configurator relation databag. |
| charms/garm-configurator/src/charm_state.py | Adds RunnerConfig parsing/validation for new runner options. |
| charms/garm-configurator/charmcraft.yaml | Adds new runner-behaviour config options and descriptions. |
…eared Address Copilot review on #238: - the reconciler now detaches the per-scaleset template (template_id -> 0) and deletes the unreferenced custom template when runner options are cleared, so operators can revert to the default template by clearing the config - aproxy-exclude-addresses rejects non-IP/CIDR tokens (they would fail in the nft ruleset and the failure would be swallowed) instead of silently accepting - align the runner-http-proxy and otel-collector-endpoint config descriptions with their http(s) URL validation
Collaborator
Author
|
Addressed all 5 review comments in 01a33bc:
AI-generated message. |
Address Copilot review on #238: when runner options are set but the system github_linux template is transiently absent from list_templates, _ensure_template returned 0, which detached and deleted an existing per-scaleset template and lost the runner config. Now keep an existing custom template untouched in that case and only fall back to the default template when there is none to keep.
Address Copilot review on #238: - _template_bytes now caches data fetched via get_template back onto the template dict, avoiding redundant GETs when the system template is reused across specs - aproxy-exclude-addresses now rejects IPv6 (and hostnames): the rendered nft ruleset uses an IPv4-only `table ip`, so IPv6 would pass validation but fail at runtime; the config description is updated to say IPv4
Address Copilot review on #238: - _maybe_first_run now logs in first and only calls /first-run when login fails, avoiding repeated first-run calls (and the 409s they log) on every update-status tick while still self-healing if GARM's database is reset - the scaleset reconciler lists templates with the github_linux partial-name filter instead of pulling every template on the controller
… creds helper Address Copilot review on #238: - block config that sets aproxy-exclude-addresses / aproxy-redirect-ports without runner-http-proxy, since the runner template only renders the aproxy block when a proxy is set (the options would otherwise silently no-op) - centralise the GARM admin-credentials secret lookup in a conftest helper (get_garm_admin_creds_from_secret) reused by garm_login_from_secret and the first-run integration helper
Address Copilot review on #238: the runner template renders relation databag values (aproxy redirect ports / exclude addresses) into a root-executed nft ruleset, so re-validate them at this trust boundary instead of relying on the configurator's validation alone — malformed and IPv6 tokens are dropped. Also document the unresolved ubuntu-vs-runner target for the static group-membership step under the existing ISD-278 TODO.
Address Copilot review on #238: the consumer-side port sanitisation checked only token shape, so out-of-range (0, 99999) or inverted (443-80) ports could still reach the nft ruleset and break it. Now also enforce 1..65535 and N<=M.
Address Copilot review on #238: - _validate_http_url rejects URLs containing whitespace/control characters, which are later rendered into scripts and env files - the otel endpoint has CR/LF stripped before being written to the runner env file, so a databag value can't inject extra env entries - remove the now-unused garm_app_name parameter from garm_login_from_secret and update its call sites
…docs Address Copilot review on #238: - guard the update-status and configurator-relation handlers with is_ready() so the GARM client's 30s timeout can't hang a hook while the workload is starting - only call list_templates when a spec has runner options or an existing scaleset references a custom template, avoiding an API call per reconcile otherwise - derive collision-resistant heredoc delimiters so an operator pre-job-script containing the delimiter can't truncate the generated script/env file
Address Copilot review on #238: _validate_http_url accepted URLs with invalid ports (e.g. http://host:bad) because urlparse does not validate them. Force evaluation of parsed.port (which raises on a bad port) and require a hostname.
3adaad7 to
89422a3
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this PR does
Exposes six runner-behaviour options on the
garm-configuratorcharm —dockerhub-mirror,runner-http-proxy,aproxy-exclude-addresses,aproxy-redirect-ports,otel-collector-endpoint,pre-job-script— and appliesthem to GARM runners via the GARM template system (per ISD278), not
extra_specs.pre_install_scripts.RunnerConfigmodel validates the options and sharesthem over the
garm-configuratorrelation databag.ScalesetSpec.runner_config; the scaleset reconciler copiesGARM's system
github_linuxtemplate, injects rendered pre-install and pre-job-hookblocks (
runner_template.py), and creates/updates a per-scaleset template via thetemplates API, referencing it by
template_id. NewGarmClienttemplate methods back this.Why we need it
Operators need to customise runner proxy / mirror / telemetry / pre-job behaviour without
modifying charm code (ISD-5728).
Checklist
CONTRIBUTING.mdhas been updated upon changes to the contribution/development process (e.g. changes to the way tests are run) — N/A, no process changesdocs/changelog.mdwith user-relevant changes — pending (new config options are user-relevant; will add)moduleslist correct — N/A, modified existingtest_garm.pymodule onlyTest plan
The scaleset-creation tests (including the new
test_runner_options_render_into_scaleset_template)skip until the provider-registration chain (feat(configurator): configurator integration #234) lands — the configurator derives
provider_name="openstack-<project>"while GARM only has the built-inopenstackprovider, sono scaleset is created. Same skip-guard as feat(garm): sync scalesets via GARM REST API without service restart (ISD-5729) #236's own scaleset tests; the template rendering logic
itself is fully unit-covered.
Review focus
runner_template.pybash rendering (aproxy / docker mirror / otel / pre-job hook). Production-onlystatic steps (tmate proxy, apt-mirror self-test, post-job metrics) are stubbed with
TODO(ISD-278)pending the old charm's cloud-init source.
_get_garm_urlnow targets the in-pod GARM URL(the previous config-based path read a non-existent option and always returned
""), and_maybe_first_runregisters the GARM admin via the existingGarmClient.first_run. These overlapwith Added generated OpenAPI client to GARM #232 (api-client / first-run) and will need reconciling if it lands first.
Potential breaking changes
None. All new config is optional; behaviour is unchanged when the options are unset.
New dependencies / APIs / modules
charms/garm/src/runner_template.py.GarmClienttemplate methods (list/get/create/update/delete_template).🤖 Generated with Claude Code