CI fixes: venv isolation, python 3.12, logstash memory (#109)
Merged
The Logstash container was hitting OOM (rc=137) during idempotence on rockylinux9 with ES8. With 2GB and a 512m JVM heap plus ES overhead, there wasn't enough headroom for a clean restart. 3GB gives room for the JVM to reinitialize without tripping the OOM killer.
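The fix amounts to raising the container memory limit in the molecule platform definition. A hypothetical excerpt (platform name, image, and exact keys are assumptions, not the PR's actual diff):

```yaml
# molecule.yml (sketch) - raise the Logstash container limit to 3GB
platforms:
  - name: logstash-rockylinux9
    image: rockylinux:9
    memory: 3g  # was 2g; 512m JVM heap + ES overhead left no headroom for restart
```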
ansible-core 2.20 requires Python >= 3.12. The self-hosted runners ship with 3.11 as default, so the sanity and compatibility jobs were failing. This adds a version check step that selects python3.12 when testing against ansible-core 2.20+.
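A version-selection step like this could implement the check; the `PY` variable name and the `matrix.ansible` values are illustrative assumptions:

```yaml
# GitHub Actions step (sketch) - pick the interpreter per ansible-core version
- name: Select Python interpreter
  run: |
    case "${{ matrix.ansible }}" in
      2.2*|devel) echo "PY=python3.12" >> "$GITHUB_ENV" ;;  # ansible-core >= 2.20 needs 3.12
      *)          echo "PY=python3"    >> "$GITHUB_ENV" ;;  # runner default (3.11) is fine
    esac
```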
All CI workflows (molecule, linting, plugins) ran uv pip install --system, which shared a single Python environment across 20 concurrent runners. Jobs racing to install different ansible-core versions would clobber each other's binaries, causing command-not-found and permission errors. Each job now creates its own venv in RUNNER_TEMP via uv venv, isolating all Python dependencies per job and eliminating the shared-state race condition.
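The per-job venv setup could look roughly like this (step names and the dependency list are assumptions; only `uv venv`, `uv pip install --python`, `RUNNER_TEMP`, and `GITHUB_PATH` are real primitives):

```yaml
# Workflow steps (sketch) - isolated venv per job instead of --system installs
- name: Create per-job virtualenv
  run: |
    uv venv "$RUNNER_TEMP/venv"
    echo "$RUNNER_TEMP/venv/bin" >> "$GITHUB_PATH"  # later steps see this venv first
- name: Install Python dependencies into the venv
  run: uv pip install --python "$RUNNER_TEMP/venv/bin/python" ansible-core molecule
```

Because RUNNER_TEMP is unique per job and wiped afterwards, no two jobs can see each other's interpreter or packages.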
Force-pushed from fef16df to 10d3916.
The rolling upgrade previously only ran when elasticstack_version was pinned to a specific version. This left two gaps:

1. Changing elasticstack_release from 8 to 9 without pinning a version would install the new package but skip the node-by-node restart.
2. Running with state: latest would upgrade all nodes simultaneously through the normal handler, bringing the whole cluster down at once.

Now the rolling upgrade triggers in all cases:

- Pre-install: when the target version or major release differs from the installed version (pinned version or release change).
- Post-install: when the normal package task changed the package (covers the latest case, where we can't predict the outcome pre-install).

A 10-second countdown with a Ctrl+C abort option runs before any rolling upgrade so users are aware of what is about to happen. The countdown duration is configurable via elasticsearch_upgrade_countdown.
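The pre-install trigger and the countdown could be sketched as tasks like these; every variable name here (es_installed_version, elasticsearch_rolling_upgrade) is a hypothetical placeholder, not the role's actual variable:

```yaml
# Tasks (sketch) - detect a version or release change before installing
- name: Flag rolling upgrade when target differs from installed version
  ansible.builtin.set_fact:
    elasticsearch_rolling_upgrade: true  # hypothetical flag name
  when: >
    es_installed_version is defined and
    (elasticstack_version is defined and elasticstack_version != es_installed_version
     or es_installed_version.split('.')[0] | int != elasticstack_release | int)

- name: Countdown before rolling upgrade (Ctrl+C to abort)
  ansible.builtin.pause:
    seconds: "{{ elasticsearch_upgrade_countdown | default(10) }}"
    prompt: "Rolling upgrade starts shortly - press Ctrl+C then 'a' to abort"
  when: elasticsearch_rolling_upgrade | default(false)
```

The post-install trigger would instead key off the `changed` result of the normal package task, which is the only signal available when state: latest is used.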
Force-pushed from 10d3916 to d8ad04e.
When elasticstack_version is set to 'latest', every new minor or patch release triggers a rolling restart. This is safe but may surprise users who run the playbook frequently. Added a warning to the reference docs and a code comment on the package install tasks explaining why.
The upgrade scenarios previously pinned elasticstack_version to a specific 9.x version, which bypassed the new release-change detection. Now they only set elasticstack_release: 9 without pinning a version, exercising the real-world upgrade path where the role detects the major version mismatch and triggers the rolling upgrade automatically. Also sets elasticsearch_upgrade_countdown: 0 to skip the interactive pause in CI.
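The scenario variables described above reduce to a fragment like this (file placement is an assumption):

```yaml
# group_vars for the upgrade scenario (sketch)
elasticstack_release: 9            # major release only - no pinned elasticstack_version,
                                   # so the role must detect the 8 -> 9 mismatch itself
elasticsearch_upgrade_countdown: 0 # skip the interactive pause in CI
```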
After the 8→9 upgrade completes, the scenario re-runs the role with elasticstack_version: latest. Since the package is already at the latest 9.x, the package task should report no change and ES should NOT be restarted. This is verified by comparing the ES process PID before and after the re-run.
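The PID comparison could be written as verify tasks along these lines (register names are illustrative; querying systemd for the MainPID avoids matching other Java processes such as Logstash):

```yaml
# Verify tasks (sketch) - assert ES was not restarted by the no-op re-run
- name: Capture Elasticsearch PID before the re-run
  ansible.builtin.command: systemctl show elasticsearch --property=MainPID --value
  register: es_pid_before
  changed_when: false

# ... the role runs again here with elasticstack_version: latest ...

- name: Capture Elasticsearch PID after the re-run
  ansible.builtin.command: systemctl show elasticsearch --property=MainPID --value
  register: es_pid_after
  changed_when: false

- name: Assert the process was not restarted
  ansible.builtin.assert:
    that: es_pid_before.stdout == es_pid_after.stdout
    fail_msg: "Elasticsearch was restarted on a no-op run"
```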
Without an explicit state, ansible.builtin.package defaults to state: present, which means 'installed, don't upgrade'. When elasticstack_version is not pinned, the package name is just 'elasticsearch' with no version suffix, so the package manager sees it as already installed and does nothing. All package tasks in the rolling upgrade now use state: latest so the package manager actually installs the newest version from the target release repository.
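The fix boils down to the unversioned case looking like this (task name is illustrative):

```yaml
# Rolling-upgrade package task (sketch) - unpinned package name needs state: latest
- name: Upgrade Elasticsearch to the newest version in the target release repo
  ansible.builtin.package:
    name: elasticsearch     # no version suffix when elasticstack_version is unpinned
    state: latest           # 'present' would see the package installed and do nothing
```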
Three CI fixes:
Logstash molecule container memory bumped from 2GB to 3GB to prevent OOM during idempotence checks on rockylinux9 with ES8. The 512m JVM heap plus OS overhead exceeded 2GB on restart.
ansible-core 2.20+ requires Python 3.12. The sanity and compatibility test jobs now select python3.12 when testing against 2.20+.
All CI workflows (molecule, linting, plugins) switched from uv pip install --system to per-job venvs. The 20 self-hosted runners share a single filesystem, and concurrent --system installs were racing — one job's package upgrade would delete binaries another job was about to use. Each job now creates an isolated venv in RUNNER_TEMP.