
AutoTester: Docker-based in-game integration test framework #491

Merged: Skidamek merged 44 commits into main from autotester-upstream on Jun 27, 2026

@Skidamek (Owner) commented May 20, 2026

An end-to-end test harness for AutoModpack's sync flow. It spins up a real Minecraft server + headless client in Docker, drives the client's in-game UI through a small file-based JSON bridge, and verifies a full sync — connect → trust certificate → download → restart → rejoin — across every supported target (MC 1.18.2–26.2, fabric/forge/neoforge).
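The file-based JSON bridge can be pictured as a request/response exchange through files that both the Python runner and the in-game mod can see. The sketch below illustrates the idea and makes several assumptions. The `FileBridge` class, the `request.json`/`response.json` names, and the `id`/`op`/`args` fields are hypothetical and do not describe the real `BridgeClient` protocol. The `timeout=` keyword and the 30-70 ms jittered poll mirror what the commit log below says about `BridgeClient`.

```python
import json
import os
import random
import tempfile
import time
import uuid
from pathlib import Path
from typing import Optional


class FileBridge:
    """Hypothetical file-based JSON bridge client; not the real BridgeClient."""

    def __init__(self, bridge_dir: Path, default_timeout: float = 30.0) -> None:
        self.dir = Path(bridge_dir)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.default_timeout = default_timeout

    def _write_atomic(self, path: Path, payload: dict) -> None:
        # Write to a temp file in the same directory, then rename it over the
        # target, so the reader never sees a half-written JSON document.
        fd, tmp = tempfile.mkstemp(dir=self.dir, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.replace(tmp, path)

    def request(self, op: str, timeout: Optional[float] = None, **args) -> dict:
        """Send one op and block until its response arrives or the timeout expires."""
        req_id = uuid.uuid4().hex
        self._write_atomic(self.dir / "request.json", {"id": req_id, "op": op, "args": args})
        limit = self.default_timeout if timeout is None else timeout
        deadline = time.monotonic() + limit
        resp_path = self.dir / "response.json"
        while time.monotonic() < deadline:
            if resp_path.exists():
                resp = json.loads(resp_path.read_text())
                if resp.get("id") == req_id:  # ignore stale responses to older requests
                    resp_path.unlink()
                    return resp
            time.sleep(random.uniform(0.03, 0.07))  # jittered poll interval
        raise TimeoutError(f"bridge op {op!r} got no response within {limit}s")
```

Writing through a temporary file and then calling `os.replace` is one common way to stop the other side from reading a partial file. The PR does not say whether the real bridge does this.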

Building it surfaced several real bugs, so their fixes are included here too.

Writing tests (autotester/)

Tests are declarative YAML — no Python needed to add one. A scenario's flow is a list of steps built from generic verbs:

```yaml
flow:
  - use: boot                 # reusable macro from scenarios/_lib.yaml
  - use: accept_certificate
  - do: click
    select: { text: Download }
  - do: wait_for
    until: { file: "${modpack_dir}/${marker}" }
  - do: verify_files
```
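A `select:` mapping such as `{ text: Download }` picks one GUI element out of the elements on the current screen. The sketch below shows one way to do that matching. It rests on assumptions: the `Element` shape, case-insensitive substring matching on `text`, and the convention that `index` picks among the matches (negative values count from the end) are illustrative and are not confirmed as the engine's rules. The keys role/text/class/enabled/index come from the PR's description of `selectors`.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping, Optional, Sequence


@dataclass
class Element:
    """Hypothetical snapshot of one GUI widget as reported by the bridge."""
    role: str
    text: str
    cls: str
    enabled: bool = True


def select(elements: Sequence[Element], selector: Mapping[str, Any]) -> Optional[Element]:
    """Return the element that satisfies every key in `selector`, or None."""
    matches: List[Element] = [
        e for e in elements
        if ("role" not in selector or e.role == selector["role"])
        # assumption: text matches as a case-insensitive substring
        and ("text" not in selector or selector["text"].lower() in e.text.lower())
        and ("class" not in selector or e.cls == selector["class"])
        and ("enabled" not in selector or e.enabled == selector["enabled"])
    ]
    idx = selector.get("index", 0)  # which match to take; negatives count from the end
    return matches[idx] if -len(matches) <= idx < len(matches) else None
```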

Verbs (click, type, wait_for, verify_files, …), selectors (match GUI elements by text/role/state), conditions (screen, element, file, log, …), ${...} templating, and reusable macros are documented in autotester/README.md. New behavior a verb can't express is added once in the engine, then reused from YAML.
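The `${...}` templating can be illustrated with a small recursive substitution over step arguments. This is a sketch, not the engine's code. The `expand` function, its treatment of unknown variables as an error, and the sample values are assumptions. The variable names `modpack_dir` and `marker` come from the YAML example.

```python
import re
from typing import Any, Mapping

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand(value: Any, variables: Mapping[str, Any]) -> Any:
    """Substitute ${name} placeholders inside strings, lists, and dicts."""
    if isinstance(value, str):
        def repl(m: re.Match) -> str:
            name = m.group(1)
            if name not in variables:  # assumption: unknown names are an error
                raise KeyError(f"undefined template variable: {name}")
            return str(variables[name])
        return _VAR.sub(repl, value)
    if isinstance(value, list):
        return [expand(v, variables) for v in value]
    if isinstance(value, dict):
        return {k: expand(v, variables) for k, v in value.items()}
    return value  # numbers, booleans, None pass through unchanged
```

For example, with the made-up values `{"modpack_dir": "automodpack/modpacks/example", "marker": "ok.txt"}`, the `until: { file: "${modpack_dir}/${marker}" }` argument expands to `{"file": "automodpack/modpacks/example/ok.txt"}`.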

CLI: autotester build-images | run | clean. CI: the in-game matrix runs on manual dispatch (.github/workflows/ingame-tests.yml); the engine also has fast, Docker-free unit tests that run on every push.

Test code never ships in releases

AutoTestBridge and its dev mixins are compiled and bundled only under -Pautomodpack.autotest (autotester builds pass it; releases don't). In a normal build they're excluded from the source set and stripped from the mixin config, and they're additionally gated at runtime behind -Dautomodpack.autotest=true. Verified for both fabric and neoforge.

HeadlessMC

Stock HeadlessMC can't launch MC 26.2 headlessly, so the client image builds a patched fork. The repo/ref live in autotester/settings.yaml, pinned to a commit SHA for reproducibility; point ref at the upstream tag once the patch lands.

Bug fixes surfaced while building it

  • Disabling update-on-launch still loads the modpack. With updateSelectedModpackOnLaunch=false the installed modpack wasn't loaded at all; it now loads without contacting the server, so you can binary-search mods without AutoModpack restoring or deleting them.
  • DownloadClient leaks + blocking. Connections opened during pool hydration are now closed if a parallel connect fails (was leaking sockets + non-daemon threads); the blocking probe / certificate prompt / login continuation run on a dedicated daemon executor instead of ForkJoinPool.commonPool.
  • Fixed login handshake races, made the certificate trust prompt non-blocking, and fixed a classloader crash in the async data-packet handler (notably on Forge 1.19.2).
  • NeoForge FML locator module restructure (fml10/fml11).
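The `DownloadClient` leak fix follows a general pattern. Start every connect in parallel, wait for all of them, and if any failed, close the ones that succeeded before re-raising. The real fix is Java code in `DownloadClient`. The Python sketch below only illustrates the pattern: `open_pool` and its parameters are hypothetical, and the sketch does not model the dedicated daemon executor.

```python
from concurrent.futures import Executor
from typing import Callable, List, Optional, Protocol, TypeVar


class Closeable(Protocol):
    def close(self) -> None: ...


C = TypeVar("C", bound=Closeable)


def open_pool(connect: Callable[[], C], size: int, executor: Executor) -> List[C]:
    """Open `size` connections in parallel; if any fails, close the others and re-raise."""
    futures = [executor.submit(connect) for _ in range(size)]
    opened: List[C] = []
    first_error: Optional[Exception] = None
    # Drain every future even after a failure, so a connection that finishes
    # later is still tracked and can be closed.
    for fut in futures:
        try:
            opened.append(fut.result())
        except Exception as exc:
            if first_error is None:
                first_error = exc
    if first_error is not None:
        for conn in opened:
            conn.close()  # release sockets that would otherwise leak
        raise first_error
    return opened
```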

Other

  • Jar-merge incremental fix: the merge now hashes the project's own jar, so changes to main-module sources are no longer missed (run the build task, not mergeJar directly).
  • Dropped a few unused build targets (1.21.3, 1.20.6, 1.20.4, 1.19.4).

How to run

```shell
./gradlew build -Pautomodpack.autotest                 # bundle the instrumentation
uv --project autotester run autotester build-images    # once
uv --project autotester run autotester run --target 1.21.1-fabric --scenario sync
```

Results land in autotester/out/. See autotester/README.md for the full reference.

@Skidamek force-pushed the autotester-upstream branch from a05272d to 6c555d5 on May 27, 2026 15:39
@Skidamek marked this pull request as ready for review on May 31, 2026 20:10
Skidamek added 28 commits June 26, 2026 12:40
Adds a docker-based autotest framework that runs real Minecraft
server + client containers, drives the UI through a file-based
JSON bridge (AutoTestBridge), and validates the modpack sync flow
against 22 version/loader targets.

Includes mod-side changes required to expose UI state for testing:
AutoTestBridge, async certificate trust helpers, EditBox inputText
persistence, and infrastructure cleanup for unsupported MC versions.
…heck, click_restart error handling

- Fix TOCTOU races in accept_fingerprint and wait_join: capture get_screen once
  instead of calling it 4x per iteration (was reading different screen states)
- Add _jitter_sleep helper with ±20% random jitter to all poll loops,
  reducing thundering herd file-I/O contention when multiple tests run
  concurrently under jobs > 1
- _container_logs now accepts tail=N to fetch only last N lines instead of
  full log; _wait_for_log and _read_fingerprint use tail=200
- _phase_connect: add an _assert_running health check, raise the inner poll
  window from 15s to 45s (an overloaded machine may need longer), and compute
  the remaining time
- _phase_click_restart: raise RuntimeError if no restart button is found instead
  of silently passing and hanging on _wait_exited
- BridgeClient.request accepts a per-call timeout= kwarg; wait_bridge uses
  timeout=5 for the initial ping (it was 30s and kept stalling when the Java
  bridge hadn't yet entered its poll loop)
- BridgeClient internal poll uses random.uniform(0.03, 0.07) jitter
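The jittered polling and the TOCTOU fix above combine into one loop shape. Read the state once per iteration, act only on that snapshot, and sleep a randomized interval so that concurrent jobs don't poll in lockstep. The sketch below is illustrative. The names `jittered` and `poll_until` are hypothetical stand-ins for `_jitter_sleep` and the runner's poll loops. The ±20% spread matches the commit message.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def jittered(base: float, spread: float = 0.2) -> float:
    """Return `base` scaled by a random factor in [1 - spread, 1 + spread]."""
    return base * random.uniform(1 - spread, 1 + spread)


def poll_until(check: Callable[[], Optional[T]], timeout: float, interval: float = 0.5) -> T:
    """Poll `check` until it returns a non-None value, sleeping a jittered interval."""
    deadline = time.monotonic() + timeout
    while True:
        # Call check() exactly once per iteration and act only on that snapshot,
        # instead of re-reading state several times within the same iteration.
        result = check()
        if result is not None:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(min(jittered(interval), remaining))  # never sleep past the deadline
```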
… to a server before we even load all of the minecraft assets
thenApply inherits the completing thread (pool-5-thread-* from
DownloadClient's executor), where Forge's ModuleClassLoader.findClass
throws ClassNotFoundException (CNFE) for classes loaded for the first time.
Switch to thenApplyAsync so the lambda lands on ForkJoinPool.commonPool(),
whose workers can resolve classes normally.
Skidamek added 9 commits June 26, 2026 12:40
The .handle callback inherited the Connection executor thread
(pool-5-thread-*), where ModuleClassLoader.findClass can fail for
unresolved classes. Switch to handleAsync so it runs on
ForkJoinPool.commonPool(), like the DataC2SPacket fix.
…, clean up

Production fixes:
- DataC2SPacket: restore original flow so the client secret is still saved when
  the modpack content can't be fetched but the host is reachable (the async
  rewrite had dropped that case via an early return).
- DownloadClient: remove the now-dead sync constructor + establishProbeConnection
  /recoverProbeConnection (superseded by createAsync).
- Rename ModpackUpdater.CheckAndLoadModpack -> checkAndLoadModpack.
- Revert the whole-file spaces->tabs reindentation of Preload.java; keep only the
  real change (load installed modpack when updateSelectedModpackOnLaunch is off).
- Normalize stray tabs -> spaces (DataC2SPacket, ModpackUtils, ScreenImpl,
  ClientLoginNetworkAddon).
- Drop unused AutoTestBridge imports from Fabric/Forge/NeoForge init.

Autotester:
- Remove the unused render/menu/close bridge ops and their backing code
  (FontRenderMixin, RenderedTextCollector, FormattedText, mixins.json entry).
- Remove unused BridgeClient helpers (buttons, text_fields, click_point).
- Clearer phase names: wait_danger -> wait_download_prompt,
  click_confirm -> confirm_download, click_restart -> confirm_restart.
- Quiet the dev mixins: single client-ready log in onClientReady(), drop the
  100ms INFO spam and System.out.println.
- cli.py: move the run() return out of the finally block so real errors aren't
  swallowed.
- README: correct phase table and bridge-op list; document skip_fingerprint and
  verify_mods.
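The `cli.py` fix relies on a Python language rule: a `return` inside `finally` discards any exception that is propagating at that moment, so the caller sees a normal return value. The functions below are hypothetical and only demonstrate the difference. They are not the actual `cli.py` code.

```python
def run_swallowing(fail: bool) -> int:
    exit_code = 1
    try:
        if fail:
            raise RuntimeError("docker failed")
        exit_code = 0
    finally:
        # BUG: a return in finally discards the in-flight RuntimeError
        return exit_code


def run_fixed(fail: bool, cleanup: list) -> int:
    exit_code = 1
    try:
        if fail:
            raise RuntimeError("docker failed")
        exit_code = 0
    finally:
        cleanup.append("stopped")  # cleanup still always runs
    return exit_code  # moved out of finally, so real errors propagate
```

`run_swallowing(True)` returns `1` without raising. `run_fixed(True, [])` raises `RuntimeError` after its cleanup has run.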
Rebased onto main's 26.2 port and hooked 26.2 into the autotester for
both Fabric and NeoForge.

Build/wiring:
- ModuleUtils: map 26.2 neoforge onto our fml11 module (main used fml10;
  this branch moved 26.x to fml11).
- ScreenImpl.setScreen: add the >=26.2 gui.setScreen conditional to match
  getScreen (kept our non-backgroundExecutor variant).
- stonecutter: active project set to 26.2-fabric.
- autotester targets: add 26.2-fabric (loader 0.19.3) and 26.2-neoforge
  (26.2.0.7-beta), Java 25.

Deps:
- Gradle 9.4.1 -> 9.6.0, fabric-loom(+remap) 1.16 -> 1.17-SNAPSHOT
  (loom 1.17 needs Gradle 9.5+), shadow 9.4.1 -> 9.4.3, kotlin 2.3.0 ->
  2.3.21. moddevgradle already latest (2.0.141).

AutoTestBridge / dev mixin: read the current screen via the existing
ScreenManager().getScreen() accessor and set the title screen via
minecraft.setScreen, so the 26.2 gui.* moves are covered by the port's
existing stonecutter replacements + ScreenImpl conditional. No new
stonecutter replacements added.

Builds clean for 26.2 fabric+neoforge (and 1.21.11 sanity). The 26.2
in-game autotest still fails: HeadlessMC 2.9.0's LWJGL stub leaves
org.lwjgl.system.Configuration.SHARED_LIBRARY_EXTRACT_PATH null, which
MC 26.2's new NativeLibrariesBootstrap reads -> NPE before any mod
loads. Needs a HeadlessMC-side fix.
@Skidamek force-pushed the autotester-upstream branch from 91fa68d to d258a20 on June 26, 2026 10:57
Stock HeadlessMC can't launch MC 26.2 headlessly (its LWJGL stubs don't
satisfy 26.2's new render backend). Build the client image's HeadlessMC
launcher from a git repo/ref instead of downloading the prebuilt native
release:

- docker/client/Dockerfile: multi-stage build that clones HEADLESSMC_REPO at
  HEADLESSMC_REF and compiles the launcher-wrapper jar with JDK 21, installed
  as a `java -jar` hmc shim.
- settings.yaml: headlessmc.repo/ref select the build (defaults to the patched
  Skidamek/headlessmc @ mc26.2-headless); point elsewhere to use another build.
- cli.py: pass repo/ref through as Docker build args.
- README: document the HeadlessMC build source.

Verified: full sync matrix passes for all 22 targets (both loaders,
MC 1.18.2 through 26.2).
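The `cli.py` change that passes repo/ref through as Docker build args can be sketched as building a `docker build` argument list from the parsed `headlessmc.repo`/`headlessmc.ref` settings. `--build-arg` and `-t` are standard `docker build` options. `HEADLESSMC_REPO` and `HEADLESSMC_REF` are the build-arg names from the Dockerfile bullet. The function name, image tag, and build context path are hypothetical.

```python
from typing import Any, List, Mapping


def build_client_image_cmd(settings: Mapping[str, Any], tag: str, context_dir: str) -> List[str]:
    """Return a `docker build` argv that forwards the HeadlessMC repo/ref from settings."""
    hmc = settings["headlessmc"]
    return [
        "docker", "build",
        "--build-arg", f"HEADLESSMC_REPO={hmc['repo']}",
        "--build-arg", f"HEADLESSMC_REF={hmc['ref']}",  # branch, tag, or commit SHA
        "-t", tag,
        context_dir,
    ]
```

The caller would pass the result to something like `subprocess.run(cmd, check=True)`. Keeping the argv construction separate from execution makes it easy to unit-test without Docker.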
@Skidamek force-pushed the autotester-upstream branch from 5229e53 to 2a34cb2 on June 26, 2026 16:22
Skidamek added 6 commits June 27, 2026 01:43
…regation

Address self-review comments on the autotester branch:

- neoforge: remove the forgified-fabric-api dependency, its >=1.21 guard and
  the FFAPI maven repo. Nothing in the neoforge source needs it — FabricInit is
  fabric-gated and FabricLoginMixin targets by string + @pseudo — and it was
  never bundled or declared in neoforge.mods.toml. Verified 1.21.8/26.2 neoforge
  still compile.
- mixins: drop the BlockPos no-op workaround in LoginQueryResponse/Request
  login mixins. Keep targeting the real packet class (mapped back to its old name
  on <1.20.2 by the existing stonecutter replacement) and gate only the body.
  Verified the no-op path (1.18.2/1.19.2) and the injection path (1.21.8/26.2).
- ingame-tests.yml: drop `merge-multiple: true` on the report download — it
  collided every target's results.json into one, so aggregation only saw a
  single target.
- run-headlessmc-client: remove the dead commented forge/neoforge install
  blocks; document why only fabric needs an explicit profile install.
- settings.yaml: remove the unused run.retryMax key.
- autotester package: remove the empty __init__.py and switch packaging to
  namespace discovery.

Claude-Session: https://claude.ai/code/session_01AQ1GKvoVqnwharKmXpbwSz
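The `merge-multiple: true` problem comes from how the artifacts are downloaded. When multiple artifacts are downloaded into one directory, every target's `results.json` lands at the same path and later files overwrite earlier ones. When each artifact is kept in its own subdirectory, an aggregator can read them all. The sketch below assumes a hypothetical `reports/<target>/results.json` layout and keys each result by its directory name. The real aggregation step and its results schema are not shown in the PR.

```python
import json
from pathlib import Path
from typing import Any, Dict


def aggregate(reports_dir: Path) -> Dict[str, Any]:
    """Collect results.json from each per-target subdirectory, keyed by directory name."""
    results: Dict[str, Any] = {}
    for path in sorted(Path(reports_dir).glob("*/results.json")):
        results[path.parent.name] = json.loads(path.read_text())
    return results
```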
…ad + net robustness

- Test instrumentation (AutoTestBridge + mixin/dev/*) is excluded from the source
  set and stripped from the mixin config in normal builds, and only bundled with
  -Pautomodpack.autotest. build.yml gains an `autotest` input; the in-game-tests
  workflow and README build with it. Verified: fabric & neoforge release jars
  contain none of it; autotest-flagged jars do.
- Fix "disable update-on-launch" so the installed modpack still loads: split a
  pure ModpackUpdater.loadModpack() (no server contact, no file reconciliation)
  out of checkAndLoadModpack and route Preload's updates-disabled path to it, so a
  binary search no longer loses or rewrites the modpack.
- DownloadClient: close every connection opened during pool hydration if any
  parallel connect fails (was leaking sockets + non-daemon threads); run the
  blocking probe / cert prompt / login continuation on a dedicated daemon executor
  instead of ForkJoinPool.commonPool (DownloadClient, ModpackUtils, DataC2SPacket).
- Pin the HeadlessMC fork to a commit SHA; the client Dockerfile now fetches by
  ref (branch/tag/SHA) for reproducible autotester builds.

Claude-Session: https://claude.ai/code/session_01AQ1GKvoVqnwharKmXpbwSz
…, macros)

Scenarios are now data, not Python phases. A flow is a list of steps, where each
step is a generic verb (click / type / wait_for / assert / verify_files / ...)
plus arguments, so new tests are written entirely in YAML.

Engine (automodpack_autotester/engine/):
  - registry: @verb decorator + name->fn lookup
  - context: per-case state, ${...} templating, bridge/log access
  - selectors: declarative GUI element matching (role/text/class/enabled/index)
  - conditions: boolean predicates (screen/element/file/log/all/any/not) shared
    by when:, wait_for.until:, assert.that:; log conditions capture regex groups
    into vars
  - steps_ui / steps_io: the UI and filesystem verbs
  - executor: macro expansion, when-gating, repeat, optional, per-step results
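The `conditions` module is described as boolean predicates that can be combined with all/any/not, with log conditions capturing regex groups into vars. The sketch below shows that idea as a recursive evaluator over a state snapshot. The `evaluate` function, the `state` keys (`screen`, `log_lines`), the use of named groups for capture, and the sample screen name and fingerprint regex are all assumptions. The sketch covers only the screen and log leaves, not element or file.

```python
import re
from typing import Any, Dict, Mapping


def evaluate(cond: Mapping[str, Any], state: Mapping[str, Any], captured: Dict[str, str]) -> bool:
    """Evaluate a nested condition against one snapshot of test state."""
    if "all" in cond:
        return all(evaluate(c, state, captured) for c in cond["all"])
    if "any" in cond:
        return any(evaluate(c, state, captured) for c in cond["any"])
    if "not" in cond:
        return not evaluate(cond["not"], state, captured)
    if "screen" in cond:
        return state.get("screen") == cond["screen"]
    if "log" in cond:
        for line in state.get("log_lines", []):
            m = re.search(cond["log"], line)
            if m:
                captured.update(m.groupdict())  # named groups become template vars
                return True
        return False
    raise ValueError(f"unknown condition: {dict(cond)}")
```

For example, with a hypothetical log line `fingerprint: AB:CD:EF`, the condition `{"log": r"fingerprint: (?P<fingerprint>[0-9A-F:]+)"}` is true and stores `fingerprint` for later `${fingerprint}` templating.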

runner.py keeps the Docker lifecycle helpers and registers the lifecycle verbs
(launch_server, connect, wait_join, ...); run_case builds a Context and runs the
flow through the engine, recording per-step results into results.json.

Scenarios rewritten declaratively on a shared macro library (scenarios/_lib.yaml:
boot, accept_certificate, download_modpack, restart_client, rejoin). Behavior
(selectors, screen names, fingerprint regex, connect retry) matches the old
phases exactly.

Tests (tests/, no Docker): 33 unit + flow tests covering parsing, selectors,
conditions, templating, polling, and the executor, plus running the real shipped
scenarios/macros through a fake bridge. Verified end to end on real Docker:
1.21.1-fabric sync passes (21 steps, full boot -> sync -> restart -> rejoin).

README documents the verb/selector/condition/template/macro model.
- executor: `when`/`repeat` now apply to `use`/`group` steps too (were silently
  ignored); normalize a bare-string step once and gate/repeat uniformly.
- config: add `parse_server_files()` (shared by runner + tests, removes the
  triplicated serverFiles schema + default constants); memoize `load_macros()`.
- engine: drop dead `Context.case_dir`; add `${modpack_dir}` template var to
  replace the repeated `automodpack/modpacks/${modpack}` literal.
- steps_ui/steps_io: extract `_await_element` / `_await_exist` to remove the
  duplicated resolve-selector and wait-for-paths boilerplate.
- remove unused `relaunch_client` verb alias and `engine.get` re-export.
- tests: lock in when/repeat-on-macros behavior; reuse parse_server_files.

34 unit/flow tests pass; 1.21.1-fabric sync e2e still green.
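The executor behavior described above can be sketched as a small interpreter. Each step is normalized once, then gated by `when`, repeated by `repeat`, and allowed to fail when `optional`. The same gating applies whether the step is a `do:` verb or a `use:` macro. This is a sketch under assumptions. The `run_flow` signature, the string trace, and the idea that a bare-string step is a verb name with no arguments are hypothetical. The PR only says bare-string steps are normalized.

```python
from typing import Any, Callable, Dict, List, Mapping

Step = Dict[str, Any]


def run_flow(
    steps: List[Any],
    macros: Mapping[str, List[Any]],
    verbs: Mapping[str, Callable[[Step], None]],
    when: Callable[[Any], bool],
) -> List[str]:
    """Run steps, expanding `use:` macros; `when`/`repeat`/`optional` apply to every step kind."""
    trace: List[str] = []
    for raw in steps:
        # Normalize once (assumption: a bare string is a verb with no arguments).
        step: Step = {"do": raw} if isinstance(raw, str) else dict(raw)
        name = step.get("use") or step.get("do")
        if "when" in step and not when(step["when"]):
            trace.append(f"skip {name}")
            continue
        for _ in range(int(step.get("repeat", 1))):
            try:
                if "use" in step:
                    trace.extend(run_flow(macros[step["use"]], macros, verbs, when))
                else:
                    verbs[step["do"]](step)
                    trace.append(step["do"])
            except Exception:
                if not step.get("optional"):
                    raise
                trace.append(f"optional-failed {name}")
    return trace
```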
Add reusable step.autotester-tests workflow that runs the new pytest engine
suite, and wire it into:
  - gradle.yml (Dev Builds) so every push validates the autotester engine
  - ingame-tests.yml as a fast-fail gate before the build + Docker matrix

No change to release.yml (releases still never pass -Pautomodpack.autotest).
@Skidamek merged commit 5228243 into main on Jun 27, 2026
4 checks passed