Skip to content

Refresh stale integration tests#4505

Merged
YuanTingHsieh merged 14 commits intoNVIDIA:mainfrom
YuanTingHsieh:codex/refresh-integration-tests
May 6, 2026
Merged

Refresh stale integration tests#4505
YuanTingHsieh merged 14 commits intoNVIDIA:mainfrom
YuanTingHsieh:codex/refresh-integration-tests

Conversation

@YuanTingHsieh
Copy link
Copy Markdown
Collaborator

@YuanTingHsieh YuanTingHsieh commented Apr 30, 2026

Summary

  • refresh stale integration tests to match tightened recipe/API contracts
  • provide required model/per-site fixtures for recipe smoke tests
  • update preflight expectations and remove the embedded hello-pt pytest payload test
  • make the direct integration runner require/add localhost0 and localhost1 before preflight tests

Root Cause

Several integration tests were not updated after production APIs were intentionally tightened. These are test-side fixes only. The hello-pt app/custom test file was also being collected accidentally because it lived under tests/integration_test, but it is not used by the submitted job.

Validation

  • PYTHONPYCACHEPREFIX=/tmp/nvflare_pycache python3 -m py_compile tests/integration_test/preflight_check_test.py
  • PYTHONPYCACHEPREFIX=/tmp/nvflare_pycache python3 -m py_compile tests/integration_test/experiment_tracking_recipes_test.py tests/integration_test/xgb_vertical_recipe_test.py tests/integration_test/xgb_histogram_recipe_test.py tests/integration_test/preflight_check_test.py
  • bash -n tests/integration_test/run_integration_tests.sh
  • git diff --check
  • git verify-commit HEAD

@YuanTingHsieh YuanTingHsieh marked this pull request as ready for review April 30, 2026 21:30
Copilot AI review requested due to automatic review settings April 30, 2026 21:30
@YuanTingHsieh YuanTingHsieh changed the title [codex] Refresh stale integration tests Refresh stale integration tests Apr 30, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 30, 2026

Greptile Summary

This PR refreshes stale integration and unit tests to match tightened recipe/API contracts, with no changes to production behaviour. The PackageChecker refactoring replaces raw integer return codes with a CheckStatus enum and removes the return-inside-finally antipattern. Test-side fixes add required model= fixtures to FedAvg recipe smoke tests, align SimEnv client names with per_site_config keys, and remove a test_custom.py file that was being accidentally collected by pytest.

  • CheckStatus enumcheck() and check_dry_run() now return a typed enum instead of 0/1/2 integers; preflight_check.py and all unit tests updated consistently.
  • Integration test harnessrun_integration_tests.sh now auto-adds localhost0/localhost1 to /etc/hosts when writable and the entries are missing, and restores the original file on exit via trap.
  • Preflight test assertions_verify_checks is relaxed to allow extra observed checks (e.g. "Check dry run") that appear conditionally, avoiding false failures when earlier checks pass.

Confidence Score: 5/5

Safe to merge — all changes are test-side fixes or internal refactoring of the preflight check enum with no behaviour changes to production federation logic.

The production change (CheckStatus enum) is a pure rename/type-safety improvement; the old integer codes and the new enum values map one-to-one, and the unit tests explicitly verify every status transition. All recipe test fixes are additive (providing previously-missing fixtures) and the hello-pt test file deletion removes dead code that was never part of a submitted job.

No files require special attention. The CheckStatus docstring nit in package_checker.py is the only finding.

Important Files Changed

Filename Overview
nvflare/tool/package_checker/package_checker.py Introduces CheckStatus enum replacing raw int return codes; removes the anti-pattern return inside finally; adds explicit except Exception in check_dry_run. The check() docstring lists CheckStatus.PASS as a possible return value but the implementation can never produce it when should_be_checked() is True.
nvflare/tool/preflight_check.py Adopts CheckStatus enum for branching cleanup logic; exit-code and pass/fail mapping are correct and well-tested.
tests/integration_test/preflight_check_test.py Relaxes _verify_checks to allow extra observed checks (e.g. "Check dry run"); adds expect_success propagation and better error messages. Changes are intentional and documented.
tests/integration_test/run_integration_tests.sh Adds ensure_localhost_aliases() helper with backup/restore trap for /etc/hosts; only called in run_preflight_check_test; logic is correct.
tests/integration_test/experiment_tracking_recipes_test.py Adds _make_model() using importlib.util.spec_from_file_location to avoid sys.modules pollution; passes model= fixture to FedAvgRecipe calls.
tests/integration_test/xgb_histogram_recipe_test.py Extracts _make_horizontal_per_site_config factory; switches SimEnv from num_clients= to clients= to keep client names consistent with per_site_config keys.
tests/integration_test/xgb_vertical_recipe_test.py Extracts _make_vertical_per_site_config factory; adds per_site_config= to all recipe constructor-only tests and aligns SimEnv client names.
tests/unit_test/tool/preflight_output_test.py Replaces raw integer return values in mocks with CheckStatus enum members; adds two new tests for FAIL (no cleanup) and PASS_WITH_CLEANUP paths.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[preflight_check.py\ncheck_packages] --> B{should_be_checked?}
    B -- No --> C[CheckStatus.PASS]
    B -- Yes --> D[PackageChecker.check]
    D --> E{All required\nrules pass?}
    E -- No --> F[CheckStatus.FAIL]
    E -- Yes --> G[check_dry_run]
    G --> H{process\nreturncode == 0?}
    H -- Yes --> I[CheckStatus.PASS_WITH_CLEANUP]
    H -- No --> J[CheckStatus.FAIL_WITH_CLEANUP]
    G --> K{TimeoutExpired?}
    K -- Yes --> I
    G --> L{Exception?}
    L -- Yes --> F
    I --> M[stop_dry_run force=False]
    J --> N[stop_dry_run force=True]
    F --> O[No cleanup]
    C --> O
Loading

Reviews (18): Last reviewed commit: "Merge branch 'main' into codex/refresh-i..." | Re-trigger Greptile

Comment thread tests/integration_test/experiment_tracking_recipes_test.py Outdated
Comment thread tests/integration_test/data/jobs/hello-pt/app/custom/test_custom.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refreshes several integration/smoke tests to align with tightened Recipe API/preflight-check expectations and to provide required fixtures for recipe-based runs.

Changes:

  • Update XGBoost recipe smoke tests to always provide per_site_config and to construct SimEnv using explicit client lists.
  • Adjust recipe-system and experiment-tracking integration tests to satisfy newer recipe/model requirements.
  • Update preflight-check integration tests’ expected check sets.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/integration_test/xgb_vertical_recipe_test.py Refactors per-site XGBoost vertical test setup into a helper and passes explicit clients/per_site_config.
tests/integration_test/xgb_histogram_recipe_test.py Refactors per-site XGBoost horizontal test setup into a helper and passes explicit clients/per_site_config.
tests/integration_test/recipe_system_test.py Updates dict-model config key used in a recipe simulation test.
tests/integration_test/preflight_check_test.py Updates expected preflight check outputs and related assertions.
tests/integration_test/experiment_tracking_recipes_test.py Adds explicit model fixture creation for experiment-tracking recipe tests.
tests/integration_test/data/jobs/hello-pt/app/custom/test_custom.py Makes hello-pt unit tests deterministic by injecting a single synthetic CIFAR batch.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/integration_test/recipe_system_test.py Outdated
Comment thread tests/integration_test/preflight_check_test.py
Comment thread tests/integration_test/preflight_check_test.py
Comment thread tests/integration_test/preflight_check_test.py
Comment thread tests/integration_test/experiment_tracking_recipes_test.py Outdated
@YuanTingHsieh YuanTingHsieh force-pushed the codex/refresh-integration-tests branch from 2bcbe4c to 5ea2507 Compare May 1, 2026 18:09
nvidianz
nvidianz previously approved these changes May 1, 2026
@YuanTingHsieh YuanTingHsieh force-pushed the codex/refresh-integration-tests branch 3 times, most recently from 66761a7 to 28207cb Compare May 1, 2026 18:24
@YuanTingHsieh YuanTingHsieh force-pushed the codex/refresh-integration-tests branch from 28207cb to 0a303f8 Compare May 1, 2026 18:29
Copy link
Copy Markdown
Collaborator Author

Added a signed follow-up commit for the PyTorch hello-pt failure.

The failure was caused by the server PTFileModelPersistor starting with an empty local model while clients returned SimpleNetwork weights. Recent PyTorch persistence validation correctly rejects zero-match updates, so the job config now declares simple_network.SimpleNetwork on the server persistor.

Validation:

  • python3 -m json.tool tests/integration_test/data/jobs/hello-pt/app/config/config_fed_server.json
  • git diff --check
  • first PyTorch integration job (run hello-pt) via temp one-test YAML: 1 passed

Copy link
Copy Markdown
Collaborator Author

Added one more signed follow-up for the same PR #4392 PyTorch persistence tightening: pt_init_client also used an empty server PTFileModelPersistor, so it now declares simple_network.SimpleNetwork explicitly.

Validation:

  • python3 -m json.tool tests/integration_test/data/apps/pt_init_client/app/config/config_fed_server.json
  • git diff --check

@holgerroth
Copy link
Copy Markdown
Collaborator

Reviewed the diff. Test-only refresh, scope looks right. A few suggestions before merge:

1. Simplify _make_modelexperiment_tracking_recipes_test.py:73-90

The sys.path / sys.modules dance with _is_client_script_module is doing manually what importlib.util does cleanly without touching any globals:

def _make_model(self):
    import importlib.util
    spec = importlib.util.spec_from_file_location(
        "_exp_tracking_model", os.path.join(self.client_script_dir, "model.py")
    )
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.SimpleNetwork()

This drops ~25 lines, removes the _is_client_script_module helper entirely, and avoids the (admittedly narrow) parallel-pytest race where another test mutates sys.modules["model"] mid-import.

2. Silent preflight failures — preflight_check_test.py:120

Switching from subprocess.check_output to subprocess.run(..., check=False) swallows non-zero exit codes from the preflight tool itself. Combined with _verify_checks no longer rejecting unexpected checks, a crash that still prints recognizable lines could pass. At minimum, log process.returncode so failures are visible in test output. If non-zero exits are only expected on the "before overseer start" path, consider gating check=False to that path rather than blanket-applying it.

3. /etc/hosts mutation — run_integration_tests.sh:75-95

  • trap restore_localhost_aliases EXIT is good for normal exits, but SIGKILL leaves the appended line behind. Fine for CI, occasionally surprising for local devs running this script directly. A short comment noting the script only modifies /etc/hosts when the user has write perms (typically root in CI) would help future readers.
  • ensure_localhost_aliases installs an unconditional trap … EXIT, which would clobber any future EXIT trap added elsewhere. Currently safe (no other traps), just worth flagging if the script grows.

4. Validation

The PR description notes the direct preflight path wasn't exercised locally because /etc/hosts needed sudo. Worth confirming CI runs the experiment-tracking and preflight tests green end-to-end before merge — py_compile + bash -n won't catch the model-loading or hosts-mutation logic.

The xgb _make_*_per_site_config helpers and the test_custom.py removal both look clean — no concerns there.

Copy link
Copy Markdown
Collaborator Author

Addressed the two review follow-ups in signed commit 2642d7db7:

  • Simplified experiment tracking _make_model() to load model.py via importlib.util without mutating sys.path / sys.modules.
  • Made preflight command return codes explicit: success paths now assert return code 0 and include captured output on failure; the intentional before-overseer-start failure path opts out via expect_success=False.

Validation:

  • PYTHONPYCACHEPREFIX=/tmp/nvflare_pycache python3 -m py_compile tests/integration_test/experiment_tracking_recipes_test.py tests/integration_test/preflight_check_test.py
  • git diff --check

@pcnudde
Copy link
Copy Markdown
Collaborator

pcnudde commented May 1, 2026

/build

@pcnudde
Copy link
Copy Markdown
Collaborator

pcnudde commented May 2, 2026

/build

…tion-tests

# Conflicts:
#	tests/integration_test/preflight_check_test.py
@YuanTingHsieh YuanTingHsieh force-pushed the codex/refresh-integration-tests branch from 229174b to 66676f3 Compare May 4, 2026 19:50
@YuanTingHsieh
Copy link
Copy Markdown
Collaborator Author

/build

@YuanTingHsieh YuanTingHsieh added the cicd continuous integration/continuous development label May 4, 2026
@YuanTingHsieh
Copy link
Copy Markdown
Collaborator Author

/build

@YuanTingHsieh
Copy link
Copy Markdown
Collaborator Author

/build

@YuanTingHsieh YuanTingHsieh merged commit 4d56d03 into NVIDIA:main May 6, 2026
28 of 29 checks passed
@YuanTingHsieh YuanTingHsieh deleted the codex/refresh-integration-tests branch May 6, 2026 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cicd continuous integration/continuous development

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants