Add Auto-FL literature events and local compute budget search #4523

Merged
holgerroth merged 20 commits into NVIDIA:main from holgerroth:codex/auto-fl-literature-events
May 6, 2026
@holgerroth
Collaborator

Summary

  • add Auto-FL literature-review event tracking and progress plotting updates
  • simplify devcontainer/git setup around local experiment branches and commits
  • add local-compute search support with --local_train_steps alongside epoch-based training
  • refresh the Auto-FL README, program guidance, and bundled agent skill/runbook

Validation

  • python3 -m py_compile research/auto-fl-research/client.py research/auto-fl-research/job.py research/auto-fl-research/scripts/append_result.py research/auto-fl-research/scripts/validate_contract.py
  • black --check research/auto-fl-research/client.py research/auto-fl-research/job.py
  • isort --check-only --profile black research/auto-fl-research/client.py research/auto-fl-research/job.py
  • flake8 research/auto-fl-research
  • git diff --check
  • make validate from research/auto-fl-research
  • make smoke from research/auto-fl-research passed static checks and skipped runtime launch because the host Python has incompatible NVFlare API paths installed

Notes

  • Runtime experiment artifacts such as local results.tsv are not included in this PR.

@holgerroth holgerroth marked this pull request as ready for review May 5, 2026 23:29
@holgerroth
Collaborator Author

/build

@holgerroth holgerroth requested review from ZiyueXu77 and pcnudde May 5, 2026 23:29
@greptile-apps
Contributor

greptile-apps Bot commented May 5, 2026

Greptile Summary

This PR extends the Auto-FL research harness with three related capabilities: literature-review event tracking (new log_literature_review.py, plateau_watchdog.py, and plotting support), a step-based local compute budget (--local_train_steps) alongside the existing epoch-based mode, and tightened experiment provenance guards in the shell scripts.

  • Literature-review loop: log_literature_review.py records timed review cycles into results.tsv as status=literature rows; plateau_watchdog.py counts scored-candidate runs since the last material improvement or literature reset and emits a recommendation=literature/continue signal; plot_progress.py renders vertical event markers and annotation labels for these rows and separates literature runtime from candidate runtime in titles and the summary box.
  • Step-based training: --local_train_steps > 0 replaces the epoch loop with a step-counted iterator that recycles the data loader, calls scheduler.step() per optimizer step, and computes TensorBoard scalars relative to global step rather than global epoch; job.py, mutation_schema.yaml, and run_iteration.sh are updated consistently.
  • Provenance guards: init_run.sh and run_iteration.sh now hard-fail (exit 2) when run outside a git clone or outside an autoresearch/* branch, instead of silently degrading.
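The step-counted iterator described in the step-based training bullet can be sketched roughly as follows. This is only an illustration of the idea, not the code in client.py: `take_steps`, `train_local`, `optimizer_step`, and `scheduler_step` are hypothetical names, and the loader is assumed to be re-iterable (as a list or a PyTorch DataLoader is).

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def take_steps(loader: Iterable[T], num_steps: int) -> Iterator[T]:
    """Yield exactly num_steps batches, restarting the loader when it runs out.

    Hypothetical helper: assumes `loader` can be iterated more than once.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    produced = 0
    while produced < num_steps:
        saw_batch = False
        for batch in loader:
            saw_batch = True
            yield batch
            produced += 1
            if produced == num_steps:
                return
        if not saw_batch:
            # Guard against an empty loader, which would otherwise loop forever.
            raise ValueError("data loader yielded no batches")


def train_local(
    loader: Iterable[T],
    num_steps: int,
    optimizer_step: Callable[[T], None],
    scheduler_step: Callable[[], None],
) -> int:
    """Run a fixed number of optimizer steps and return how many were taken."""
    steps_done = 0
    for batch in take_steps(loader, num_steps):
        optimizer_step(batch)  # one optimizer update per batch
        scheduler_step()       # scheduler advances per step, not per epoch
        steps_done += 1
    return steps_done
```

With a three-batch loader and a budget of seven steps, the loader is recycled twice and the third pass stops after one batch, so the local compute budget is fixed regardless of dataset size.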

Confidence Score: 5/5

Safe to merge — all changed code paths are well-bounded, and no training or scoring logic is altered in a way that would corrupt existing results.

The step-based training path is properly isolated behind local_train_steps > 0, with matching validation in both job.py and client.py, correct NUM_STEPS_CURRENT_ROUND bookkeeping, and consistent scheduler and scaffold handling. The new literature-event scripts are append-only helpers that do not touch training outcomes. The only non-trivial concern is a cosmetic ambiguity in the repeated_terms diagnostic printed by plateau_watchdog.py, which does not affect the recommendation output the automation consumes.

No files require special attention beyond a close read of the plateau_watchdog.py diagnostic output logic.
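For readers unfamiliar with the plateau check, the counting rule summarized above (scored candidate runs since the last material improvement or literature reset) might look something like the sketch below. It is not the logic of plateau_watchdog.py: the function name, the `patience` and `min_delta` parameters, the row layout, and the assumption that a higher score is better are all illustrative.

```python
from typing import Dict, List, Optional


def recommend(rows: List[Dict], patience: int = 5, min_delta: float = 0.001) -> Dict:
    """Count scored runs since the last improvement or literature row.

    Assumptions (hypothetical): each row is a dict with a "status" key and an
    optional numeric "score"; higher scores are better.
    """
    best: Optional[float] = None
    stale = 0
    for row in rows:
        if row.get("status") == "literature":
            stale = 0  # a literature review resets the plateau counter
            continue
        score = row.get("score")
        if score is None:
            continue  # unscored rows do not count toward the plateau
        if best is None or score > best + min_delta:
            best = score  # material improvement
            stale = 0
        else:
            stale += 1
    recommendation = "literature" if stale >= patience else "continue"
    return {"runs_since_reset": stale, "recommendation": recommendation}
```

Only the `recommendation` field would be consumed by automation in this sketch; anything else returned alongside it is informational.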

Important Files Changed

| Filename | Overview |
| --- | --- |
| research/auto-fl-research/client.py | Adds --local_train_steps for step-based training alongside epoch mode; correctly guards against empty data loaders, updates scheduler T_max, steps scheduler per optimizer step inside the new branch, and adjusts TensorBoard scalars and NUM_STEPS_CURRENT_ROUND. |
| research/auto-fl-research/job.py | Mirrors the --local_train_steps argument and passes it through to client.py; validation guards match client.py. |
| research/auto-fl-research/scripts/plateau_watchdog.py | New script detecting search plateaus to recommend literature review; the repeated-term deduplication heuristic has an edge case where equal-count wildcard and specific entries may both appear in output, but this only affects informational diagnostics and not the recommendation field. |
| research/auto-fl-research/scripts/log_literature_review.py | New script for start/finish/log literature-review timing events; timer state persisted via a tmp-dir JSON keyed on the results parent directory hash; correctly handles missing timer files. |
| research/auto-fl-research/scripts/plot_progress.py | Adds literature-event vertical markers and annotation labels to the progress plot; select_literature_labels correctly caps at max_labels; separates candidate and literature runtimes in the title and summary box. |
| research/auto-fl-research/scripts/append_result.py | Adds --init-only mode and makes most fields optional, with post-parse validation that requires all non-score fields for non-init invocations and requires --score unless --status=literature. |
| research/auto-fl-research/scripts/validate_contract.py | Updates the evaluate-branch contract check from ParamsType.DIFF to FLModel with metrics to match the new metrics-only send in the eval path; new contains_metrics_flmodel helper correctly uses the optional-predicate overload of call_has_keyword. |
| research/auto-fl-research/scripts/finalize_batch_status.py | Fixes --last N semantics to select the last N candidate rows instead of the last N ledger rows; adds literature to the allowed status set. |
| research/auto-fl-research/scripts/run_iteration.sh | Adds an autoresearch/* branch guard before launching experiments with result logging, and bumps the default timeout from 600s to 1200s to match the updated budget in mutation_schema.yaml. |
| research/auto-fl-research/mutation_schema.yaml | Adds local_train_steps to mutable args and budget defaults; relaxes aggregation_epochs from the fixed_within_campaign list now that it is a local-compute knob alongside local_train_steps; doubles run_timeout_seconds to 1200. |
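The change in --last N semantics noted for finalize_batch_status.py is easy to get wrong once literature rows share the ledger with candidate rows. The sketch below illustrates the difference under stated assumptions; it is not the script's code. The function name and the row layout are hypothetical, as are the example status values "keep" and "discard"; only status=literature comes from the PR text.

```python
from typing import Dict, List


def last_candidate_indices(rows: List[Dict], n: int) -> List[int]:
    """Return ledger indices of the last n candidate (non-literature) rows.

    Hypothetical helper: a row is treated as a candidate whenever its
    "status" is anything other than "literature".
    """
    candidates = [i for i, row in enumerate(rows) if row.get("status") != "literature"]
    # Guard n <= 0 explicitly, because candidates[-0:] would return every row.
    return candidates[-n:] if n > 0 else []


ledger = [
    {"status": "keep"},
    {"status": "literature"},
    {"status": "discard"},
    {"status": "literature"},
]
# Slicing the raw ledger with [-2:] would pick indices 2 and 3, one of which
# is a literature row; filtering first picks the two most recent candidates.
selected = last_candidate_indices(ledger, 2)
```

Filtering before slicing keeps interleaved literature events from shrinking the batch that gets finalized.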

Reviews (7): Last reviewed commit: "Merge branch 'main' into codex/auto-fl-l..."

Comment thread research/auto-fl-research/scripts/plot_progress.py
@holgerroth
Collaborator Author

/build

@holgerroth holgerroth enabled auto-merge (squash) May 6, 2026 00:17
@holgerroth holgerroth disabled auto-merge May 6, 2026 13:56
@holgerroth holgerroth enabled auto-merge (squash) May 6, 2026 16:18
Collaborator

@ZiyueXu77 ZiyueXu77 left a comment

nice to have literature and plateau watch

@holgerroth
Collaborator Author

/build

@holgerroth
Collaborator Author

/build

@holgerroth
Collaborator Author

/build

@holgerroth
Collaborator Author

/build

@holgerroth
Collaborator Author

/build

@holgerroth holgerroth merged commit e275c37 into NVIDIA:main May 6, 2026
24 of 25 checks passed