Refactor WLM integration and iterate2 functionality with plugins #60
Merged
… and Vela
- Removed hardcoded WLM options and parameters from the argument parser.
- Added support for a user-defined WLM plugin via `--wlm-plugin` argument.
- Implemented `load_wlm_config` function to read WLM settings from HPO YAML.
- Created reference implementations for LSF and Vela plugins in `examples/wlm_plugins/`.
- Updated `run_and_stream` to handle local execution and WLM plugin invocation.
- Enhanced logging to provide clearer feedback on trial execution and WLM interactions.
- Cleaned up unused functions and parameters related to previous WLM handling.

Signed-off-by: Romeo Kienzler <romeo.kienzler1@ibm.com>
- Updated `run_setter_example.sh` to use `bumpy_function.py` instead of `bumpy_setter.py`, simplifying the example for local trials.
- Modified `run_vela_example.sh` to clarify usage of the Vela/OpenShift job submission, ensuring better documentation and example clarity.
- Refined `lsf_plugin.sh` to streamline job submission for IBM Spectrum LSF, enhancing clarity on environment variable usage and command construction.
- Overhauled `_iterate2.py` to simplify the command-line interface, improve YAML loading, and enhance metric extraction logic.
- Removed deprecated features and improved logging for better traceability during execution.
- Enhanced the objective function to better handle parameter suggestions and metrics extraction.

Signed-off-by: Romeo Kienzler <romeo.kienzler1@ibm.com>
… clusters Signed-off-by: Romeo Kienzler <romeo.kienzler1@ibm.com>
…parameter Signed-off-by: Romeo Kienzler <romeo.kienzler1@ibm.com>
… directory, and performance reporting options Signed-off-by: Romeo Kienzler <romeo.kienzler1@ibm.com>
…o require PostgreSQL URL Signed-off-by: Romeo Kienzler <romeo.kienzler1@ibm.com>
… across examples Signed-off-by: Romeo Kienzler <romeo.kienzler1@ibm.com>
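The first commit above introduces `load_wlm_config` to read WLM settings from the HPO YAML, and the PR description states that the settings reach the plugin as environment variables such as `ITERATE_WLM_GPU_COUNT`. A minimal sketch of that idea, assuming the YAML has already been parsed into a dict and that every key under `wlm:` maps to an `ITERATE_WLM_<KEY>` variable. Only `ITERATE_WLM_GPU_COUNT` is named in this PR; the `gpu_count`, `queue`, and `walltime` keys, the naming rule, and the `wlm_env` helper are hypothetical.

```python
from typing import Any, Mapping


def load_wlm_config(hpo_config: Mapping[str, Any]) -> dict[str, Any]:
    """Return the `wlm:` section of a parsed HPO config (empty if absent).

    Hypothetical sketch; the real load_wlm_config in iterate2 may differ.
    """
    section = hpo_config.get("wlm") or {}
    if not isinstance(section, Mapping):
        raise ValueError("'wlm' section must be a mapping")
    return dict(section)


def wlm_env(wlm_config: Mapping[str, Any]) -> dict[str, str]:
    """Map each wlm key to an ITERATE_WLM_<KEY> environment variable.

    Assumption: `gpu_count` becomes ITERATE_WLM_GPU_COUNT, and other keys
    follow the same upper-casing rule.
    """
    env: dict[str, str] = {}
    for key, value in wlm_config.items():
        if value is None:
            continue  # skip unset options instead of exporting "None"
        env[f"ITERATE_WLM_{str(key).upper()}"] = str(value)
    return env


if __name__ == "__main__":
    # A parsed HPO YAML with a hypothetical wlm section.
    config = {"wlm": {"gpu_count": 2, "queue": "normal", "walltime": None}}
    print(wlm_env(load_wlm_config(config)))
    # → {'ITERATE_WLM_GPU_COUNT': '2', 'ITERATE_WLM_QUEUE': 'normal'}
```

Skipping `None` values is a deliberate choice in this sketch, so that an unset YAML key does not reach the plugin as the literal string `None`.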
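The commit that added `--wlm-plugin` also updated `run_and_stream` to handle both local execution and WLM plugin invocation. The sketch below illustrates that split under stated assumptions. With no plugin, the trial command runs directly. With a plugin, the plugin runs with the trial command exported as `ITERATE_TRIAL_CMD` (a variable named in this PR), plus any extra `ITERATE_WLM_*` settings. The function name mirrors the PR, but its signature, the hypothetical `plugin_argv` parameter, and the line-by-line streaming are assumptions rather than the actual iterate2 code.

```python
import os
import shlex
import subprocess
import sys
from typing import Iterator, Mapping, Optional, Sequence


def run_and_stream(
    trial_cmd: str,
    plugin_argv: Optional[Sequence[str]] = None,
    extra_env: Optional[Mapping[str, str]] = None,
) -> Iterator[str]:
    """Run a trial locally or through a WLM plugin, yielding output lines.

    Hypothetical sketch: signature and behaviour are assumptions.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    if plugin_argv is None:
        # Local execution: run the trial command directly.
        argv = shlex.split(trial_cmd)
    else:
        # Plugin execution: the plugin receives the command via the
        # environment and decides how (and where) to run it.
        env["ITERATE_TRIAL_CMD"] = trial_cmd
        argv = list(plugin_argv)

    with subprocess.Popen(
        argv,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so errors appear in the stream
        text=True,
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Popen.__exit__ waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, argv)


if __name__ == "__main__":
    # Local run of a tiny stand-in trial command.
    demo = shlex.join([sys.executable, "-c", "print('loss=0.5')"])
    for line in run_and_stream(demo):
        print(line)
```

Streaming through a generator lets the caller log each line as it arrives instead of waiting for a long cluster job to finish. A non-zero exit status, from either the trial or the plugin, surfaces as an exception.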
This pull request modernizes and clarifies the workflow for running hyperparameter optimization (HPO) with `iterate2`, focusing on a plugin-based workload manager (WLM) interface, improved documentation, and streamlined example scripts. The changes shift from built-in WLM logic to a flexible plugin system, clarify environment variable usage, and update example configurations and scripts to match the new approach. There are also quality-of-life improvements to the example HPO function and YAML search space.

Key changes include:
Documentation and Workflow Overhaul
- Replaced the built-in WLM backend system with a plugin-based WLM interface: `iterate2` now delegates all workload management to user-supplied plugin scripts, passing configuration via environment variables (e.g., `ITERATE_WLM_GPU_COUNT`, `ITERATE_TRIAL_CMD`). This enables support for any cluster or local execution environment and decouples `iterate2` from cluster-specific logic. [1] [2] [3] [4] [5] [6]
- Updated the documentation (`docs/iterate2.md`) to describe the new plugin system, environment variable interface, and revised command-line options. Removed legacy WLM-specific options in favor of `--wlm-plugin`, and clarified how to configure resources via the HPO YAML `wlm:` section. [1] [2] [3] [4] [5] [6]

Example and Configuration Updates
- Refactored example cluster submission scripts (`examples/run_lsf_gridfm_example_postgres.sh`, `examples/run_ccc_gridfm_example.sh`) to use the new plugin interface, removing embedded cluster logic from the scripts and delegating it to dedicated plugin scripts. [1] [2]
- Updated the example HPO YAML (`examples/bumpy_hpo.yaml`) to clarify the structure and ensure correct formatting for static and metric sections, matching the expectations of `iterate2`.
- Fixed a typo in the data path in the gridfm HPO config (`configs/gridfm_graphkit_hpo.yaml`).

Example Trial Script Improvements
- `examples/bumpy_function.py`: it now reads all parameters and output paths from environment variables as set by `iterate2`, and writes metric output to the required file. This ensures compatibility with the new plugin-based workflow. [1] [2]

Most important changes:
Plugin-based WLM interface and documentation
… `ITERATE_WLM_GPU_COUNT`. [1] [2] [3] [4]

Example scripts and configuration
Example trial script
… `examples/bumpy_function.py` to use environment variables for all parameters and output, ensuring compatibility with the new `iterate2` workflow. [1] [2]
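The trial-script changes describe `examples/bumpy_function.py` reading its parameters and output path from environment variables set by `iterate2`, then writing the metric to a file that `iterate2` reads back. The following is a self-contained sketch of both sides of that contract. Every environment variable name in it (`ITERATE_PARAM_X`, `ITERATE_PARAM_Y`, `ITERATE_METRIC_FILE`), the JSON metric format, and the bumpy test function itself are hypothetical stand-ins, not the names or formula used in the PR.

```python
import json
import math
import os
import tempfile
from typing import Mapping


def bumpy(x: float, y: float) -> float:
    """Stand-in 'bumpy' objective: a bowl with sinusoidal ripples (hypothetical)."""
    return x * x + y * y + math.sin(3 * x) * math.cos(3 * y)


def trial_main(env: Mapping[str, str] = os.environ) -> float:
    """Trial side: read inputs from the environment and write the metric file."""
    # Hypothetical variable names; the real ones are set by iterate2.
    x = float(env["ITERATE_PARAM_X"])
    y = float(env["ITERATE_PARAM_Y"])
    metric_path = env["ITERATE_METRIC_FILE"]
    loss = bumpy(x, y)
    # Hypothetical JSON metric format.
    with open(metric_path, "w", encoding="utf-8") as fh:
        json.dump({"loss": loss}, fh)
    return loss


def read_metric(metric_path: str, name: str = "loss") -> float:
    """Driver side: extract one named metric after the trial has finished."""
    with open(metric_path, encoding="utf-8") as fh:
        metrics = json.load(fh)
    if name not in metrics:
        raise KeyError(f"metric {name!r} not found in {metric_path}")
    return float(metrics[name])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "metric.json")
        demo_env = {"ITERATE_PARAM_X": "0.5", "ITERATE_PARAM_Y": "-0.25",
                    "ITERATE_METRIC_FILE": path}
        loss = trial_main(demo_env)
        print(f"trial wrote loss={loss:.4f}; driver read {read_metric(path):.4f}")
```

Passing `env` explicitly keeps the trial logic testable without touching the real process environment. In an actual trial, the default `os.environ` populated by the launcher would be used.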