Introduces MaxcutDataset and MaxcutModel classes for Max-Cut graph optimization tasks using torch-geometric. Refactors datamodule structure, adds MaxcutDataModule, updates imports, and adds torch-geometric as a dependency. Also removes unnecessary softmax from BaseModel.predict.
Updated import statements to reference DataModule from the correct 'datamodules' package instead of 'datasets' in the notebook and pipeline modules. This resolves import errors after directory restructuring.
Replaced wildcard import from matchcake_opt.datasets with explicit import of DataModule from matchcake_opt.datamodules.datamodule in automl_pipeline_tutorial.ipynb and nif_deep_learning.ipynb for improved clarity and maintainability.
Replaces returning None in val_dataloader with raising MisconfigurationException to provide clearer error handling when validation data loader is not configured.
Added training_step, validation_step, and test_step methods to MaxcutModel for handling model training and evaluation. Also updated val_dataloader in MaxcutDataModule to return an empty list instead of raising an exception.
Changed MaxcutModel.predict to return a tensor instead of a dict, simplifying its output. Updated LightningPipeline.run_validation to handle empty metrics and ensure validation time is added to the correct metrics dictionary.
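The empty-metrics handling can be sketched as follows. This is a hypothetical, simplified stand-in for `LightningPipeline.run_validation` (the function name, the `trainer_validate` callable, and the `val_time_s` key are illustrative, not the project's actual API); it assumes the validation call returns a list of metric dicts, possibly empty.

```python
import time

def run_validation(trainer_validate):
    """Sketch of run_validation: guard against empty results and
    attach the elapsed time to the dict that is actually returned."""
    start = time.perf_counter()
    results = trainer_validate()
    # Guard against an empty result list before indexing into it.
    metrics = dict(results[0]) if results else {}
    # Record validation time on the same dict that is returned, so
    # callers see it even when no metrics were produced.
    metrics["val_time_s"] = time.perf_counter() - start
    return metrics
```

With this shape, `run_validation(lambda: [])` still yields a usable dict containing only the timing entry.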
The test_step method now returns a dictionary of computed metrics and energy instead of just the loss value. This change enables more detailed evaluation outputs during testing.
Introduces a static method to convert bitstring samples to a numpy array of integers, supporting string and 1D array inputs for improved flexibility in data handling.
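A minimal sketch of such a converter is shown below. The function name `samples_to_array` is illustrative; the sketch only assumes the described contract: accept either a bitstring like `"0110"` or a 1D array-like of 0/1 values, and return a 1D integer numpy array.

```python
import numpy as np

def samples_to_array(samples):
    """Convert bitstring samples to a 1D numpy array of ints.

    Accepts a string such as "0110" or a 1D array-like of 0/1 values;
    anything else raises ValueError.
    """
    if isinstance(samples, str):
        return np.array([int(bit) for bit in samples], dtype=int)
    arr = np.asarray(samples, dtype=int)
    if arr.ndim != 1:
        raise ValueError(f"Expected a string or 1D array, got shape {arr.shape}")
    return arr
```

Both `samples_to_array("0110")` and `samples_to_array([0, 1, 1, 0])` then produce the same array.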
Updated metric update methods to include inputs and outputs, removed unused static methods and sample-based metrics computation, and simplified test step to return loss only. This streamlines the MaxcutModel class and aligns metric updates with expected input signatures.
Introduces a prepare_data() method to BaseDataset and updates MaxcutDataset to use it for graph construction and label assignment. DataModule now calls prepare_data() on datasets and defers train/val split until preparation, improving modularity and consistency in dataset handling.
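The deferred-split pattern can be illustrated with a dependency-free sketch (all names and the `val_fraction` parameter are illustrative; the real classes build graphs and use PyTorch datasets): the data module prepares both datasets first, and only then computes the train/val split, since the split sizes are only meaningful once the datasets exist.

```python
class BaseDataset:
    """Minimal sketch of the prepare_data() contract."""
    def __init__(self, size):
        self._size = size
        self.items = None

    def prepare_data(self):
        # Subclasses would build graphs and assign labels here.
        self.items = list(range(self._size))

    def __len__(self):
        if self.items is None:
            raise RuntimeError("Call prepare_data() first")
        return len(self.items)


class DataModule:
    """Calls prepare_data() on its datasets, then splits train/val."""
    def __init__(self, train_dataset, test_dataset, val_fraction=0.2):
        self.train_dataset = train_dataset
        self.test_dataset = test_dataset
        self.val_fraction = val_fraction
        self.val_size = None

    def prepare_data(self):
        self.train_dataset.prepare_data()
        self.test_dataset.prepare_data()
        # Defer the split until after preparation, when sizes are known.
        self.val_size = int(len(self.train_dataset) * self.val_fraction)
```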
Introduces the 'circular' graph type to MaxcutDataset, updates type annotations, and implements the _build_circular_graph method using networkx.circulant_graph. This allows users to generate circular graphs for Max-Cut problem datasets.
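The construction `networkx.circulant_graph` performs can be made explicit with a small pure-Python sketch (the actual code uses networkx directly; this helper is only for illustration): node `i` connects to `(i + k) % n` for each offset `k`, and `offsets=(1,)` yields the plain cycle used for circular Max-Cut instances.

```python
def circulant_edges(n, offsets=(1,)):
    """Edge list of a circulant graph on n nodes: node i is joined to
    (i + k) % n for every offset k. Self-loops and duplicate edges
    (each undirected edge is generated from both endpoints) are removed."""
    edges = set()
    for i in range(n):
        for k in offsets:
            j = (i + k) % n
            if i != j:
                edges.add((min(i, j), max(i, j)))
    return sorted(edges)
```

For example, `circulant_edges(5)` gives the 5-cycle, while `circulant_edges(4, (1, 2))` gives all 6 edges of the complete graph on 4 nodes.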
The run_test method now accepts an optional ckpt_path argument, allowing callers to specify which checkpoint to use during testing. The default remains 'best' for backward compatibility.
Replaces the empty validation dataloader with a DataLoader instance in MaxcutDataModule. Adds type annotations for the batch parameter in validation_step and test_step methods of MaxcutModel for improved type safety and clarity.
Wrapped the validation call in a try-except block that falls back to the 'last' checkpoint when the 'best' checkpoint is missing, improving robustness.
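The fallback logic can be sketched like this. Here `validate_fn` is a stand-in for the trainer's validate call (e.g. `trainer.validate(ckpt_path=...)`), and the caught exception types are an assumption for illustration; the real code catches whatever Lightning raises when the 'best' checkpoint cannot be resolved.

```python
def validate_with_fallback(validate_fn):
    """Try validating with the 'best' checkpoint; if it cannot be
    found, fall back to the most recent ('last') checkpoint."""
    try:
        return validate_fn(ckpt_path="best")
    except (FileNotFoundError, ValueError):
        # 'best' may not exist, e.g. when no monitored metric was
        # logged during training; 'last' is always written.
        return validate_fn(ckpt_path="last")
```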
Catches the SearchSpaceExhausted exception when requesting new trials in the AutoMLPipeline, allowing the run loop to exit gracefully if the search space is exhausted.
When automl_overwrite_fit is True, the checkpoint folder is now removed before proceeding. This ensures a clean state for new AutoML runs and prevents issues from leftover checkpoints.
Expanded pyproject.toml and dependency resolution to support CUDA 13.0 (cu130) builds. This includes relaxing torchvision version constraints, adding cu130-specific dependency groups, and registering the new PyTorch cu130 index.
Moved DataModule imports from datasets to datamodules in test files for consistency. Added type hints and minor refactoring in datamodule and maxcut_datamodule. Added a new test suite for MaxcutDataset. Updated pyproject.toml to include types-networkx and torch_geometric modules for type checking.
Moved datamodule tests to a new test_datamodules directory and added test stubs for maxcut datamodule. Enhanced MaxcutDataset tests with parameterized graph types and parameters, improved test coverage, and added new tests for graph parameter validation and output shape. Minor code changes in maxcut_dataset.py to mark some error branches as uncovered for coverage tools. Updated .gitignore to exclude .tmp directory.
Introduces unit tests for MaxcutDataModule and MaxcutModel, covering their main methods and behaviors. Also adds pragma: no cover to NotImplementedError branches in both classes to improve test coverage reporting.
Updated the 'max_time' parameter in automl_pipeline_tutorial.ipynb, ligthning_pipeline_tutorial.ipynb, and nif_deep_learning.ipynb to shorten training duration for quicker runs and testing.
Description
This pull request introduces support for MaxCut graph datasets and models, refactors the data module structure for better modularity, and updates dependencies for improved compatibility and new features. It also includes several related changes to the pipeline and notebooks to accommodate the new structure and functionality.
Support for MaxCut problem:
- Added a `MaxcutDataset` class for generating and handling MaxCut graph datasets, including graph construction, data preparation, and bounds calculation. (`src/matchcake_opt/datasets/maxcut_dataset.py`)
- Added a `MaxcutDataModule` for integrating MaxCut datasets with the training pipeline, including custom dataloader logic. (`src/matchcake_opt/datamodules/maxcut_datamodule.py`)
- Added a `MaxcutModel` base class for MaxCut-specific model logic and training steps. (`src/matchcake_opt/modules/maxcut_model.py`)

Refactoring and modularization:

- Moved `DataModule` and related imports from `datasets` to a new `datamodules` package, updating all relevant imports and notebook examples for clarity and separation of concerns. (`src/matchcake_opt/datamodules/datamodule.py` and related import changes) [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]

Dependency and compatibility updates:

- Relaxed version constraints on `torchvision` and added `torch-geometric` as a new dependency. (`pyproject.toml`) [1] [2]
- Added CUDA 13.0 (cu130) dependency groups and registered the new PyTorch cu130 index. (`pyproject.toml`) [1] [2] [3]

Pipeline improvements and bug fixes:

- Improved the AutoML pipeline, including graceful handling of an exhausted search space and cleanup of leftover checkpoints. (`src/matchcake_opt/tr_pipeline/automl_pipeline.py`) [1] [2]
- Removed the unnecessary softmax from the `BaseModel.predict` method to allow more flexible prediction outputs. (`src/matchcake_opt/modules/base_model.py`)

Dataset and data module enhancements:

- Added a `prepare_data` method to `BaseDataset` and updated `DataModule` to call this method for both training and test datasets, improving dataset initialization and reproducibility. (`src/matchcake_opt/datasets/base_dataset.py`, `src/matchcake_opt/datamodules/datamodule.py`) [1] [2]

These changes collectively improve the extensibility of the codebase, add support for graph-based learning tasks, and ensure compatibility with recent versions of dependencies.
Checklist
Please complete the following checklist when submitting a PR. The PR will not be reviewed until all items are checked.
- [ ] Make sure that the tests pass and the coverage is sufficient by running `pytest tests --cov=src --cov-report=term-missing`.
- [ ] Format the code. You can do this by running `black src tests`.
- [ ] Sort the imports. You can do this by running `isort src tests`.
- [ ] Check the typing. You can do this by running `mypy src tests`.