Maxcut dataset #9

Merged
JeremieGince merged 22 commits into dev from maxcut_dataset
Nov 2, 2025
JeremieGince (Contributor) commented Nov 2, 2025

Description

This pull request introduces support for MaxCut graph datasets and models, refactors the data module structure for better modularity, and updates dependencies for improved compatibility and new features. It also includes several related changes to the pipeline and notebooks to accommodate the new structure and functionality.

Support for MaxCut problem:

  • Added a new MaxcutDataset class for generating and handling MaxCut graph datasets, including graph construction, data preparation, and bounds calculation. (src/matchcake_opt/datasets/maxcut_dataset.py)
  • Introduced a MaxcutDataModule for integrating MaxCut datasets with the training pipeline, including custom dataloader logic. (src/matchcake_opt/datamodules/maxcut_datamodule.py)
  • Added a MaxcutModel base class for MaxCut-specific model logic and training steps. (src/matchcake_opt/modules/maxcut_model.py)
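
The MaxCut objective these classes target can be illustrated with a small, dependency-free sketch. The helper names `cut_value` and `brute_force_maxcut` are hypothetical and are not taken from this PR; the bounds that MaxcutDataset actually computes may be defined differently.

```python
from itertools import product
from typing import Iterable, Sequence, Tuple

Edge = Tuple[int, int]


def cut_value(edges: Iterable[Edge], assignment: Sequence[int]) -> int:
    """Count the edges whose endpoints fall on opposite sides of the partition."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def brute_force_maxcut(n_nodes: int, edges: Sequence[Edge]) -> int:
    """Try all 2**n_nodes partitions (only viable for very small graphs)."""
    return max(cut_value(edges, bits) for bits in product((0, 1), repeat=n_nodes))


# A 4-node cycle: putting alternating nodes on opposite sides cuts every edge.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
best = brute_force_maxcut(4, square)
```

Any cut value lies between 0 and the number of edges, which gives trivial lower and upper bounds for sanity checks.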

Refactoring and modularization:

  • Moved DataModule and related imports from datasets to a new datamodules package, updating all relevant imports and notebook examples for clarity and separation of concerns. (src/matchcake_opt/datamodules/datamodule.py and related import changes)

Dependency and compatibility updates:

  • Relaxed the version constraint for torchvision and added torch-geometric as a new dependency. (pyproject.toml)
  • Added support for CUDA 13.0 (cu130) in the dependency configuration, including new indexes and conflict groups for PyTorch and torchvision. (pyproject.toml)

Pipeline improvements and bug fixes:

  • Fixed logic for overwriting AutoML checkpoints and gracefully handled search space exhaustion in the AutoML pipeline. (src/matchcake_opt/tr_pipeline/automl_pipeline.py)
  • Removed unnecessary softmax from the BaseModel.predict method to allow more flexible prediction outputs. (src/matchcake_opt/modules/base_model.py)
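
With softmax no longer applied inside BaseModel.predict, a caller that needs class probabilities would normalize the raw outputs itself. The following is a minimal sketch of that caller-side step, assuming plain Python lists as stand-ins for model outputs; the `softmax` helper and the sample values are hypothetical and not part of this PR.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax, applied by the caller instead of by predict()."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Stand-in for raw, unnormalized model outputs (hypothetical values).
raw_outputs = [2.0, 0.5, -1.0]
probabilities = softmax(raw_outputs)

# Picking the top class does not require softmax at all.
predicted_class = max(range(len(raw_outputs)), key=raw_outputs.__getitem__)
```

Because softmax preserves ordering, the arg-max of the raw outputs and of the probabilities is the same index.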

Dataset and data module enhancements:

  • Added a prepare_data method to BaseDataset and updated DataModule to call this method for both training and test datasets, improving dataset initialization and reproducibility. (src/matchcake_opt/datasets/base_dataset.py, src/matchcake_opt/datamodules/datamodule.py)
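
The prepare-then-split flow described above could look roughly like the sketch below. `SketchDataset` and `SketchDataModule` are hypothetical stand-ins written for illustration; they are not the BaseDataset or DataModule classes from this repository, and the split strategy shown (seeded shuffle with a validation fraction) is an assumption.

```python
import random
from typing import List


class SketchDataset:
    """Hypothetical dataset whose items only exist after prepare_data()."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.items: List[int] = []

    def prepare_data(self) -> None:
        # Expensive construction (for example, building graphs) is deferred to here.
        self.items = list(range(self.size))

    def __len__(self) -> int:
        return len(self.items)


class SketchDataModule:
    """Hypothetical data module that prepares datasets before splitting."""

    def __init__(self, train: SketchDataset, test: SketchDataset,
                 val_fraction: float = 0.25, seed: int = 0) -> None:
        self.train = train
        self.test = test
        self.val_fraction = val_fraction
        self.seed = seed
        self.train_indices: List[int] = []
        self.val_indices: List[int] = []

    def prepare_data(self) -> None:
        self.train.prepare_data()
        self.test.prepare_data()
        # Split only after preparation, once the training set has its final length.
        indices = list(range(len(self.train)))
        random.Random(self.seed).shuffle(indices)
        n_val = int(len(indices) * self.val_fraction)
        self.val_indices = indices[:n_val]
        self.train_indices = indices[n_val:]


datamodule = SketchDataModule(SketchDataset(8), SketchDataset(4))
datamodule.prepare_data()
```

Seeding the shuffle keeps the train/validation split reproducible across runs.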

These changes collectively improve the extensibility of the codebase, add support for graph-based learning tasks, and ensure compatibility with recent versions of dependencies.


Checklist

Please complete the following checklist when submitting a PR. The PR will not be reviewed until all items are checked.

  • All new features include a unit test.
    Make sure the tests pass and the coverage is sufficient by running pytest tests --cov=src --cov-report=term-missing.
  • All new functions and code are clearly documented.
  • The code is formatted using Black.
    You can do this by running black src tests.
  • The imports are sorted using isort.
    You can do this by running isort src tests.
  • The code is type-checked using Mypy.
    You can do this by running mypy src tests.

Introduces MaxcutDataset and MaxcutModel classes for Max-Cut graph optimization tasks using torch-geometric. Refactors datamodule structure, adds MaxcutDataModule, updates imports, and adds torch-geometric as a dependency. Also removes unnecessary softmax from BaseModel.predict.
Updated import statements to reference DataModule from the correct 'datamodules' package instead of 'datasets' in the notebook and pipeline modules. This resolves import errors after directory restructuring.
Replaced wildcard import from matchcake_opt.datasets with explicit import of DataModule from matchcake_opt.datamodules.datamodule in automl_pipeline_tutorial.ipynb and nif_deep_learning.ipynb for improved clarity and maintainability.
Replaces returning None in val_dataloader with raising MisconfigurationException to provide clearer error handling when validation data loader is not configured.
Added training_step, validation_step, and test_step methods to MaxcutModel for handling model training and evaluation. Also updated val_dataloader in MaxcutDataModule to return an empty list instead of raising an exception.
Changed MaxcutModel.predict to return a tensor instead of a dict, simplifying its output. Updated LightningPipeline.run_validation to handle empty metrics and ensure validation time is added to the correct metrics dictionary.
The test_step method now returns a dictionary of computed metrics and energy instead of just the loss value. This change enables more detailed evaluation outputs during testing.
Introduces a static method to convert bitstring samples to a numpy array of integers, supporting string and 1D array inputs for improved flexibility in data handling.
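
A conversion of this kind might be sketched as follows. The function name `bitstring_to_array` and the error handling are assumptions made for illustration; the actual static method's name and signature are not shown in this conversation.

```python
from typing import Sequence, Union

import numpy as np


def bitstring_to_array(sample: Union[str, Sequence[int], np.ndarray]) -> np.ndarray:
    """Convert a bitstring such as "0110" or a 1D array-like into an integer array."""
    if isinstance(sample, str):
        # Each character of the string is one bit.
        return np.array([int(char) for char in sample], dtype=int)
    array = np.asarray(sample, dtype=int)
    if array.ndim != 1:
        raise ValueError(f"Expected a 1D sample, got {array.ndim} dimensions.")
    return array


from_string = bitstring_to_array("0110")
from_array = bitstring_to_array(np.array([1, 0, 1]))
```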
Updated metric update methods to include inputs and outputs, removed unused static methods and sample-based metrics computation, and simplified test step to return loss only. This streamlines the MaxcutModel class and aligns metric updates with expected input signatures.
Introduces a prepare_data() method to BaseDataset and updates MaxcutDataset to use it for graph construction and label assignment. DataModule now calls prepare_data() on datasets and defers train/val split until preparation, improving modularity and consistency in dataset handling.
Introduces the 'circular' graph type to MaxcutDataset, updates type annotations, and implements the _build_circular_graph method using networkx.circulant_graph. This allows users to generate circular graphs for Max-Cut problem datasets.
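
For reference, a circulant graph joins node i to nodes (i + x) mod n and (i - x) mod n for every offset x. The sketch below builds that edge list in pure Python so it runs without extra dependencies; `circulant_edges` is a hypothetical helper, not the `_build_circular_graph` method from the commit, and it skips self-loops as a simplification.

```python
from typing import List, Sequence, Set, Tuple


def circulant_edges(n: int, offsets: Sequence[int]) -> List[Tuple[int, int]]:
    """Undirected edges of a circulant graph on n nodes for the given offsets."""
    edges: Set[Tuple[int, int]] = set()
    for i in range(n):
        for x in offsets:
            j = (i + x) % n  # the (i - x) neighbor is added when the loop reaches it
            if i != j:
                edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# With networkx installed, the comparable call would be
# networkx.circulant_graph(6, [1]), which yields a 6-node ring.
ring = circulant_edges(6, [1])
```

With the single offset 1 the result is a plain ring; adding further offsets connects each node to more distant neighbors.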
The run_test method now accepts an optional ckpt_path argument, allowing callers to specify which checkpoint to use during testing. The default remains 'best' for backward compatibility.
Replaces the empty validation dataloader with a DataLoader instance in MaxcutDataModule. Adds type annotations for the batch parameter in validation_step and test_step methods of MaxcutModel for improved type safety and clarity.
Wrapped the validation call in a try-except block to attempt validation with the 'last' checkpoint if the 'best' checkpoint is not found, improving robustness when the 'best' checkpoint is missing.
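
The fallback pattern described in this commit can be sketched in isolation. The `validate` callable, the `FileNotFoundError` exception type, and the metric names are assumptions for illustration; the exception actually caught in LightningPipeline is not stated here.

```python
from typing import Callable, Dict, List


def validate_with_fallback(validate: Callable[[str], Dict[str, float]]) -> Dict[str, float]:
    """Validate with the 'best' checkpoint, falling back to 'last' if it is missing."""
    try:
        return validate("best")
    except FileNotFoundError:
        return validate("last")


# Hypothetical validator for which no 'best' checkpoint was ever saved.
attempted: List[str] = []


def fake_validate(ckpt_path: str) -> Dict[str, float]:
    attempted.append(ckpt_path)
    if ckpt_path == "best":
        raise FileNotFoundError("no 'best' checkpoint available")
    return {"val_loss": 0.5}


metrics = validate_with_fallback(fake_validate)
```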
Catches the SearchSpaceExhausted exception when requesting new trials in the AutoMLPipeline, allowing the run loop to exit gracefully if the search space is exhausted.
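
A graceful exit of this kind might look like the sketch below. The `SearchSpaceExhausted` class defined here is a local stand-in for the exception named in the commit message, and `run_trials` and `make_trial_source` are hypothetical helpers; none of this is the AutoMLPipeline code itself.

```python
from typing import Callable, Dict, Iterator, List


class SearchSpaceExhausted(Exception):
    """Local stand-in for the exception named in the commit message."""


def run_trials(next_trial: Callable[[], Dict[str, float]], max_trials: int) -> List[Dict[str, float]]:
    """Request trials until the budget is spent or the search space runs out."""
    completed: List[Dict[str, float]] = []
    for _ in range(max_trials):
        try:
            params = next_trial()
        except SearchSpaceExhausted:
            break  # nothing left to try: stop the loop instead of crashing
        completed.append(params)
    return completed


def make_trial_source(candidates: List[Dict[str, float]]) -> Callable[[], Dict[str, float]]:
    """Hypothetical trial generator backed by a finite list of candidates."""
    remaining: Iterator[Dict[str, float]] = iter(candidates)

    def next_trial() -> Dict[str, float]:
        try:
            return next(remaining)
        except StopIteration:
            raise SearchSpaceExhausted("all candidates were already tried") from None

    return next_trial


results = run_trials(make_trial_source([{"lr": 0.1}, {"lr": 0.01}, {"lr": 0.001}]), max_trials=10)
```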
When automl_overwrite_fit is True, the checkpoint folder is now removed before proceeding. This ensures a clean state for new AutoML runs and prevents issues from leftover checkpoints.
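
As a rough illustration of this cleanup step, the sketch below removes a checkpoint folder only when an overwrite flag is set. `clear_checkpoints` and the temporary directory layout are hypothetical; only the idea of deleting the folder when automl_overwrite_fit is True comes from the commit message.

```python
import shutil
import tempfile
from pathlib import Path


def clear_checkpoints(folder: Path, overwrite: bool) -> bool:
    """Delete the checkpoint folder when overwriting; report whether it was removed."""
    if overwrite and folder.exists():
        shutil.rmtree(folder)
        return True
    return False


# Demonstration in a throwaway directory with a leftover checkpoint file.
workdir = Path(tempfile.mkdtemp())
checkpoint_folder = workdir / "checkpoints"
checkpoint_folder.mkdir()
(checkpoint_folder / "stale.ckpt").write_text("old state")

kept = clear_checkpoints(checkpoint_folder, overwrite=False)
exists_after_keep = checkpoint_folder.exists()
removed = clear_checkpoints(checkpoint_folder, overwrite=True)
exists_after_remove = checkpoint_folder.exists()

shutil.rmtree(workdir)  # clean up the demonstration directory
```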
Expanded pyproject.toml and dependency resolution to support CUDA 13.0 (cu130) builds. This includes relaxing torchvision version constraints, adding cu130-specific dependency groups, and registering the new PyTorch cu130 index.
Moved DataModule imports from datasets to datamodules in test files for consistency. Added type hints and minor refactoring in datamodule and maxcut_datamodule. Added a new test suite for MaxcutDataset. Updated pyproject.toml to include types-networkx and torch_geometric modules for type checking.
Moved datamodule tests to a new test_datamodules directory and added test stubs for maxcut datamodule. Enhanced MaxcutDataset tests with parameterized graph types and parameters, improved test coverage, and added new tests for graph parameter validation and output shape. Minor code changes in maxcut_dataset.py to mark some error branches as uncovered for coverage tools. Updated .gitignore to exclude .tmp directory.
Introduces unit tests for MaxcutDataModule and MaxcutModel, covering their main methods and behaviors. Also adds pragma: no cover to NotImplementedError branches in both classes to improve test coverage reporting.
github-actions bot commented Nov 2, 2025

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines   Covered   Coverage   Threshold   Status
867     844       97%        90%         🟢

New Files

File                                                  Coverage   Status
src/matchcake_opt/datamodules/__init__.py             100%       🟢
src/matchcake_opt/datamodules/maxcut_datamodule.py    93%        🟢
src/matchcake_opt/datasets/maxcut_dataset.py          99%        🟢
src/matchcake_opt/modules/maxcut_model.py             100%       🟢
TOTAL                                                 98%        🟢

Modified Files

File                                                  Coverage   Status
src/matchcake_opt/__init__.py                         100%       🟢
src/matchcake_opt/datasets/__init__.py                100%       🟢
src/matchcake_opt/datasets/base_dataset.py            100%       🟢
src/matchcake_opt/modules/base_model.py               100%       🟢
src/matchcake_opt/tr_pipeline/automl_pipeline.py      92%        🟢
src/matchcake_opt/tr_pipeline/lightning_pipeline.py   97%        🟢
TOTAL                                                 98%        🟢

updated for commit: 99dc64a by action🐍

Updated the 'max_time' parameter in automl_pipeline_tutorial.ipynb, ligthning_pipeline_tutorial.ipynb, and nif_deep_learning.ipynb to shorten training duration for quicker runs and testing.
@JeremieGince JeremieGince merged commit b1e49ea into dev Nov 2, 2025
6 checks passed