Maxcut dataset by JeremieGince · Pull Request #9 · MatchCake/MatchCake-Opt

JeremieGince · 2025-11-02T12:58:20Z

Description

This pull request introduces support for MaxCut graph datasets and models, refactors the data module structure for better modularity, and updates dependencies for improved compatibility and new features. It also includes several related changes to the pipeline and notebooks to accommodate the new structure and functionality.

Support for MaxCut problem:

Added a new MaxcutDataset class for generating and handling MaxCut graph datasets, including graph construction, data preparation, and bounds calculation. (src/matchcake_opt/datasets/maxcut_dataset.py)
Introduced a MaxcutDataModule for integrating MaxCut datasets with the training pipeline, including custom dataloader logic. (src/matchcake_opt/datamodules/maxcut_datamodule.py)
Added a MaxcutModel base class for MaxCut-specific model logic and training steps. (src/matchcake_opt/modules/maxcut_model.py)
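
For readers unfamiliar with the objective these classes target, here is a minimal, dependency-free sketch of the Max-Cut cut value and an exhaustive reference solution for tiny graphs. The function names `cut_value` and `brute_force_maxcut` are illustrative assumptions and are not taken from `MaxcutDataset` or `MaxcutModel`; the PR does not show how its bounds are actually computed.

```python
from itertools import product


def cut_value(edges, assignment):
    """Count edges whose endpoints fall on opposite sides of the partition."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def brute_force_maxcut(n_nodes, edges):
    """Try all 2**n_nodes partitions; only practical for very small graphs."""
    best_value, best_assignment = -1, None
    for bits in product((0, 1), repeat=n_nodes):
        value = cut_value(edges, bits)
        if value > best_value:
            best_value, best_assignment = value, bits
    return best_value, best_assignment


# A 4-node cycle is bipartite, so every edge can be cut.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
best_value, best_assignment = brute_force_maxcut(4, square)

# A triangle is an odd cycle, so at most 2 of its 3 edges can be cut.
triangle = [(0, 1), (1, 2), (2, 0)]
triangle_value, _ = brute_force_maxcut(3, triangle)
```

An exhaustive search like this is only useful as a ground-truth check on small instances, since the number of partitions doubles with every added node.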

Refactoring and modularization:

Moved DataModule and related imports from datasets to a new datamodules package, updating all relevant imports and notebook examples for clarity and separation of concerns. (src/matchcake_opt/datamodules/datamodule.py and related import changes)

Dependency and compatibility updates:

Relaxed the version constraint for torchvision and added torch-geometric as a new dependency. (pyproject.toml)
Added support for CUDA 13.0 (cu130) in the dependency configuration, including new indexes and conflict groups for PyTorch and torchvision. (pyproject.toml)

Pipeline improvements and bug fixes:

Fixed logic for overwriting AutoML checkpoints and gracefully handled search space exhaustion in the AutoML pipeline. (src/matchcake_opt/tr_pipeline/automl_pipeline.py)
Removed unnecessary softmax from the BaseModel.predict method to allow more flexible prediction outputs. (src/matchcake_opt/modules/base_model.py)

Dataset and data module enhancements:

Added a prepare_data method to BaseDataset and updated DataModule to call this method for both training and test datasets, improving dataset initialization and reproducibility. (src/matchcake_opt/datasets/base_dataset.py, src/matchcake_opt/datamodules/datamodule.py)

These changes collectively improve the extensibility of the codebase, add support for graph-based learning tasks, and ensure compatibility with recent versions of dependencies.

Checklist

Please complete the following checklist when submitting a PR. The PR will not be reviewed until all items are checked.

All new features include a unit test. Make sure that the tests pass and the coverage is sufficient by running pytest tests --cov=src --cov-report=term-missing.
All new functions and code are clearly documented.
The code is formatted using Black. You can do this by running black src tests.
The imports are sorted using isort. You can do this by running isort src tests.
The code is type-checked using Mypy. You can do this by running mypy src tests.

Introduces MaxcutDataset and MaxcutModel classes for Max-Cut graph optimization tasks using torch-geometric. Refactors datamodule structure, adds MaxcutDataModule, updates imports, and adds torch-geometric as a dependency. Also removes unnecessary softmax from BaseModel.predict.

Updated metric update methods to include inputs and outputs, removed unused static methods and sample-based metrics computation, and simplified test step to return loss only. This streamlines the MaxcutModel class and aligns metric updates with expected input signatures.

Introduces a prepare_data() method to BaseDataset and updates MaxcutDataset to use it for graph construction and label assignment. DataModule now calls prepare_data() on datasets and defers train/val split until preparation, improving modularity and consistency in dataset handling.
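
The deferred-preparation pattern described above can be sketched roughly as follows. `SketchDataset` and `SketchDataModule` are hypothetical stand-ins written for illustration only; they do not reproduce the real `BaseDataset`, `MaxcutDataset`, or `DataModule` signatures, and the `val_fraction` parameter is an assumption.

```python
import random


class SketchDataset:
    """Hypothetical dataset whose contents are built in prepare_data()."""

    def __init__(self, n_samples, seed=0):
        self.n_samples = n_samples
        self.seed = seed
        self.samples = None  # construction stays cheap; nothing is generated yet

    def prepare_data(self):
        # Seeded generation makes repeated preparation reproducible.
        rng = random.Random(self.seed)
        self.samples = [rng.random() for _ in range(self.n_samples)]

    def __len__(self):
        if self.samples is None:
            raise RuntimeError("call prepare_data() before using the dataset")
        return len(self.samples)


class SketchDataModule:
    """Hypothetical data module that splits only after preparation."""

    def __init__(self, train_dataset, test_dataset, val_fraction=0.2):
        self.train_dataset = train_dataset
        self.test_dataset = test_dataset
        self.val_fraction = val_fraction
        self.train_indices = None
        self.val_indices = None

    def prepare_data(self):
        self.train_dataset.prepare_data()
        self.test_dataset.prepare_data()
        # The split needs the dataset length, so it happens after preparation.
        n_total = len(self.train_dataset)
        n_val = int(n_total * self.val_fraction)
        indices = list(range(n_total))
        self.val_indices = indices[:n_val]
        self.train_indices = indices[n_val:]


datamodule = SketchDataModule(SketchDataset(10, seed=1), SketchDataset(4, seed=2))
datamodule.prepare_data()
```

Keeping construction cheap and doing the expensive work in one explicit hook means the split can rely on the prepared dataset's length rather than on values guessed at construction time.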

Moved DataModule imports from datasets to datamodules in test files for consistency. Added type hints and minor refactoring in datamodule and maxcut_datamodule. Added a new test suite for MaxcutDataset. Updated pyproject.toml to include types-networkx and torch_geometric modules for type checking.

Moved datamodule tests to a new test_datamodules directory and added test stubs for maxcut datamodule. Enhanced MaxcutDataset tests with parameterized graph types and parameters, improved test coverage, and added new tests for graph parameter validation and output shape. Minor code changes in maxcut_dataset.py to mark some error branches as uncovered for coverage tools. Updated .gitignore to exclude .tmp directory.

github-actions · 2025-11-02T14:29:03Z

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines	Covered	Coverage	Threshold	Status
867	844	97%	90%	🟢

New Files

File	Coverage	Status
src/matchcake_opt/datamodules/__init__.py	100%	🟢
src/matchcake_opt/datamodules/maxcut_datamodule.py	93%	🟢
src/matchcake_opt/datasets/maxcut_dataset.py	99%	🟢
src/matchcake_opt/modules/maxcut_model.py	100%	🟢
TOTAL	98%	🟢

Modified Files

File	Coverage	Status
src/matchcake_opt/__init__.py	100%	🟢
src/matchcake_opt/datasets/__init__.py	100%	🟢
src/matchcake_opt/datasets/base_dataset.py	100%	🟢
src/matchcake_opt/modules/base_model.py	100%	🟢
src/matchcake_opt/tr_pipeline/automl_pipeline.py	92%	🟢
src/matchcake_opt/tr_pipeline/lightning_pipeline.py	97%	🟢
TOTAL	98%	🟢

updated for commit: 99dc64a by action🐍

JeremieGince added 21 commits October 17, 2025 11:30

Fix import paths for DataModule references

dde02ae

Updated import statements to reference DataModule from the correct 'datamodules' package instead of 'datasets' in the notebook and pipeline modules. This resolves import errors after directory restructuring.

Update dataset import in notebooks

077ddcd

Replaced wildcard import from matchcake_opt.datasets with explicit import of DataModule from matchcake_opt.datamodules.datamodule in automl_pipeline_tutorial.ipynb and nif_deep_learning.ipynb for improved clarity and maintainability.

Raise MisconfigurationException in val_dataloader

0a3e550

Replaces returning None in val_dataloader with raising MisconfigurationException to provide clearer error handling when validation data loader is not configured.

Implement training, validation, and test steps in MaxcutModel

dadf22e

Added training_step, validation_step, and test_step methods to MaxcutModel for handling model training and evaluation. Also updated val_dataloader in MaxcutDataModule to return an empty list instead of raising an exception.

Refactor predict method and validation metrics handling

75066dd

Changed MaxcutModel.predict to return a tensor instead of a dict, simplifying its output. Updated LightningPipeline.run_validation to handle empty metrics and ensure validation time is added to the correct metrics dictionary.

Refactor test_step to return metrics components

24396dd

The test_step method now returns a dictionary of computed metrics and energy instead of just the loss value. This change enables more detailed evaluation outputs during testing.

Add bitstrings_to_arr utility to MaxcutModel

fb46bd5

Introduces a static method to convert bitstring samples to a numpy array of integers, supporting string and 1D array inputs for improved flexibility in data handling.
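
A conversion of this kind might look like the sketch below. The commit does not show the method's signature or return shape, so the function name `bitstrings_to_array`, the accepted inputs (one string, or a 1D sequence of equal-length strings), and the 2D `(n_samples, n_bits)` output are all assumptions.

```python
import numpy as np


def bitstrings_to_array(samples):
    """Convert '0'/'1' strings to an integer array of shape (n_samples, n_bits)."""
    # A single string becomes a 0-d array; atleast_1d turns it into one sample.
    samples = np.atleast_1d(np.asarray(samples, dtype=str))
    if samples.ndim != 1:
        raise ValueError("expected a single bitstring or a 1D sequence of bitstrings")
    return np.array([[int(ch) for ch in sample] for sample in samples], dtype=int)


single = bitstrings_to_array("0110")
batch = bitstrings_to_array(np.array(["01", "10", "11"]))
```

Here `single` has shape `(1, 4)` and `batch` has shape `(3, 2)`, so downstream code can treat both cases uniformly as a batch of samples.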

Add support for circular graphs in MaxcutDataset

2a5bfa7

Introduces the 'circular' graph type to MaxcutDataset, updates type annotations, and implements the _build_circular_graph method using networkx.circulant_graph. This allows users to generate circular graphs for Max-Cut problem datasets.
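
For reference, `networkx.circulant_graph(n, offsets)` joins each node `i` to `(i + j) mod n` and `(i - j) mod n` for every offset `j`. The sketch below reproduces that edge rule in plain Python so it runs without networkx. `circulant_edges` is a hypothetical helper, not the PR's `_build_circular_graph`, and it assumes offsets strictly between 0 and `n` (it skips self-loops).

```python
def circulant_edges(n, offsets):
    """Return the sorted undirected edge list of a circulant graph on n nodes."""
    edges = set()
    for i in range(n):
        for j in offsets:
            for k in ((i + j) % n, (i - j) % n):
                if k != i:  # skip self-loops from degenerate offsets
                    edges.add((min(i, k), max(i, k)))
    return sorted(edges)


# Offset 1 alone gives a simple ring.
ring = circulant_edges(5, [1])

# Adding offset 2 also links each node to its second neighbours.
denser = circulant_edges(6, [1, 2])
```

With a single offset of 1 the graph is a cycle, which is bipartite only for an even number of nodes; that makes small rings convenient test instances with a known optimal cut.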

Update maxcut_dataset.py

48fc255

Add ckpt_path parameter to run_test method

bcd5a56

The run_test method now accepts an optional ckpt_path argument, allowing callers to specify which checkpoint to use during testing. The default remains 'best' for backward compatibility.

Implement proper val_dataloader and type annotations

79bab81

Replaces the empty validation dataloader with a DataLoader instance in MaxcutDataModule. Adds type annotations for the batch parameter in validation_step and test_step methods of MaxcutModel for improved type safety and clarity.

Fallback to 'last' checkpoint if 'best' is unavailable

e1daf86

Wrapped the validation call in a try-except block to attempt validation with the 'last' checkpoint if the 'best' checkpoint is not found, improving robustness when the 'best' checkpoint is missing.
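
The fallback can be sketched generically as below. The commit does not say which exception signals a missing 'best' checkpoint, so the exception types are passed in as a parameter; `validate_with_fallback` and the `run_validation` callable are hypothetical names, not part of `LightningPipeline`.

```python
def validate_with_fallback(
    run_validation,
    ckpt_paths=("best", "last"),
    missing_errors=(FileNotFoundError,),
):
    """Try each checkpoint alias in order and return the first that validates."""
    if not ckpt_paths:
        raise ValueError("ckpt_paths must contain at least one entry")
    last_error = None
    for ckpt_path in ckpt_paths:
        try:
            return ckpt_path, run_validation(ckpt_path=ckpt_path)
        except missing_errors as err:
            last_error = err  # remember the failure and try the next alias
    raise last_error


def fake_validation(ckpt_path):
    # Stand-in for a real validation run where only 'last' exists on disk.
    if ckpt_path == "best":
        raise FileNotFoundError("no 'best' checkpoint was saved")
    return {"val_loss": 0.25}


used_ckpt, metrics = validate_with_fallback(fake_validation)
```

Catching only the exception types that mean "checkpoint missing" keeps genuine validation failures visible instead of silently retrying them.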

Handle SearchSpaceExhausted in AutoML pipeline

b1400e1

Catches the SearchSpaceExhausted exception when requesting new trials in the AutoMLPipeline, allowing the run loop to exit gracefully if the search space is exhausted.
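
The control flow amounts to something like the following sketch. `SearchSpaceExhausted` is defined locally here as a stand-in for the exception raised by the underlying AutoML library, and `run_trials` and `make_grid_sampler` are hypothetical helpers rather than `AutoMLPipeline` methods.

```python
class SearchSpaceExhausted(Exception):
    """Local stand-in for the optimizer's 'nothing left to try' exception."""


def make_grid_sampler(candidates):
    """Return a callable that yields each candidate once, then raises."""
    remaining = list(candidates)

    def next_trial():
        if not remaining:
            raise SearchSpaceExhausted("no untried candidates left")
        return remaining.pop(0)

    return next_trial


def run_trials(next_trial, evaluate, max_trials):
    """Run up to max_trials trials, stopping early if the space runs out."""
    results = []
    for _ in range(max_trials):
        try:
            params = next_trial()
        except SearchSpaceExhausted:
            break  # exit the loop cleanly instead of crashing the run
        results.append((params, evaluate(params)))
    return results


results = run_trials(
    make_grid_sampler([0.1, 0.01, 0.001]),
    evaluate=lambda lr: lr * 2,
    max_trials=10,
)
```

Although ten trials are requested, the loop ends after the three available candidates and the results gathered so far are kept.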

Add checkpoint folder cleanup on overwrite fit

5f27a8e

When automl_overwrite_fit is True, the checkpoint folder is now removed before proceeding. This ensures a clean state for new AutoML runs and prevents issues from leftover checkpoints.
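
A minimal sketch of this cleanup using only the standard library is shown below. `reset_checkpoint_folder` is a hypothetical helper, and recreating the empty folder after removal is an assumption; the commit only states that the folder is removed.

```python
import shutil
import tempfile
from pathlib import Path


def reset_checkpoint_folder(checkpoint_folder, overwrite_fit):
    """Remove stale checkpoints when overwriting, then ensure the folder exists."""
    folder = Path(checkpoint_folder)
    if overwrite_fit and folder.exists():
        shutil.rmtree(folder)  # drop everything left over from earlier runs
    folder.mkdir(parents=True, exist_ok=True)  # assumption: recreate it empty
    return folder


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoints"
    ckpt.mkdir()
    (ckpt / "stale.ckpt").write_text("old")

    reset_checkpoint_folder(ckpt, overwrite_fit=False)
    kept = (ckpt / "stale.ckpt").exists()

    reset_checkpoint_folder(ckpt, overwrite_fit=True)
    removed = not (ckpt / "stale.ckpt").exists()
    folder_exists = ckpt.exists()
```

Without the overwrite flag the existing checkpoint is left untouched, so resuming a previous run remains possible.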

Add CUDA 13.0 (cu130) support to dependencies

7686d85

Expanded pyproject.toml and dependency resolution to support CUDA 13.0 (cu130) builds. This includes relaxing torchvision version constraints, adding cu130-specific dependency groups, and registering the new PyTorch cu130 index.

Add tests for MaxcutDataModule and MaxcutModel

7608a49

Introduces unit tests for MaxcutDataModule and MaxcutModel, covering their main methods and behaviors. Also adds pragma: no cover to NotImplementedError branches in both classes to improve test coverage reporting.

Reduce max_time for training in tutorial notebooks

99dc64a

Updated the 'max_time' parameter in automl_pipeline_tutorial.ipynb, ligthning_pipeline_tutorial.ipynb, and nif_deep_learning.ipynb to shorten training duration for quicker runs and testing.

JeremieGince merged commit b1e49ea into dev Nov 2, 2025
6 checks passed

JeremieGince merged 22 commits into dev from maxcut_dataset
