Use train/val split instead of k-folds #17

Merged
JeremieGince merged 3 commits into dev from add_stratified-shuffle-split
Feb 10, 2026

Conversation


JeremieGince (Contributor) commented Feb 10, 2026

Description

Replace N_FOLDS/fold_id-based k-fold splitting with a fixed train/validation split. Add DEFAULT_TRAIN_VAL_SPLIT and new split_id parameter (used as RNG seed) and expose train_val_split in from_dataset_name and the DataModule constructor with validation assertions. _split_train_val_dataset now uses random_split into [train, val] lengths instead of concatenating k-fold subsets. Update type hint for train_dataset and adjust MaxcutDataModule to pass split_id. Remove N_FOLDS constant.
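The splitting logic described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name, the 0.8 value assumed for DEFAULT_TRAIN_VAL_SPLIT, and the assertion bounds are assumptions; only the use of random_split, split_id as RNG seed, and the [train, val] ordering come from the description.

```python
from typing import Tuple

import torch
from torch.utils.data import Dataset, Subset, random_split

# Assumed default; the actual constant lives on DataModule.
DEFAULT_TRAIN_VAL_SPLIT = 0.8


def split_train_val_dataset(
    dataset: Dataset,
    train_val_split: float = DEFAULT_TRAIN_VAL_SPLIT,
    split_id: int = 0,
) -> Tuple[Subset, Subset]:
    """Split a dataset into [train, val] subsets, seeding the RNG with split_id."""
    # Validation assertion, analogous to the ones mentioned in the PR.
    assert 0.0 < train_val_split < 1.0, "train_val_split must be in (0, 1)"
    n_train = int(len(dataset) * train_val_split)
    lengths = [n_train, len(dataset) - n_train]
    # split_id seeds the generator, so the same id always yields the same split.
    generator = torch.Generator().manual_seed(split_id)
    train_subset, val_subset = random_split(dataset, lengths, generator=generator)
    return train_subset, val_subset
```

Reusing the same split_id reproduces the exact same train/val partition, which is what replaces the old fold_id bookkeeping.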


Checklist

Please complete the following checklist when submitting a PR. The PR will not be reviewed until all items are checked.

  • All new features include a unit test.
    Make sure that the tests pass and the coverage is
    sufficient by running pytest tests --cov=src --cov-report=term-missing.
  • All new functions and code are clearly documented.
  • The code is formatted using Black.
    You can do this by running black src tests.
  • The imports are sorted using isort.
    You can do this by running isort src tests.
  • The code is type-checked using Mypy.
    You can do this by running mypy src tests.

Replace N_FOLDS/fold_id-based k-fold splitting with a fixed train/validation split. Add DEFAULT_TRAIN_VAL_SPLIT and new split_id parameter (used as RNG seed) and expose train_val_split in from_dataset_name and the DataModule constructor with validation assertions. _split_train_val_dataset now uses random_split into [train, val] lengths instead of concatenating k-fold subsets. Update type hint for train_dataset and adjust MaxcutDataModule to pass split_id. Remove N_FOLDS constant.
Add class and __init__ docstrings to DataModule to clarify responsibilities and parameters. Tighten type hints by changing _train_dataset to Optional[Subset] and making _split_train_val_dataset return Tuple[Subset, Subset] instead of generic Any. Update MaxcutDataModule.from_dataset_name signature: rename fold_id to split_id, add a train_val_split kw-only parameter, and default batch_size, random_state, and num_workers to DataModule's DEFAULT_* constants for consistent defaults and clearer API.
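The resulting from_dataset_name signature might look like the sketch below. The parameter names split_id, train_val_split, batch_size, random_state, and num_workers come from the commit message; the dataset_name parameter name, the default values, and the stub body are assumptions for illustration.

```python
# Assumed placeholder values; the real DEFAULT_* constants live on DataModule.
DEFAULT_TRAIN_VAL_SPLIT = 0.8
DEFAULT_BATCH_SIZE = 32
DEFAULT_RANDOM_STATE = 0
DEFAULT_NUM_WORKERS = 0


class MaxcutDataModule:
    @classmethod
    def from_dataset_name(
        cls,
        dataset_name: str,  # parameter name assumed
        split_id: int = 0,  # renamed from fold_id; used as the RNG seed
        *,
        train_val_split: float = DEFAULT_TRAIN_VAL_SPLIT,  # new kw-only parameter
        batch_size: int = DEFAULT_BATCH_SIZE,
        random_state: int = DEFAULT_RANDOM_STATE,
        num_workers: int = DEFAULT_NUM_WORKERS,
    ) -> "MaxcutDataModule":
        ...  # construct and return the datamodule
```

Making train_val_split keyword-only keeps positional call sites unambiguous after the fold_id → split_id rename.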
Replace fold_id with split_id in automl_pipeline_tutorial.ipynb and ligthning_pipeline_tutorial.ipynb. Update the variable declaration and the argument passed to DataModule.from_dataset_name to match the newer API, which expects split_id.

github-actions bot commented Feb 10, 2026

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines  Covered  Coverage  Threshold  Status
897    871      97%       90%        🟢

New Files

No new covered files...

Modified Files

File                                                Coverage  Status
src/matchcake_opt/datamodules/datamodule.py         100%      🟢
src/matchcake_opt/datamodules/maxcut_datamodule.py  93%       🟢
TOTAL                                               97%       🟢

Updated for commit: 1b61385

@JeremieGince JeremieGince merged commit d5282c9 into dev Feb 10, 2026
6 checks passed
@JeremieGince JeremieGince deleted the add_stratified-shuffle-split branch February 10, 2026 16:57