Use train/val split instead of k-folds #17

Merged
JeremieGince merged 3 commits into dev from add_stratified-shuffle-split
Feb 10, 2026

Conversation


JeremieGince (Contributor) commented Feb 10, 2026

Description

Replace N_FOLDS/fold_id-based k-fold splitting with a fixed train/validation split. Add DEFAULT_TRAIN_VAL_SPLIT and new split_id parameter (used as RNG seed) and expose train_val_split in from_dataset_name and the DataModule constructor with validation assertions. _split_train_val_dataset now uses random_split into [train, val] lengths instead of concatenating k-fold subsets. Update type hint for train_dataset and adjust MaxcutDataModule to pass split_id. Remove N_FOLDS constant.
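The splitting logic described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name, the 0.8 value assumed for DEFAULT_TRAIN_VAL_SPLIT, and the assertion bounds are assumptions; only the use of random_split, split_id as RNG seed, and the [train, val] ordering come from the description.

```python
from typing import Tuple

import torch
from torch.utils.data import Dataset, Subset, random_split

# Assumed default; the actual constant lives on DataModule.
DEFAULT_TRAIN_VAL_SPLIT = 0.8


def split_train_val_dataset(
    dataset: Dataset,
    train_val_split: float = DEFAULT_TRAIN_VAL_SPLIT,
    split_id: int = 0,
) -> Tuple[Subset, Subset]:
    """Split a dataset into [train, val] subsets, seeding the RNG with split_id."""
    # Validation assertion, analogous to the ones mentioned in the PR.
    assert 0.0 < train_val_split < 1.0, "train_val_split must be in (0, 1)"
    n_train = int(len(dataset) * train_val_split)
    lengths = [n_train, len(dataset) - n_train]
    # split_id seeds the generator, so the same id always yields the same split.
    generator = torch.Generator().manual_seed(split_id)
    train_subset, val_subset = random_split(dataset, lengths, generator=generator)
    return train_subset, val_subset
```

Reusing the same split_id reproduces the exact same train/val partition, which is what replaces the old fold_id bookkeeping.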


Checklist

Please complete the following checklist when submitting a PR. The PR will not be reviewed until all items are checked.

  • All new features include a unit test.
    Make sure that the tests pass and the coverage is
    sufficient by running pytest tests --cov=src --cov-report=term-missing.
  • All new functions and code are clearly documented.
  • The code is formatted using Black.
    You can do this by running black src tests.
  • The imports are sorted using isort.
    You can do this by running isort src tests.
  • The code is type-checked using Mypy.
    You can do this by running mypy src tests.

Replace N_FOLDS/fold_id-based k-fold splitting with a fixed train/validation split. Add DEFAULT_TRAIN_VAL_SPLIT and new split_id parameter (used as RNG seed) and expose train_val_split in from_dataset_name and the DataModule constructor with validation assertions. _split_train_val_dataset now uses random_split into [train, val] lengths instead of concatenating k-fold subsets. Update type hint for train_dataset and adjust MaxcutDataModule to pass split_id. Remove N_FOLDS constant.
Add class and __init__ docstrings to DataModule to clarify responsibilities and parameters. Tighten type hints by changing _train_dataset to Optional[Subset] and making _split_train_val_dataset return Tuple[Subset, Subset] instead of generic Any. Update MaxcutDataModule.from_dataset_name signature: rename fold_id to split_id, add a train_val_split kw-only parameter, and default batch_size, random_state, and num_workers to DataModule's DEFAULT_* constants for consistent defaults and clearer API.
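The resulting from_dataset_name signature might look like the sketch below. The parameter names split_id, train_val_split, batch_size, random_state, and num_workers come from the commit message; the dataset_name parameter name, the default values, and the stub body are assumptions for illustration.

```python
# Assumed placeholder values; the real DEFAULT_* constants live on DataModule.
DEFAULT_TRAIN_VAL_SPLIT = 0.8
DEFAULT_BATCH_SIZE = 32
DEFAULT_RANDOM_STATE = 0
DEFAULT_NUM_WORKERS = 0


class MaxcutDataModule:
    @classmethod
    def from_dataset_name(
        cls,
        dataset_name: str,  # parameter name assumed
        split_id: int = 0,  # renamed from fold_id; used as the RNG seed
        *,
        train_val_split: float = DEFAULT_TRAIN_VAL_SPLIT,  # new kw-only parameter
        batch_size: int = DEFAULT_BATCH_SIZE,
        random_state: int = DEFAULT_RANDOM_STATE,
        num_workers: int = DEFAULT_NUM_WORKERS,
    ) -> "MaxcutDataModule":
        ...  # construct and return the datamodule
```

Making train_val_split keyword-only keeps positional call sites unambiguous after the fold_id → split_id rename.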
Replace fold_id with split_id in automl_pipeline_tutorial.ipynb and ligthning_pipeline_tutorial.ipynb. Update the variable declaration and the argument passed to DataModule.from_dataset_name to match the newer API, which expects split_id.

github-actions bot commented Feb 10, 2026

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines  Covered  Coverage  Threshold  Status
897    871      97%       90%        🟢

New Files

No new covered files...

Modified Files

File                                                Coverage  Status
src/matchcake_opt/datamodules/datamodule.py         100%      🟢
src/matchcake_opt/datamodules/maxcut_datamodule.py  93%       🟢
TOTAL                                               97%       🟢

Updated for commit: 1b61385

@JeremieGince JeremieGince merged commit d5282c9 into dev Feb 10, 2026
6 checks passed
@JeremieGince JeremieGince deleted the add_stratified-shuffle-split branch February 10, 2026 16:57