This package contains implementations of two novel approaches to encoder-based warm-starting of Bayesian Hyperparameter Optimization. It allows both training meta-models and using them for this meta-task.
To reproduce the experiments, install the requirements from `requirements.txt` and run `./scripts/run_all_experiments.sh`. To run the experiments on toy datasets, change `"tasks.json"` to `"tasks_toy.json"` on line 10 of `experiments_engine/paths.py`.
## Meta-models

As of now, wsmf contains two approaches to encoder-based warm-starting (the metric-learning objective is sketched after this list):

- **Metric learning** (`Dataset2VecMetricLearning`) - uses Dataset2Vec as the encoder and trains it so that the distances between the produced dataset representations correspond to the distances between the landmarkers (vectors of performances of a predefined set of hyperparameter configurations).
- **Landmarker reconstruction** (`LandmarkerReconstructionTrainingInterface`) - uses Dataset2Vec as the encoder, which produces a latent representation of the entire dataset (of any size) and passes it to an MLP that predicts the landmarker vector.
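For intuition, here is a minimal sketch of the metric-learning objective described above, assuming Euclidean distances on both sides; the exact distance and loss used by `Dataset2VecMetricLearning` may differ:

```python
import torch

def metric_learning_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
                         landmarkers_a: torch.Tensor, landmarkers_b: torch.Tensor) -> torch.Tensor:
    """Push the distance between two dataset embeddings towards the
    distance between the datasets' landmarker vectors (illustrative sketch)."""
    emb_dist = torch.norm(emb_a - emb_b, p=2)
    landmarker_dist = torch.norm(landmarkers_a - landmarkers_b, p=2)
    return (emb_dist - landmarker_dist) ** 2

# Toy example with random embeddings and landmarker vectors
loss = metric_learning_loss(torch.randn(16), torch.randn(16), torch.rand(5), torch.rand(5))
```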
## Selectors

For the intended meta-task, wsmf provides an API that uses the encoder to propose hyperparameter configurations. It contains the following selectors (a sketch of the underlying landmarker-based selection logic follows the list):

- `RepresentationBasedHpSelector` - chooses configurations based on the learned representation; applicable in the metric learning approach
- `ReconstructionBasedHpSelector` - chooses configurations based on the reconstructed landmarkers
- `RandomHpSelector` - chooses random configurations from the predefined portfolio
- `RankBasedHpSelector` - chooses the configurations that are best on average
- `LandmarkerHpSelector` - chooses configurations based on the vector of landmarkers itself
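At their core, the landmarker-driven selectors rank a portfolio of configurations by performance on the most similar known dataset. The sketch below illustrates this in the spirit of `LandmarkerHpSelector`; `propose_top_configurations` is a hypothetical helper, and the representation- and reconstruction-based selectors replace the query landmarkers with encoder outputs:

```python
import torch

def propose_top_configurations(query_landmarkers, known_landmarkers, configurations, n):
    """Hypothetical sketch: pick the n configurations that performed best on the
    known dataset whose landmarker vector is closest to the query's."""
    closest = min(
        known_landmarkers,
        key=lambda name: torch.norm(query_landmarkers - known_landmarkers[name]).item(),
    )
    # Landmarkers are performances per configuration, so higher scores are better.
    best = torch.argsort(known_landmarkers[closest], descending=True)[:n]
    return [configurations[int(i)] for i in best]
```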
## Training a metric-learning-based meta-model

```python
import pytorch_lightning as pl

# EncoderHpoDataset, EncoderMetricLearningLoader, GenericRepeatableDataLoader
# and Dataset2VecMetricLearning come from wsmf; exact import paths depend on
# the package layout.

# Tensors X, y are torch.Tensor objects that correspond to feature and target matrices.
train_datasets = { # training meta-dataset
"dataset_train_1": (tensor_X1, tensor_y1),
"dataset_train_2": (tensor_X2, tensor_y2),
...
}
val_datasets = { # validation meta-dataset
"dataset_val_1": (tensor_X1, tensor_y1),
"dataset_val_2": (tensor_X2, tensor_y2),
...
}
# Tensors l1, l2, ... correspond to vectors of landmarkers in torch.Tensor format.
train_landmarkers = { # landmarkers for the training meta-dataset
"dataset_train_1": l1,
"dataset_train_2": l2,
...
}
val_landmarkers = { # landmarkers for the validation meta-dataset
"dataset_val_1": l1,
"dataset_val_2": l2,
...
}
train_dataset = EncoderHpoDataset(train_datasets, train_landmarkers)
train_num_batches, train_batch_size = 100, 16  # example values
train_dataloader = EncoderMetricLearningLoader(train_dataset, train_num_batches, train_batch_size)
val_dataset = EncoderHpoDataset(val_datasets, val_landmarkers)
val_num_batches, val_batch_size = 16, 16  # example values
val_dataloader = EncoderMetricLearningLoader(val_dataset, val_num_batches, val_batch_size)
val_dataloader = GenericRepeatableDataLoader(val_dataloader)  # wrapper that produces repeatable validation batches
model = Dataset2VecMetricLearning()
trainer = pl.Trainer()
trainer.fit(model, train_dataloader, val_dataloader)
```

## Using a selector based on reconstruction

```python
# Dataset2VecForLandmarkerReconstruction and ReconstructionBasedHpSelector come
# from wsmf; exact import paths depend on the package layout.
datasets = { # datasets to search from (in this case used for closest dataset search)
"dataset_1": (tensor_X1, tensor_y1),
"dataset_2": (tensor_X2, tensor_y2),
...
}
landmarkers = { # landmarkers to search from (in this case used for proposing the best configurations)
"dataset_1": l1,
"dataset_2": l2,
...
}
configurations = [
{"hp1": val1, "hp2": val2},
{"hp1": val3, "hp2": val4},
...
]
meta_model = Dataset2VecForLandmarkerReconstruction.load_from_checkpoint("path_to_meta_model.ckpt")
selector = ReconstructionBasedHpSelector(
meta_model,
datasets,
landmarkers,
configurations
)
# Usage
new_dataset = (X, y) # torch.Tensor
n_configurations = 10
proposed_configurations = selector.propose_configurations(new_dataset, n_configurations)
```
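The proposed configurations are meant to warm-start Bayesian hyperparameter optimization. Below is a minimal sketch of seeding an HPO loop with them, here using Optuna (not a wsmf dependency; the dummy objective and the `"hp1"`/`"hp2"` search space are illustrative assumptions):

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # Hypothetical search space matching the configuration keys above
    hp1 = trial.suggest_float("hp1", 0.0, 1.0)
    hp2 = trial.suggest_float("hp2", 0.0, 1.0)
    return -(hp1 - 0.3) ** 2 - (hp2 - 0.7) ** 2  # dummy objective for illustration

study = optuna.create_study(direction="maximize")
for config in proposed_configurations:
    study.enqueue_trial(config)  # warm-start trials run before sampler-driven ones
study.optimize(objective, n_trials=50)
```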
## Commands useful during development

- Set env variables: ``export PYTHONPATH=`pwd` ``
- Install dependencies: `pip install -r requirements_dev.txt`
- Run unit tests: `pytest`
- Check code quality: `./scripts/check_code.sh`
- Release: `python -m build && twine upload dist/*`