A scalable, end-to-end pipeline for:
- Sampling continuous search spaces
- Evaluating BBOB benchmark functions
- Extracting ELA (Exploratory Landscape Analysis) features
- Studying compression-ratio effects when sampling on random subspaces
- Building large datasets efficiently (parallel + chunked)
```mermaid
flowchart LR
    A[Sample X] --> B["Evaluate f(X) on BBOB"]
    B --> C[Extract ELA Features]
    C --> D[Aggregate Dataset]
    subgraph Advanced ["Sampling on Subspaces Pipeline"]
        E[Sample in low dimension d]
        F[Project to high dimension D]
        G[Evaluate BBOB]
        H[ELA on full + slices]
    end
    E --> F --> G --> H
```
```
.
├── doe_sampling.py                  # Generate X samples
├── y_sampling.py                    # Evaluate BBOB functions
├── ela_sampling.py                  # Extract ELA features
├── sampler.py                       # Alternative IOH-based sampling
│
├── slicing_sampling_test_parallel.py
├── slicing_all_in_sampling_test_parallel.py
│   └── Low-D → High-D sampling + parallel ELA
│
├── parallel_loader.py               # Build final dataset (chunked)
├── parallel_loader_slices.py
├── parallel_loader_slices_all_in.py
│   └── Parallel loading of many CSV files
│
└── data/                            # Outputs (generated)
```
Just run the following line in bash:
```bash
python3 -m pip install -r requirements.txt
```
The following is an example of using any Quasi-Monte-Carlo sampler. Currently, the code allows using halton, sobol, or lhs to generate points, which are then passed to one of the BBOB functions to obtain function evaluations for later assessment.
```bash
python doe_sampling.py \
    --dim 20 \
    --n 1000 \
    --sampler lhs \
    --seed 42 \
    --out samples.csv
```
As output, a folder structure is generated that encodes the corresponding dimension, seed, and number of samples:
```
x_samples/
└── reduction/
    └── Dimension_20/
        └── seed_42/
            └── Samples_1000/
                └── samples.csv
```
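For orientation, here is a minimal sketch of how such samples can be generated with `scipy.stats.qmc`. The sampler names match the CLI options above, but the helper function and the `[-5, 5]` scaling are illustrative assumptions, not the exact `doe_sampling.py` implementation.

```python
# Minimal sketch: generate QMC samples scaled to a BBOB-style domain [-5, 5].
# The helper and the scaling are assumptions; doe_sampling.py may differ.
import numpy as np
from scipy.stats import qmc

def sample_points(sampler_name: str, dim: int, n: int, seed: int) -> np.ndarray:
    samplers = {
        "lhs": qmc.LatinHypercube,
        "sobol": qmc.Sobol,
        "halton": qmc.Halton,
    }
    sampler = samplers[sampler_name](d=dim, seed=seed)
    unit_samples = sampler.random(n)            # points in [0, 1]^dim
    return qmc.scale(unit_samples, -5.0, 5.0)   # rescale to [-5, 5]^dim

X = sample_points("lhs", dim=20, n=1000, seed=42)
print(X.shape)  # (1000, 20)
```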
Run:
```bash
python y_sampling.py
```
In the script, you need to select the folder containing the previously generated samples. If the script runs correctly, the following directories are generated:
```
bbob_evaluations/
└── reduction/
    └── Dimension_20/
        └── seed_42/
            └── Samples_1000/
                └── f_1/
                    └── id_0/
                        └── evaluations.csv
```
The script y_sampling.py is preconfigured to evaluate all 24 BBOB functions and the first 15 instances of each function.
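As a rough sketch of this evaluation step, assuming a recent version of the `ioh` package (which `sampler.py` also relies on); the function id, instance id, and file names are illustrative only:

```python
# Minimal sketch: evaluate sampled points on a BBOB function via the ioh package.
# Function id, instance id and file names are illustrative; y_sampling.py may differ.
import numpy as np
import pandas as pd
import ioh

X = pd.read_csv("samples.csv").to_numpy()   # shape (n, dim)

# BBOB f1 (Sphere), instance 1, in the sampled dimension (recent ioh API)
problem = ioh.get_problem(1, instance=1, dimension=X.shape[1],
                          problem_class=ioh.ProblemClass.BBOB)
y = np.array([problem(x) for x in X])        # IOH problems are callable

pd.DataFrame({"fX": y}).to_csv("evaluations.csv", index=False)
```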
Run:
```bash
python ela_sampling.py
```
In the script, you have to indicate the directories containing the samples and the corresponding evaluations.
Then the output is the following:
```
ela_features/
└── reduction/
    └── Dimension_20/
        └── seed_42/
            └── Samples_1000/
                └── f_1/
                    └── id_0/
                        └── ela_features.csv
```
Each file contains:
`feature_1, feature_2, ..., feature_n`
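A minimal sketch of this feature-extraction step, assuming the `pflacco` package mentioned under Core Concepts below; only two of the feature groups are shown here, and the file names are illustrative:

```python
# Minimal sketch: compute a couple of ELA feature groups with pflacco
# and store them as a single row. ela_sampling.py covers more feature groups.
import pandas as pd
from pflacco.classical_ela_features import calculate_ela_meta, calculate_ela_distribution

X = pd.read_csv("samples.csv")                     # columns x1..xd
y = pd.read_csv("evaluations.csv")["fX"]

features = {}
features.update(calculate_ela_meta(X, y))          # linear/quadratic model fits
features.update(calculate_ela_distribution(X, y))  # skewness, kurtosis, number of peaks

pd.DataFrame([features]).to_csv("ela_features.csv", index=False)
```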
Run:
```bash
python parallel_loader.py
```
This script generates the files:
```
complete_data_generated.csv
complete_data_generated.parquet
```
Final dataset includes:
- dimension
- seed
- n_samples
- function_idx
- instance_idx
- source_file
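A rough sketch of the parallel, chunked aggregation, assuming a recursive glob over the `ela_features/` tree; the actual `parallel_loader.py` additionally derives the metadata columns listed above from the directory structure:

```python
# Minimal sketch: load many ELA CSV files in parallel and write CSV + Parquet outputs.
# Metadata parsing from the directory structure is omitted; parallel_loader.py does more.
import glob
import pandas as pd
from multiprocessing import Pool

def load_one(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    df["source_file"] = path             # keep the file origin as a column
    return df

if __name__ == "__main__":
    files = glob.glob("ela_features/**/ela_features.csv", recursive=True)
    with Pool() as pool:
        # chunksize batches paths per worker task to reduce scheduling overhead
        chunks = list(pool.imap_unordered(load_one, files, chunksize=64))
    data = pd.concat(chunks, ignore_index=True)
    data.to_csv("complete_data_generated.csv", index=False)
    data.to_parquet("complete_data_generated.parquet", index=False)
```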
Run:
```bash
python slicing_sampling_test_parallel.py
```
Or:
```bash
python slicing_all_in_sampling_test_parallel.py
```
The distinction between the two is how point density is allocated: in the second script, all points are evaluated in one defined subspace, whereas the first splits the point density across multiple subspaces, or "slices".
```mermaid
flowchart LR
    A[Low-D samples] --> B[Random embedding]
    B --> C[High-D samples]
    C --> D[Evaluate BBOB]
    D --> E["ELA (full dataset)"]
    C --> F[Split into slices]
    F --> G[ELA per slice]
```
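The projection step can be sketched as a random linear embedding from d to D; the Gaussian embedding matrix, the clipping to the BBOB box, and the number of slices below are assumptions and may differ from the slicing scripts:

```python
# Minimal sketch: embed low-dimensional samples into a high-dimensional BBOB space
# and split them into "slices" for per-slice ELA. All constants are illustrative.
import numpy as np

rng = np.random.default_rng(42)
d, D, n = 10, 20, 1000

X_low = rng.uniform(-5.0, 5.0, size=(n, d))    # samples in the low-D subspace
A = rng.standard_normal((D, d)) / np.sqrt(d)   # random embedding matrix (D x d)
X_high = np.clip(X_low @ A.T, -5.0, 5.0)       # high-D points, kept inside the BBOB box

slices = np.array_split(X_high, 4)             # e.g. 4 slices for per-slice ELA
print(X_high.shape, [s.shape for s in slices])
```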
Running either of the aforementioned scripts generates the following structure, which corresponds to the flowchart shown above:
```
sampling_outputs_20D_10D/
└── f1/
    └── iid_0/
        └── group0/
            ├── full.csv
            ├── slice1.csv
            └── slice2.csv
```
| File | Description |
|---|---|
| `full.csv` | ELA features on all samples |
| `slice*.csv` | ELA features per low-D slice |
- X Samples: `x1, x2, ..., xd`
- Function Evaluations: `fX`
- ELA Features: `feature_1, feature_2, ..., feature_n`
The final aggregated dataset includes:
- Features
- Metadata
- File origin
- Core Concepts:
  - Sampling methods: Latin Hypercube (LHS), Sobol, Halton, Monte Carlo
  - ELA features (via pflacco): meta features, distribution features, level sets, Nearest Better Clustering (NBC), dispersion, information content, PCA, fitness-distance correlation
  - Projection strategy: sample in low dimension (d) → embed into high dimension (D) → evaluate in high-D → analyze the landscape structure via ELA
- Performance Features:
  - Parallel processing (multiprocessing)
  - Chunked data loading
  - Memory-safe streaming
  - Parquet output (fast + compressed)
  - Handles millions of CSV files
- Sobol sampling requires n = 2^k (see the snippet after this list)
- ELA level features require enough samples
- File paths encode metadata → do not change structure
- Use Parquet for large datasets
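If `scipy.stats.qmc` is used as the Sobol backend (an assumption), the power-of-two requirement can be enforced directly:

```python
# Draw exactly 2**m Sobol points, preserving the sequence's balance properties.
from scipy.stats import qmc

sobol = qmc.Sobol(d=20, seed=42)
X_unit = sobol.random_base2(m=10)   # 2**10 = 1024 points in [0, 1]^20
```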
You can modify:
- Sampling method (get_sampler)
- Dimensions (D, d)
- Number of groups / slices
- Enabled ELA features
- Optimization landscape analysis
- Meta-learning dataset generation
- Benchmarking optimization algorithms
- Studying dimensionality reduction effects
This repository provides a complete, scalable pipeline for:
✔ Sampling & benchmarking ✔ ELA feature extraction ✔ High-dimensional analysis ✔ Large-scale dataset construction
📜 License
MIT License