Chapter 4: Standard Modules

Pablo de Oliveira Castro edited this page Apr 7, 2015 · 3 revisions

Standard Modules

Introduction

This chapter presents the options for the standard modules bundled in ASK.

Bootstrap Modules

Bootstrap modules select the first batch of sampled points in the experiment.

Module latinsquare

Module latinsquare selects points using Latin Hypercube sampling. The method is described in "Large sample properties of simulations using Latin Hypercube Sampling", M. Stein, Technometrics 1987, 143–151. ASK’s module uses the implementation from the R lhs package.

| Parameters | Mandatory | Expects | Description |
|---|---|---|---|
| n | yes | integer | Number of samples |
| method | no | string: “random”, “genetic” or “maximin” | Method used to generate the Latin hypercube; refer to the lhs documentation for details. Default is “random”. |
| seed | no | integer | Seed to initialize the Random Number Generator (RNG). When missing, the default R RNG initialization is used. |
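
To illustrate what a Latin Hypercube looks like, here is a minimal Python sketch of the basic random construction: the range of each factor is cut into n equal strata, and each stratum is used exactly once per factor. This is not ASK's implementation (ASK delegates to the R lhs package); the function name `latin_hypercube` and the unit-cube output are assumptions made for illustration.

```python
import random


def latin_hypercube(n, dimensions, seed=None):
    """Return n points in [0, 1)^dimensions, one per stratum in every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(dimensions):
        # Draw one random value inside each of the n equal-width strata.
        column = [(i + rng.random()) / n for i in range(n)]
        # Shuffle so that strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    # Transpose the per-dimension columns into n points.
    return [tuple(column[i] for column in columns) for i in range(n)]


for point in latin_hypercube(n=5, dimensions=2, seed=42):
    print(point)
```

Projecting the points onto any single factor gives exactly one point per stratum, which is the property that distinguishes this design from plain uniform random sampling.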

Module lowdiscrepancy

Module lowdiscrepancy selects points using a Low Discrepancy sequence. ASK’s module uses the R fOptions package implementation.

| Parameters | Mandatory | Expects | Description |
|---|---|---|---|
| n | yes | integer | Number of samples |
| method | no | string: “sobol” or “halton” | Type of low discrepancy sequence; refer to the fOptions documentation for details. Default is “sobol”. |
| seed | no | integer | Seed to initialize the RNG. When missing, the default R RNG initialization is used. |
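
As an illustration of a low discrepancy sequence, the following Python sketch generates Halton points with the classic radical-inverse construction. It is a textbook version, not the fOptions code that ASK calls; the function names and the choice of bases 2 and 3 are assumptions.

```python
def radical_inverse(index, base):
    """Mirror the base-`base` digits of index around the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton(n, bases=(2, 3)):
    """Return the first n Halton points, one coordinate per base."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, n + 1)]


for point in halton(4):
    print(point)
```

Unlike random draws, successive points fill the gaps left by earlier ones, so the design space is covered evenly even for small n.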

Module random

Module random selects points by choosing a value for each factor using a uniform random distribution.

| Parameters | Mandatory | Expects | Description |
|---|---|---|---|
| n | yes | integer | Number of samples |
| seed | no | integer | Seed to initialize the RNG. When missing, the default Python RNG initialization is used. |

Module random-file

Module random-file selects points by choosing samples at random from a samples file. The file must be in ASK's data exchange format.

| Parameters | Mandatory | Expects | Description |
|---|---|---|---|
| n | yes | integer | Number of samples |
| data_file | yes | path | Path of the samples file |
| seed | no | integer | Seed to initialize the RNG. When missing, the default Python RNG initialization is used. |

Source Modules

Source modules measure the samples selected by a bootstrap or sampler module. Usually, the user writes a custom source module fitting the target experiment.

Module file

Module file uses a text file as a database of measurements. When it receives a requested set of samples, it looks them up in the database file and returns the matching ones.

| Parameters | Mandatory | Expects | Description |
|---|---|---|---|
| data_file | yes | path | Path of the database file containing previously measured samples in ASK's data exchange format. |
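
The lookup behaviour can be sketched in a few lines of Python. Parsing the database file is left out because the data exchange format is specified elsewhere; the in-memory dictionary, the function name `find_measured`, and the sample values below are all hypothetical.

```python
def find_measured(database, requested):
    """Return (point, response) pairs for requested points present in the database."""
    return [(point, database[point]) for point in requested if point in database]


# Hypothetical database: tuple of factor values -> measured response.
database = {(1, 8): 0.52, (2, 8): 0.31, (4, 16): 0.12}
requested = [(2, 8), (3, 8), (4, 16)]
print(find_measured(database, requested))
```

In this sketch, requested points that are absent from the database, such as `(3, 8)`, are simply left out of the result.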

Sampler Modules

Sampler modules select additional points in each iteration of the experimental pipeline.

Module amart

Module amart selects points using the method described in "Accurate and efficient processor performance prediction via regression tree based modeling", B. Li, L. Peng and B. Ramadass, Journal of Systems Architecture 2009, 457–467.

| Parameters | Mandatory | Expects | Description |
|---|---|---|---|
| n | yes | integer | Number of samples |
| trees | no | integer | Number of trees used by the underlying GBM model. Default is 3000. |
| seeds | no | integer | Size of the committee. |
| predict_on_all | no | boolean | If true, the committee votes on all the candidate points. If false, the committee votes on a subsample of 20*n points drawn with Latin Hypercube sampling. Default is true. |
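
The committee idea can be illustrated with a short Python sketch. It assumes that "voting" means ranking candidates by how much the committee members disagree, measured here by the variance of their predictions; this chapter does not spell out the exact criterion, so treat that choice, the function name, and the toy models as assumptions rather than a description of ASK's code.

```python
from statistics import pvariance


def select_by_disagreement(committee, candidates, n):
    """Pick the n candidates whose committee predictions vary the most."""
    def disagreement(point):
        return pvariance([model(point) for model in committee])
    return sorted(candidates, key=disagreement, reverse=True)[:n]


# Toy stand-ins for GBM models fitted with different seeds.
committee = [lambda x: x, lambda x: x, lambda x: x + x * x]
candidates = [0.0, 1.0, 2.0, 3.0]
print(select_by_disagreement(committee, candidates, n=2))
```

With `predict_on_all` set to false, the `candidates` list would be a Latin Hypercube subsample of 20*n points instead of the full candidate set.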

Module hierarchical

Module hierarchical selects points using the Hierarchical Variance Sampling (HVS) method.

| Parameters | Mandatory | Expects | Description |
|---|---|---|---|
| n | yes | integer | Number of samples |
| confidence | no | float | Confidence bound adjustment. A value of 0.9 means that the computed interval is valid 90% of the time. Default is 0.9. |
| use_cov | no | boolean | If true, the error per region is computed from the square of the coefficient of variation, also called relative variance. If false, the error is computed from the variance. Default is false. |
| use_weights | no | boolean | If true, a weights file is produced. Other modules (such as GBM) can use it to compensate for the uneven sampling of HVS during model construction. Default is true. |
| ponderate_by_size | no | boolean | If true, the error per region is defined as the square of the coefficient of variation multiplied by the region size. Default is true. |
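
The effect of `use_cov` and `ponderate_by_size` on the per-region error can be sketched as follows. This is a simplified Python illustration: it ignores the confidence bound adjustment, uses the population variance, and multiplies whichever error measure is selected by the region size. Those simplifications and the function name `region_error` are assumptions.

```python
from statistics import mean, pvariance


def region_error(responses, region_size, use_cov=False, ponderate_by_size=True):
    """Score a region from the responses measured inside it."""
    variance = pvariance(responses)
    if use_cov:
        # Relative variance: squared coefficient of variation, (sigma / mu)^2.
        error = variance / mean(responses) ** 2
    else:
        error = variance
    # Optionally weight the error by the size of the region.
    return error * region_size if ponderate_by_size else error


responses = [1.0, 3.0]
print(region_error(responses, region_size=2.0))
print(region_error(responses, region_size=2.0, use_cov=True))
```

Regions with a larger score would then receive a larger share of the n new samples.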

Module tgp

Module tgp selects points using the Treed Gaussian Process model from the tgp R package.

| Parameters | Mandatory | Expects | Description |
|---|---|---|---|
| n | yes | integer | Number of samples |

Module latinsquare

Module latinsquare selects points by augmenting the existing Latin Hypercube sample with the lhs R package. If latinsquare is used as the sampler, it must also be used as the bootstrap.

| Parameters | Mandatory | Expects | Description |
|---|---|---|---|
| n | yes | integer | Number of samples |

Module random

Module random selects points by choosing a value for each factor using a uniform random distribution. It avoids selecting already sampled points: when it draws a point that was already sampled, it discards the point and chooses a new random combination. The module gives up if, after 50 tries, it is unable to find a new combination.

| Parameters | Mandatory | Expects | Description |
|---|---|---|---|
| n | yes | integer | Number of samples |
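
The retry logic described above can be sketched in Python. The representation of factors as a dictionary of candidate values and the function name are hypothetical, and the sketch assumes the 50-try limit applies to each new point separately, which the text does not specify.

```python
import random

MAX_TRIES = 50  # retry limit stated in the module description


def sample_new_points(factors, already_sampled, n, seed=None):
    """Draw up to n unseen combinations, one uniform choice per factor."""
    rng = random.Random(seed)
    seen = set(already_sampled)
    selected = []
    for _ in range(n):
        for _ in range(MAX_TRIES):
            candidate = tuple(rng.choice(values) for values in factors.values())
            if candidate not in seen:
                seen.add(candidate)
                selected.append(candidate)
                break
        else:
            # No unseen combination found within the retry limit: give up.
            break
    return selected


factors = {"threads": [1, 2, 4], "block_size": [8, 16]}
print(sample_new_points(factors, already_sampled=[(1, 8)], n=3, seed=0))
```

When the design space is nearly exhausted, the function returns fewer than n points instead of looping forever.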

Model Modules

Model modules predict the response in locations not yet sampled.

Module cart

Module cart uses the model described in CART: Classification and regression trees, L. Breiman and J. Friedman and R. Olshen and C. Stone and D. Steinberg and P. Colla, Wadsworth 1983.

| Parameters | Mandatory | Expects | Description |
|---|---|---|---|
| cp | yes | float | Complexity parameter |

Module gbm

Module gbm uses the model described in Generalized Boosted Models: A guide to the gbm package, G. Ridgeway.

| Parameters | Mandatory | Expects | Description |
|---|---|---|---|
| ntreees | no | integer | Number of trees. Default is 3000. |
| interactiondepth | no | integer | Interaction depth. Default is 8. |
| shrinkage | no | float | Shrinkage. Default is 0.01. |
| distribution | no | string: “gaussian”, “laplace”; see the gbm documentation for the full list | Distribution function used when building the boosted tree model. “gaussian”, the default, minimizes the RMSE. “laplace” minimizes the mean absolute error. |
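
The difference between the two loss functions can be seen on a toy example. The Python sketch below is independent of gbm: it compares the best constant prediction under each criterion, using the fact that the mean minimizes the squared error while the median minimizes the absolute error. The data values are made up for illustration.

```python
from math import sqrt
from statistics import mean, median


def rmse(actual, predicted):
    """Root mean squared error, the criterion behind the gaussian setting."""
    return sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def mae(actual, predicted):
    """Mean absolute error, the criterion behind the laplace setting."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Made-up responses with one outlier.
observed = [1.0, 2.0, 3.0, 4.0, 100.0]
by_mean = [mean(observed)] * len(observed)
by_median = [median(observed)] * len(observed)

print("rmse:", rmse(observed, by_mean), rmse(observed, by_median))
print("mae: ", mae(observed, by_mean), mae(observed, by_median))
```

The squared error is dominated by the outlier, so “laplace” may be the more robust choice when the measurements contain a few extreme values.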

Module tgp

Module tgp uses the Treed Gaussian Process model implemented in the tgp R package.

ASK’s interface to the tgp module is not yet configurable.

Control Modules

Control modules choose when to end the experiment.

Module points

Module points stops the experiment after a fixed number of samples.

| Parameters | Mandatory | Expects | Description |
|---|---|---|---|
| n | yes | integer | Number of samples. The experiment stops when the number of sampled points reaches n. |

Module convergence

Module convergence stops the experiment if the model does not improve significantly within a given time window. It computes the improvement on a measure of the model error, which is read from a time series file.

The time series file, briefly discussed in Chapter 3, is composed of space-separated values. The first line is a header with the column names. The first column contains the number of samples and the second column contains a measure of error. The file is automatically generated by the generic reporter, for example:

samples mean-error max-error rmse mean-relative max-relative
50 0.0377823384121497 0.447323111968324 0.0781095183475204 Inf Inf
100 0.0294159657374007 0.408494500075275 0.0630672408986823 Inf Inf
150 0.0275319895231202 0.281661273924844 0.0469645301362225 Inf Inf

| Parameters | Mandatory | Expects | Description |
|---|---|---|---|
| timeseries | no | path | A file containing the model’s error time series. If left empty, the module tries to retrieve the time series file from the report module configuration. |
| window | no | integer | The window, in number of iterations. Default is 5. |
| threshold | no | float | Improvement threshold. A threshold of 0.1 means that the experiment stops if the rate of change of the error over the last window iterations is under 1%. |
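
A stopping test of this kind can be sketched in Python as follows. The exact improvement formula is not given in this chapter, so the sketch assumes a relative decrease between the error measured `window` iterations ago and the latest error, and treats `threshold` as a plain fraction. The function names are hypothetical.

```python
TIMESERIES = """\
samples mean-error max-error rmse mean-relative max-relative
50 0.0377823384121497 0.447323111968324 0.0781095183475204 Inf Inf
100 0.0294159657374007 0.408494500075275 0.0630672408986823 Inf Inf
150 0.0275319895231202 0.281661273924844 0.0469645301362225 Inf Inf
"""


def read_timeseries(text, column=1):
    """Parse space-separated rows (after the header) into (samples, error) pairs."""
    rows = [line.split() for line in text.strip().splitlines()[1:]]
    return [(int(row[0]), float(row[column])) for row in rows]


def has_converged(errors, window=5, threshold=0.1):
    """Return True when the relative improvement over the window is below threshold."""
    if len(errors) <= window:
        return False  # not enough iterations to fill the window yet
    old, new = errors[-window - 1], errors[-1]
    if old == 0:
        return True  # the error is already zero, nothing left to improve
    return (old - new) / old < threshold


errors = [error for _, error in read_timeseries(TIMESERIES)]
print(has_converged(errors, window=2, threshold=0.1))
```

With fewer rows than the window requires, the sketch keeps the experiment running, which avoids stopping on the first few noisy iterations.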

Reporter Modules

Reporter modules produce informative statistics for the user during the experiment.

Module generic

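Module generic measures the model error against a test set of samples, writes the error measures to a time series file, and plots the model’s predictions with a plotting script.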

| Parameters | Mandatory | Expects | Description |
|---|---|---|---|
| test_set | yes | path | File containing a set of samples. The model error is measured against this test set. The file must be in ASK's data exchange format. |
| script | yes | path: “reporter/generic/1D.R” or “reporter/generic/2D.R” or user provided script | Path of the script used for plotting the model’s predictions. This module includes generic scripts for 1D and 2D design spaces, that is to say spaces composed of one or two factors. For higher-dimension spaces, the user must provide a custom script. |
| timeseries | no | path | Path of the time series file. Error measures of the model are written to this file. |
| max_error_scale | no | float | If specified, sets the maximum error value shown on the error plot. If left empty, the range of the error plot adapts dynamically. |
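
The error measures written to the time series file can be illustrated with a small Python sketch. The column names come from the example time series shown for the convergence module; the formulas are the usual definitions and are assumed here, not taken from ASK's source. In particular, the sketch assumes that an infinite relative error arises when a test-set response is zero.

```python
from math import inf, sqrt


def error_report(actual, predicted):
    """Compute absolute and relative error statistics over a test set."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    # A nonzero error on a zero response has no finite relative value.
    relative = [
        e / abs(a) if a != 0 else (inf if e > 0 else 0.0)
        for e, a in zip(absolute, actual)
    ]
    n = len(absolute)
    return {
        "mean-error": sum(absolute) / n,
        "max-error": max(absolute),
        "rmse": sqrt(sum(e * e for e in absolute) / n),
        "mean-relative": sum(relative) / n,
        "max-relative": max(relative),
    }


print(error_report(actual=[1.0, 2.0], predicted=[1.5, 2.0]))
print(error_report(actual=[0.0, 2.0], predicted=[1.0, 2.0]))
```

Under that assumption, a single zero response in the test set is enough to make both relative columns infinite, while the absolute columns remain usable for the convergence test.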