Golem Class
The main class, Golem, estimates the robust merits of a set of parameters, as well as the
robust objective function, from a set of observations/samples used as the training set. This is done through an interface
similar to that of scikit-learn, whose two main methods are fit and predict.
First, we instantiate the Golem class:
golem = Golem(ntrees=1, goal='min', nproc=1)
Assuming we have a set of parameters X and their corresponding objective function evaluations y, we can fit the
tree-based model used by Golem:
golem.fit(X, y)
We can now use Golem to estimate the robust merits for any set of input parameters X_pred, given known/assumed
probability distributions representing the uncertainty of each input variable. For instance, if we have a 2-dimensional
input space, where the first variable has normally-distributed uncertainty, and the second one has uniform uncertainty:
golem.predict(X_pred, distributions=[Normal(0.1), Uniform(0.5)])
For a complete example on how to use the Golem class, see the Basic Usage example.
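Under the hood, the fitted tree model is piecewise constant over a set of tiles, and the robust merit is the expectation of that model under the input uncertainty. The following minimal 1-D sketch (not using the library itself; the tile boundaries and values are invented for illustration) shows the idea:

```python
import numpy as np
from scipy.stats import norm

# A 1-D piecewise-constant model, as produced by a regression tree:
# each tile is an interval [lo, hi) with a constant predicted value.
# (Invented values for illustration only.)
tiles = [(-np.inf, 0.0, 1.0),   # f(x) = 1.0 for x < 0
         (0.0, 1.0, 0.0),       # f(x) = 0.0 for 0 <= x < 1
         (1.0, np.inf, 1.0)]    # f(x) = 1.0 for x >= 1

def robust_merit(x, sigma):
    """E[f(x + delta)] for delta ~ Normal(0, sigma): each tile's value is
    weighted by the probability mass the input distribution puts on it."""
    return sum(val * (norm.cdf(hi, loc=x, scale=sigma)
                      - norm.cdf(lo, loc=x, scale=sigma))
               for lo, hi, val in tiles)

# At x = 0.5 the nominal merit is 0, but under uncertainty the expectation
# picks up probability mass from the neighbouring high-valued tiles.
print(robust_merit(0.5, 0.5))
```

As sigma shrinks, the robust merit converges to the nominal tree prediction; as it grows, neighbouring tiles contribute more.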
API Reference
-
class golem.Golem(forest_type='dt', ntrees=1, goal='min', nproc=None, random_state=None, verbose=True)[source]
- Parameters
forest_type (str) – Type of forest. Options are dt for decision (regression) trees, rf for random forest, et for extremely randomized trees, and gb for gradient boosting. Default is dt.
ntrees (int, str) – Number of trees to use. Use 1 for a single regression tree, or more for a forest. If 1 is selected, the choice of forest_type is discarded and a single regression tree is used.
nproc (int) – Number of processors to use. If not specified, all but one of the available processors will be used. Each processor processes a different tree; therefore there is no benefit in using nproc > ntrees.
goal (str) – The optimization goal: "min" for minimization and "max" for maximization. This is used only by the methods recommend and get_merits.
random_state (int, optional) – Fix the random seed.
verbose (bool, optional) – Whether to print information to screen. If False, only warnings and errors are displayed.
- Variables
y_robust (array) – Expectation of the merits under the specified uncertainties, \(E[f(x)]\).
y_robust_std (array) – Uncertainty in the expectation, estimated as standard deviation (\(\sigma\)) from the variance across trees, \(\sigma [E[f(X)]]\).
std_robust (array) – Standard deviation of the merits under the specified uncertainties, \(\sigma [f(x)]\).
std_robust_std (array) – Uncertainty in the standard deviation, estimated as standard deviation (\(\sigma\)) from the variance across trees, \(\sigma [\sigma [f(x)]]\).
forest (object) – sklearn object for the chosen ensemble regressor.
Methods
fit(X, y) – Fit the tree-based model to partition the input space.
predict(X, distributions[, return_std, …]) – Predict the robust merit for all samples in X given the specified uncertainty distributions.
get_merits([beta, normalize]) – Retrieve the values of the robust merits.
get_tiles([tree_number]) – Return information about the tessellation created by the decision tree.
set_param_space(param_space) – Define the parameter space (the domain) of the optimization.
recommend(X, y, distributions[, xi, …]) – WARNING: This is an experimental method, use at your own risk.
-
fit(X, y)[source] Fit the tree-based model to partition the input space.
- Parameters
X (array, list, pd.DataFrame) – Array, list, or DataFrame containing the location of the inputs. It follows the sklearn format used for features: each row \(i\) is a different sample in \(X_{ij}\), and each column \(j\) is a different feature. If the parameters contain categorical variables, please provide a DataFrame.
y (array, list, pd.DataFrame) – Observed responses for the inputs X.
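As a sketch of the expected layout (the column names and values here are invented), a dataset mixing continuous and categorical features would be prepared like this:

```python
import pandas as pd

# Hypothetical dataset: "temp" is continuous and "solvent" is categorical,
# so a DataFrame (rather than a bare array) should be passed to fit.
X = pd.DataFrame({"temp": [20.0, 40.0, 60.0],
                  "solvent": ["water", "ethanol", "water"]})
y = [0.3, 0.7, 0.5]  # one observed response per row of X

# Each row is a sample and each column a feature, as in sklearn.
print(X.shape)
```

A call of the form golem.fit(X, y) would accept exactly this layout.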
-
get_merits(beta=0, normalize=False)[source] Retrieve the values of the robust merits. If beta is zero, the result is equivalent to the attribute y_robust. If beta > 0, a multi-objective merit is constructed by considering both the expectation and the standard deviation of the output.
- Parameters
beta (int, optional) – Parameter that tunes the variance penalty, similar to an upper/lower confidence bound acquisition. Default is zero, i.e. no variance penalty. Higher values favour more reproducible results at the expense of total output.
normalize (bool, optional) – Whether to return values normalized between 0 and 1.
- Returns
merits – Values of the robust merits.
- Return type
array
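One plausible form of such a variance-penalised merit, assuming a minimization goal and a merit of the form \(E[f(x)] + \beta \, \sigma[f(x)]\) (the exact expression used by get_merits is not stated here), can be sketched with numpy on invented per-sample statistics:

```python
import numpy as np

# Invented per-sample statistics: expectation and standard deviation
# of the output under the input uncertainty.
y_robust = np.array([0.2, 0.1, 0.4])       # E[f(x)]
std_robust = np.array([0.05, 0.30, 0.02])  # sigma[f(x)]

def merits(beta=0):
    # Assumed penalised form for a minimization goal (hypothetical).
    return y_robust + beta * std_robust

print(np.argmin(merits(0)))  # beta = 0: equivalent to y_robust alone
print(np.argmin(merits(2)))  # beta > 0: the high-variance optimum is penalised
```

With beta = 0 the sample with the lowest expectation wins; with a large beta the more reproducible (low-variance) sample is preferred instead.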
-
get_tiles(tree_number=0)[source] Returns information about the tessellation created by the decision tree.
- Parameters
tree_number (int) – The index of the tree to parse. Default is 0, i.e. the first tree.
- Returns
tiles – List of tiles with information about the lower/upper boundary of the tile in all dimensions, and the predicted output by the decision tree model.
- Return type
list
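The tessellation is simply the set of axis-aligned boxes into which a regression tree carves the input space. As an illustration (using sklearn directly, with hypothetical key names; the exact structure returned by get_tiles may differ):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Fit a tiny 1-D regression tree: it partitions the line into two tiles,
# each with a constant prediction.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
tree = DecisionTreeRegressor(max_depth=1).fit(X, y)

# A depth-1 tree has a single split threshold, hence two tiles.
t = float(tree.tree_.threshold[0])
tiles = [{"low": -np.inf, "high": t, "y_pred": tree.predict([[t - 1.0]])[0]},
         {"low": t, "high": np.inf, "y_pred": tree.predict([[t + 1.0]])[0]}]
print(tiles)
```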
-
predict(X, distributions, return_std=False, return_unc=False)[source] Predict the robust merit for all samples in X given the specified uncertainty distributions.
- Parameters
X (np.array, pd.DataFrame) – Array or DataFrame containing the input locations for which to predict the robust merit. Provide the same input X you passed to the fit method if you want to reweight the merit of the samples.
distributions (array, dict) – Array or dictionary of distribution objects from the dists module.
return_std (bool) – Whether to return an estimate of the standard deviation of the output, \(\sqrt{Var[f(X)]} = \sigma[f(X)]\), in addition to the expectation, \(E[f(X)]\).
return_unc (bool) – Whether to return an estimate of the uncertainty for the output expectation (and for the standard deviation if return_std=True). The uncertainty is computed simply as the standard deviation across the estimates obtained from all trees in the forest; it thus reports the discrepancy between individual regressors. If return_std=True and return_unc=True, the method returns (\(E[f(X)]\), \(\sigma[f(X)]\), \(\sigma[E[f(X)]]\), \(\sigma[\sigma[f(X)]]\)). If return_std=False and return_unc=True, the method returns (\(E[f(X)]\), \(\sigma[E[f(X)]]\)).
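The uncertainty returned with return_unc=True is just a statistic across the forest. A numpy sketch with invented per-tree estimates:

```python
import numpy as np

# Hypothetical per-tree estimates of E[f(x)] for two query points,
# from a 4-tree forest (values are invented).
per_tree = np.array([[0.9, 0.2],
                     [1.1, 0.3],
                     [1.0, 0.1],
                     [1.0, 0.2]])

y_robust = per_tree.mean(axis=0)     # E[f(X)]: average over trees
y_robust_std = per_tree.std(axis=0)  # sigma[E[f(X)]]: tree disagreement
print(y_robust, y_robust_std)
```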
-
recommend(X, y, distributions, xi=0.1, pop_size=1000, ngen=10, cxpb=0.5, mutpb=0.3, verbose=False)[source] WARNING: This is an experimental method, use at your own risk. Recommend the next query location for the robust optimization.
- Parameters
X (array, list, pd.DataFrame) – Input parameters for all past observations.
y (array, list, pd.DataFrame) – Measurements/outputs corresponding to all parameters in X.
distributions (list) – List of golem distribution objects representing the uncertainty about the location of the input parameters.
xi (float) – Trade-off parameter of the Expected Improvement criterion. The larger it is, the more exploration is favoured.
pop_size (int) – Population size for the Genetic Algorithm based optimization of the acquisition function.
ngen (int) – Number of generations to use in the GA optimization of the acquisition function.
cxpb (float) – Probability of cross-over for the GA.
mutpb (float) – Probability of mutation for the GA.
verbose (bool) – Whether to print information about the GA progress.
- Returns
X_next – List with suggested parameters for the next location to query.
- Return type
list
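The role of xi can be illustrated with the standard Expected Improvement formula for minimization (a hedged sketch; golem's actual acquisition function and its GA-based optimization are not reproduced here):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best, xi=0.1):
    """Standard EI for minimization with an exploration offset xi."""
    imp = best - mu - xi          # improvement over the incumbent best
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

# Larger xi discounts small predicted gains, pushing the search to
# explore points with higher predictive uncertainty instead.
print(expected_improvement(mu=0.4, sigma=0.2, best=0.5, xi=0.0))
print(expected_improvement(mu=0.4, sigma=0.2, best=0.5, xi=0.1))
```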
-
set_param_space(param_space)[source] Define the parameter space (the domain) of the optimization. This is needed only to use the experimental recommend method.
- Parameters
param_space (list) – List of dictionaries containing information on each input variable. Each dictionary should contain the key "type", which can take the value "continuous", "discrete", or "categorical". Continuous and discrete variables should also contain the keys "low" and "high", which set the bounds of the domain. Categorical variables should contain the key "categories" with a list of the categories.
Examples
>>> golem = Golem()
>>> var1 = {"type": "continuous", "low": 1.5, "high": 5.5}
>>> var2 = {"type": "discrete", "low": 0, "high": 10}
>>> var3 = {"type": "categorical", "categories": ["red", "blue", "green"]}
>>> param_space = [var1, var2, var3]
>>> golem.set_param_space(param_space)