This repository contains the code implementations for Bayesian Optimization of Function Networks with Partial Evaluations (pKGFN) and its accelerated version (fast_pKGFN).
The pKGFN algorithm is detailed in the paper "Bayesian Optimization of Function Networks with Partial Evaluations," accepted at ICML 2024 [1]. The accelerated version is described in "Fast Bayesian Optimization of Function Networks with Partial Evaluations," accepted at AutoML 2025 and also available on arXiv [2].
The entire codebase is written in Python. Package requirements are as follows:
- python=3.9
- botorch==0.8.4
- numpy==1.23.5
- gpytorch==1.10
- scipy==1.10.1
- pandas
- matplotlib
- jupyter
The corresponding environment can be created with conda from the provided pKGFN_env.yml file as follows:
conda env create -f pKGFN_env.yml
conda activate pKGFN_env
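After activating the environment, it can be useful to confirm that the installed packages match the pinned versions listed above. The following is a minimal sketch, not part of the repository: the helper name `find_mismatches` is hypothetical, and the comparison is a strict string match on the version reported by `importlib.metadata`.

```python
from importlib import metadata

# Pinned versions copied from the requirements list above.
PINNED = {
    "botorch": "0.8.4",
    "numpy": "1.23.5",
    "gpytorch": "1.10",
    "scipy": "1.10.1",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for missing or differing packages.

    `found` is None when the package is not installed. The comparison is a
    strict string equality, which is an assumption of this sketch.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every pinned package is installed at the listed version.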
Bayesian Optimization (BO) [3,4] is an optimization framework used to solve problems of the form
where
The BO framework consists of two main components:
- Surrogate model: used to approximate the objective function $$f(x)$$, and
- Acquisition function: constructed from the fitted surrogate model and used to evaluate the benefit of performing an additional evaluation at any input $$x\in\mathcal{X}$$.
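To make the role of an acquisition function concrete, here is a minimal sketch of the classical Expected Improvement (EI) criterion, one of the baselines this repository supports. It assumes a maximization problem and a Gaussian posterior with the given mean and standard deviation at a candidate input; the function names are illustrative and are not taken from the codebase.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best):
    """Closed-form EI for maximization under a Gaussian posterior.

    mean, std: posterior mean and standard deviation at the candidate input.
    best: best objective value observed so far.
    """
    if std <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mean - best, 0.0)
    z = (mean - best) / std
    return (mean - best) * normal_cdf(z) + std * normal_pdf(z)


# Hypothetical posterior summaries (mean, std) at three candidate inputs.
candidates = {"a": (0.9, 0.1), "b": (0.5, 1.0), "c": (1.0, 0.0)}
best_so_far = 1.0
scores = {k: expected_improvement(m, s, best_so_far) for k, (m, s) in candidates.items()}
next_point = max(scores, key=scores.get)
```

In this sketch BO would evaluate `next_point` next; the candidate with high uncertainty can win even when its posterior mean is below the incumbent.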
BO begins with an initial set of
Bayesian Optimization of Function Networks (BOFN) [5] is an advanced BO framework designed to solve optimization problems whose objective functions can be expressed as a network of functions, in which the outputs of some nodes serve as inputs to others.
For example, Figure 1 shows a function network arranged as a directed acyclic graph (DAG), consisting of three function nodes
In [5], a surrogate model for function networks was proposed together with a novel acquisition function, the Expected Improvement of Function Networks (EIFN), which leverages these intermediate outputs. The resulting framework showed significant improvements in optimization performance.
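As an illustration of how a function network produces intermediate outputs, the sketch below evaluates a small DAG in topological order. The three-node structure and the node functions are hypothetical and are not the network from Figure 1; the names `PARENTS`, `NODE_FUNCTIONS`, and `evaluate_network` are introduced here for illustration only.

```python
from graphlib import TopologicalSorter

# Hypothetical network: each node maps to the list of its parent nodes.
PARENTS = {"f1": [], "f2": ["f1"], "f3": ["f1", "f2"]}

# Each node function receives the external input x and its parents' outputs.
NODE_FUNCTIONS = {
    "f1": lambda x, parent_outputs: x[0] ** 2,
    "f2": lambda x, parent_outputs: parent_outputs[0] + x[1],
    "f3": lambda x, parent_outputs: parent_outputs[0] * parent_outputs[1],
}


def evaluate_network(x):
    """Evaluate every node, parents first, and return all node outputs."""
    outputs = {}
    # TopologicalSorter takes a node -> predecessors mapping and yields
    # predecessors before the nodes that depend on them.
    for node in TopologicalSorter(PARENTS).static_order():
        parent_outputs = [outputs[p] for p in PARENTS[node]]
        outputs[node] = NODE_FUNCTIONS[node](x, parent_outputs)
    return outputs


outputs = evaluate_network((2.0, 3.0))
```

The dictionary returned here contains the intermediate outputs of `f1` and `f2` in addition to the final node `f3`, which is the kind of information a network-aware surrogate model can use.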
Recently, [1] extended the BOFN framework to function networks whose nodes can be queried independently and incur different positive evaluation costs. Using Figure 1 as an example, in this setting, one can decide to evaluate
However, in many applications, such as manufacturing problems, this upstream output requirement is not necessary. For example, if Figure 1 represents a workflow in a manufacturing process where each function represents an intermediate step, then to evaluate the final process
- partial_kgfn consists of the following folders and files:
  - acquisition -- acquisition function files
    - FN_realization.py -- an AcquisitionFunction class used to sample a network realization from a function network model
    - full_kgfn.py -- an MCAcquisitionFunction class used to compute the knowledge-gradient acquisition function for function networks with full evaluations
    - partial_kgfn.py -- an MCAcquisitionFunction class used to compute the knowledge-gradient acquisition function for function networks with partial evaluations
    - tsfn.py -- an AcquisitionFunction class used to compute the Thompson Sampling acquisition function
  - experiments -- runner files for the test problems
    - ackleyS_runner.py -- a main file to run the Ackley problem
    - ackmat_runner.py -- a main file to run the AckMat problem
    - freesolv3_runner.py -- a main file to run the FreeSolv3 problem
    - GPs1_runner.py -- a main file to run GP test problem #1
    - GPs2_runner.py -- a main file to run GP test problem #2
    - manufacturing_runner.py -- a main file to run the Manu problem
  - model -- Gaussian process model for function networks
    - dag.py -- a DAG object
    - decoupled_gp_network.py -- a model class for function networks
  - optim -- code supporting acquisition function optimization
    - discrete_kgfn_optim.py -- a file containing the optimization function used to optimize the partial_kgfn acquisition function
  - test_functions -- test problems
    - ack_mat.py -- a SyntheticTestFunction class for the AckMat problem
    - ackley_sin.py -- a SyntheticTestFunction class for the Ackley problem
    - freesolv3.py -- a SyntheticTestFunction class for the FreeSolv3 problem
    - GPs1.py -- a SyntheticTestFunction class for GP test problem #1
    - GPs2.py -- a SyntheticTestFunction class for GP test problem #2
    - manufacter_gp.py -- a SyntheticTestFunction class for the manufacturing problem
    - pharmaceutical.py -- a SyntheticTestFunction class for the pharma problem
    - freesolv_NN_rep3dim.csv -- a data file used to construct the FreeSolv problem
  - utils -- utility functions
    - construct_obs_set.py -- code for constructing the observation set according to the DAG of the problem
    - EIFN_optimize_acqf.py -- code for optimizing the EIFN acquisition function
    - gen_batch_x_fantasies.py -- code for generating X fantasies for discrete acquisition functions, including fullKGFN and partialKGFN
    - posterior_mean.py -- code for computing the posterior mean of the network's final node output
    - run_one_trial.py -- code for running one trial of Bayesian Optimization
- The results folder is created to store saved models and optimization results.
- The Visualization folder contains three files used to generate BO progress curves and to report runtimes:
  - utils_decoupled_kgfn.py -- code that loads all the results and computes the mean and standard error of the BO progress curves.
  - read_results_and_plot_graphs.ipynb -- a notebook that calls a load function from utils_decoupled_kgfn.py and plots the progress curves.
  - read_wallclock.ipynb -- a notebook that loads results and reports the average runtimes for all algorithms.
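For readers who want to see what summarizing a progress curve involves, the following is a minimal sketch and not the code in utils_decoupled_kgfn.py. It assumes each trial is stored as a list of best-so-far values of equal length, and it uses the sample standard deviation divided by the square root of the number of trials as the standard error. The function name `progress_curve_stats` is hypothetical.

```python
import math
from statistics import mean, stdev


def progress_curve_stats(trials):
    """Return per-iteration means and standard errors across trials.

    trials: list of equal-length lists, one list of best-so-far values
    per independent BO trial (an assumed storage format).
    """
    n_trials = len(trials)
    means, std_errors = [], []
    # zip(*trials) groups the values of all trials at the same iteration.
    for values in zip(*trials):
        means.append(mean(values))
        if n_trials > 1:
            std_errors.append(stdev(values) / math.sqrt(n_trials))
        else:
            # A single trial carries no information about variability.
            std_errors.append(0.0)
    return means, std_errors


# Two hypothetical trials with two iterations each.
curve_means, curve_sems = progress_curve_stats([[1.0, 2.0], [3.0, 2.0]])
```

The resulting lists can be passed to matplotlib, for example with `fill_between`, to draw the mean curve with an error band.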
run_experiment.ipynb is a notebook used to run a problem.
- First cell: Call a problem runner.
- Second cell: Call the loaded problem class and run a BO trial. The following attributes must be specified:
  - trial -- trial number (int)
  - algo -- algorithm name (str): options are "EI", "KG", "Random", "EIFN", "KGFN", "TSFN", "pKGFN", "fast_pKGFN" (our method)
  - cost -- evaluation cost configuration (str): this should be in the format "node1cost_node2cost_node3cost_..._nodeKcost"
  - budget -- BO evaluation budget (int)
  - noisy -- a boolean variable indicating whether the observations are noisy
  - impose_assump -- a boolean variable indicating whether the upstream-downstream restriction is imposed. If True, a downstream node can be evaluated only after its parent nodes' outputs have been obtained. To use fast_pKGFN, impose_assump must be set to False.
Note: fast_pKGFN can be used on all problems except pharma. The pharma network structure is incompatible with fast_pKGFN, which reduces to standard EIFN on that problem.
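The cost string format and the upstream-downstream restriction described above can be illustrated with a short sketch. This is not the repository's parsing code: the helpers `parse_cost_config` and `evaluable_nodes` are hypothetical, costs are assumed to be parseable as floats, and the set of observed nodes is a simplification of the actual observation bookkeeping.

```python
def parse_cost_config(cost):
    """Split a cost string such as "1_5_10" into per-node costs.

    Assumes one positive number per node, separated by underscores.
    """
    costs = [float(part) for part in cost.split("_")]
    if any(c <= 0 for c in costs):
        raise ValueError("evaluation costs must be positive")
    return costs


def evaluable_nodes(parents, observed, impose_assump):
    """List the nodes that may be evaluated next.

    parents: dict mapping each node to the list of its parent nodes.
    observed: set of nodes whose outputs are already available.
    impose_assump: if True, a node needs all parent outputs beforehand.
    """
    if not impose_assump:
        # Without the restriction, any node can be queried directly.
        return sorted(parents)
    return sorted(
        node for node, node_parents in parents.items()
        if all(p in observed for p in node_parents)
    )


# Hypothetical three-node chain with costs 1, 5 and 10.
chain = {"f1": [], "f2": ["f1"], "f3": ["f2"]}
costs = parse_cost_config("1_5_10")
restricted = evaluable_nodes(chain, observed=set(), impose_assump=True)
unrestricted = evaluable_nodes(chain, observed=set(), impose_assump=False)
```

With the restriction imposed and nothing observed yet, only the root node is available; without it, every node can be queried, which is the setting fast_pKGFN requires.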
[1] Buathong, Poompol, et al. "Bayesian Optimization of Function Networks with Partial Evaluations." International Conference on Machine Learning. PMLR, 2024.
[2] Buathong, Poompol, and Peter I. Frazier. "Fast Bayesian Optimization of Function Networks with Partial Evaluations." arXiv preprint arXiv:2506.11456 (2025).
[3] Jones, Donald R., Matthias Schonlau, and William J. Welch. "Efficient global optimization of expensive black-box functions." Journal of Global Optimization 13 (1998): 455-492.
[4] Frazier, Peter I. "Bayesian optimization." Recent Advances in Optimization and Modeling of Contemporary Problems. INFORMS, 2018. 255-278.
[5] Astudillo, Raul, and Peter I. Frazier. "Bayesian optimization of function networks." Advances in Neural Information Processing Systems 34 (2021): 14463-14475.

