Transfer learning #1307

jduerholt · 2022-07-05T12:40:42Z

jduerholt
Jul 5, 2022

Hi botorch developers,

I was wondering what is the best way to realize the transfer learning scenario, depicted in the example below, in botorch (a more detailed discussion can be found in https://arxiv.org/abs/1711.05099). In this case, the final target is to optimize property A in a BO scenario. In addition, predicitve models for latent properties B and C are available and properties B and C are helpful features in infering target property A.

The workflow would be to first predict B and C, append it to the original features and feed them into a GP to predict property A. What would be the best way to do this in botorch?

My idea would be to implement a new InputTransform called PredictiveInputTransform which gets in its __init__ method an instantiated torch based model which is then executed in the transform method and appends the predicted property to the initial X matrix. Using the ChainedInputTransform several of this PredictiveInputTransforms can be chained after each other to realize the depicted scenario.

What do you think?

Best,

Johannes

Balandat · 2022-07-06T14:47:55Z

Balandat
Jul 6, 2022
Collaborator

I think this should work (@saitcakmak knows more about the ins and outs of dimension-altering transforms though). Another alternative way one could do this might be via a custom Kernel that internally computes the predictions B and C and then computes a kernel on that augmented feature set. Though the transform approach seems less invasive.

How expensive are the predictions of B and C? If these aren't super cheap then one thing to think about is whether you want to spend the cost of repeatedly predicting the same things during the model training. An alternative could be to just bulk-predict B and C for all the training data, then fit a standard model on that full augmented feature set, and then use a simple adapter for evaluating that model that augments the features outside the model right before calling posterior.

One question I have is whether there is uncertainty associated with the predictions B and C, or whether these are deterministic. In the former setting one may want to think a bit more about how to properly propagate that uncertainty in the predictions as they now enter as features (this is related to robust BO, something also in @saitcakmak's territory).

0 replies

jduerholt · 2022-07-06T16:09:19Z

jduerholt
Jul 6, 2022
Author

Hi Max,

thanks for your quick response.

Concerning your idea of bulk prediction, this should be definitely possible for pure predictions. But when performing an actual optimization the proposed adapter also has to be available in the acquisition function, or? Because, in the optimization which is just performed on the original features, B and C are not known and computed at every optimization step of optimize_acqf.

Concerning uncertainties: for our use case both is possible: deterministic models or models with uncertainty like GPs. For the original implementation, we wanted to neglect the uncertainty of the latent space models, since we did not know how to incorporate them. But if you have an idea, we are very open to it.

Best,

Johannes

0 replies

Balandat · 2022-07-06T16:35:00Z

Balandat
Jul 6, 2022
Collaborator

Concerning your idea of bulk prediction, this should be definitely possible for pure predictions. But when performing an actual optimization the proposed adapter also has to be available in the acquisition function, or? Because, in the optimization which is just performed on the original features, B and C are not known and computed at every optimization step of optimize_acqf.

Correct, you'd still have to do this for acquisition function optimization. So if that's the bulk of the compute then there isn't really a big benefit to doing the bulk predictions.

Concerning uncertainties: for our use case both is possible: deterministic models or models with uncertainty like GPs. For the original implementation, we wanted to neglect the uncertainty of the latent space models, since we did not know how to incorporate them. But if you have an idea, we are very open to it.

We've done similar things in the context of robust optimization. A reasonably straightforward way would be to do this via MC sampling - basically rather than using the mean prediction draw a number of samples from the B/C model posteriors, use those as inputs, and then marginalize across these samples after computing the prediction/acquisition function on the sample level. See this tutorial for a related example where the perturbations come from some known noise level of the inputs.

Unless the uncertainty of these predictions is quite large, it's probably best to start with the simpler setup though and then go from there if it turns out that propagating this uncertainty is important.

0 replies

saitcakmak · 2022-07-12T22:47:25Z

saitcakmak
Jul 12, 2022
Collaborator

Hi @jduerholt. Sorry about the late response here! I think extending the AppendFeatures transform to use predictions of Model B/C instead of a pre-specified perturbation set should achieve what you're after. I did something similar for InputPerturbation in #1088 to support the heteroscedastic perturbations. In your case, the callable would return Model B/C predictions.

For handling uncertainty in Model B/C, you can probably leverage the objective argument (along with MC samples) just like we do for risk measures. My first attempt here would be using the Expectation risk measure for marginalization (should work great if you extend AppendFeatures for the first part), though whether expectation is the best way to marginalize in your case is an open question.

0 replies

jduerholt · 2022-07-13T05:54:10Z

jduerholt
Jul 13, 2022
Author

Hi @saitcakmak, thanks for your response. I will give a try in the next month and tell you how it was going ;)

0 replies

jduerholt · 2022-07-13T06:16:07Z

jduerholt
Jul 13, 2022
Author

Hi @saitcakmak, I just looked at the AppendFeatures and have another question: If I just want to pass in the additional models and do not want to consider uncetainties, is it necessary to expand also the q-batch dimension? For my understanding, the expansion should only happen on the feature dimension, or do I am overlooking something?

0 replies

saitcakmak · 2022-07-13T16:35:15Z

saitcakmak
Jul 13, 2022
Collaborator

If I just want to pass in the additional models and do not want to consider uncertainties, is it necessary to expand also the q-batch dimension?

The q-batch dimension is only expanded if n_f > 1. If you use a single feature, i.e. n_f = 1, then it shouldn't change the size of q-batch dimension. I think this should work well in the case of deterministic Model B/C (also allow you to try out the case with uncertainties easily later on).

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Transfer learning #1307

{{title}}

Replies: 7 comments

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

Select a reply

Transfer learning #1307

jduerholt Jul 5, 2022

Replies: 7 comments

Balandat Jul 6, 2022 Collaborator

jduerholt Jul 6, 2022 Author

Balandat Jul 6, 2022 Collaborator

saitcakmak Jul 12, 2022 Collaborator

jduerholt Jul 13, 2022 Author

jduerholt Jul 13, 2022 Author

saitcakmak Jul 13, 2022 Collaborator

jduerholt
Jul 5, 2022

Balandat
Jul 6, 2022
Collaborator

jduerholt
Jul 6, 2022
Author

Balandat
Jul 6, 2022
Collaborator

saitcakmak
Jul 12, 2022
Collaborator

jduerholt
Jul 13, 2022
Author

jduerholt
Jul 13, 2022
Author

saitcakmak
Jul 13, 2022
Collaborator