SingleTaskGP versus MultiTaskGP #1771

madhavkrishnan94 · 2023-03-23T07:31:39Z

Issue description

Hello!

We are currently using BoTorch to train a multi-output GP model on our data. Say the GP model is fitting a function f on our dataset, Y = f(X), where Y = [Y1, Y2, Y3, Y4] is a 4-dimensional output vector and X = [X1, X2, X3] is a 3-dimensional input vector.

We also note that the output data is correlated.

In this regard, we have two questions:

In the documentation, it says:

However, as single-task models, SingleTaskGP, FixedNoiseGP, and HeteroskedasticSingleTaskGP should be used only when the outputs are independent and all use the same training data. If outputs are independent and outputs have different training data, use the ModelListGP. When modeling correlations between outputs, use a multi-task model like MultiTaskGP.

We would like to know what "different training data" here actually means.

At first glance, this is a single-task, multi-output problem, so (going by the documentation) SingleTaskGP seems the natural choice. However, SingleTaskGP does not capture the correlations between outputs. We would therefore like to check whether the following approach is feasible:

We would like to use MultiTaskGP with each output as a separate task (hence 4 tasks). This would let us leverage MultiTaskGP's ability to capture correlations across tasks, and thereby the correlations between outputs.

We would like to hear the thoughts of the botorch community on these questions. Thanks!

Balandat · 2023-03-25T17:01:27Z
Collaborator

We would like to know what "different training data" here actually means.

By that we mean observations that do not follow what is sometimes called a "block design", in which every output is observed at the same locations, i.e., whenever you evaluate X you get all of the Ys, not just some of them (which seems to be the case in your setting?).
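A minimal sketch of that distinction, assuming each output's inputs are stored as a separate (n_i, d) NumPy array. `is_block_design` is a hypothetical helper for illustration, not a BoTorch function; it returns True only when every output was observed at exactly the same set of input locations:

```python
import numpy as np


def is_block_design(xs_per_output, atol=0.0):
    """Return True if every output was observed at the same set of inputs.

    xs_per_output: list of (n_i, d) arrays, one per output (hypothetical layout).
    Row order is ignored; atol=0.0 demands exactly equal input values.
    """
    def canonical(x):
        x = np.asarray(x, dtype=float)
        # Sort rows lexicographically (first column is the primary key).
        return x[np.lexsort(x.T[::-1])]

    reference = canonical(xs_per_output[0])
    for x in xs_per_output[1:]:
        candidate = canonical(x)
        if candidate.shape != reference.shape:
            return False
        if not np.allclose(candidate, reference, rtol=0.0, atol=atol):
            return False
    return True


rng = np.random.default_rng(0)
X = rng.random((5, 3))
print(is_block_design([X, X[::-1], X, X]))  # True: same points, different order
print(is_block_design([X, X[:3], X, X]))    # False: output 2 seen at only 3 points
```

If this returns False (some outputs are missing at some X), the data has "different training data" in the documentation's sense.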

We would like to use MultiTaskGP by predicting each output in a different task (hence 4 tasks). By doing this, we would be able to leverage the ability of MultiTaskGP to capture correlation across tasks, thereby capturing correlation between outputs.

Yes, that makes sense! Are your observations noisy? Or are they deterministic? If they are deterministic AND your data has the block design described above, there isn't actually any benefit from modeling this with a multi-task model (this is a property called autokrigeability).
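A small numerical illustration of autokrigeability, assuming an intrinsic coregionalization setup in which the covariance between outputs s and t at inputs x, x' is B[s, t] * k(x, x'), with one shared RBF kernel k and fixed, made-up hyperparameters. With noise-free (jitter-only) block-design data, the joint multi-task posterior mean matches the independent per-output GP means, whatever the task covariance B is:

```python
import numpy as np


def rbf(a, b, lengthscale=0.3):
    # Squared-exponential kernel between 1-D input arrays a (n,) and b (m,).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


rng = np.random.default_rng(0)
n, m, T = 8, 5, 4                     # train points, test points, outputs
x_train = np.linspace(0.0, 1.0, n)
x_test = rng.random(m)
Y = rng.normal(size=(T, n))           # row t holds output t (block design)

W = rng.normal(size=(T, 2))
B = W @ W.T + 0.1 * np.eye(T)         # positive-definite task covariance
K = rbf(x_train, x_train) + 1e-4 * np.eye(n)  # tiny jitter only, no noise
Ks = rbf(x_train, x_test)

# Multi-task posterior mean: all T * n observations solved jointly.
mt_mean = np.kron(B, Ks).T @ np.linalg.solve(np.kron(B, K), Y.reshape(-1))
mt_mean = mt_mean.reshape(T, m)

# Independent posterior means: one GP per output, B ignored entirely.
ind_mean = np.stack([Ks.T @ np.linalg.solve(K, Y[t]) for t in range(T)])

print(np.max(np.abs(mt_mean - ind_mean)))  # round-off level: the B factors cancel
```

The cancellation is a property of the Kronecker structure: kron(B, Ks).T @ inv(kron(B, K)) equals kron(I, Ks.T @ inv(K)), so each output's mean only uses its own observations.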

madhavkrishnan94 · 2023-03-26T16:35:14Z
Author

Thanks a lot for your reply.

In my case, the observations are deterministic. And they have the block design, i.e., for each X, all Y s are obtained.

So far, I have been using SingleTaskGP to infer the intrinsic noise.

Balandat · 2023-03-26T16:41:29Z
Collaborator

So far, I have been using SingleTaskGP to infer the intrinsic noise.

Hmm, if the observations are deterministic, why infer a noise level (that you know is zero)? You can instead use a FixedNoiseGP and pass zero variance (under the hood we'll convert this to some small jitter value for numerical stability).
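To illustrate why fixing the noise at (nearly) zero matters for deterministic data, here is a plain NumPy sketch of a zero-mean GP posterior mean. The 1e-6 jitter and the kernel settings are illustrative assumptions, not the values BoTorch uses internally. With near-zero noise the mean passes through the observations; a larger noise variance, such as a model might infer, smooths part of them away:

```python
import numpy as np


def gp_posterior_mean(x_train, y_train, x_test, noise_var, lengthscale=0.2):
    # Zero-mean GP with a squared-exponential kernel on 1-D inputs.
    def k(a, b):
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

    K = k(x_train, x_train) + noise_var * np.eye(len(x_train))
    L = np.linalg.cholesky(K)  # raises LinAlgError if K is not positive definite
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return k(x_test, x_train) @ alpha


x = np.linspace(0.0, 1.0, 6)
y = np.sin(6.0 * x)  # deterministic observations

# "Zero" noise padded with a small jitter (illustrative value) for a stable Cholesky.
mean_fixed = gp_posterior_mean(x, y, x, noise_var=1e-6)
# A large noise variance treats part of the signal as noise.
mean_noisy = gp_posterior_mean(x, y, x, noise_var=0.1)

print(np.max(np.abs(mean_fixed - y)))  # close to zero: the mean interpolates
print(np.max(np.abs(mean_noisy - y)))  # noticeably larger
```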

Again, if you have deterministic observations AND your observations are of a block design, then there is no need to use a multi-task model (that is computationally a lot more expensive).
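The rules of thumb in this thread can be summarized as a small hypothetical helper that only returns a class name; it does not build a model. The cost function is a rough sketch assuming a dense O(N^3) exact solve: stacking T outputs on n points into one T*n system costs about T^2 times as much as T separate n-point solves. More recent BoTorch releases fold the FixedNoiseGP functionality into SingleTaskGP via a train_Yvar argument, so check the docs for the version you use:

```python
def suggest_model(outputs_correlated: bool, block_design: bool, noisy: bool) -> str:
    """Rules of thumb from this thread, as a hypothetical helper.

    Returns a BoTorch class name as a string; nothing is constructed.
    """
    if block_design and not noisy:
        # Deterministic data: fix the noise near zero. Correlations add no
        # benefit here (autokrigeability), so a multi-task model is not needed.
        return "FixedNoiseGP"
    if outputs_correlated:
        # Noisy data, or outputs observed at different inputs: correlations
        # between outputs can help, which a multi-task model can exploit.
        return "MultiTaskGP"
    # Independent outputs: one batched model if all outputs share the same
    # training inputs, otherwise a list of separate models.
    return "SingleTaskGP" if block_design else "ModelListGP"


def naive_joint_cost_ratio(n_points: int, n_outputs: int) -> float:
    """Cost of one dense solve over all n_outputs * n_points observations
    relative to n_outputs separate n_points-sized solves, assuming O(N^3)
    scaling for a dense Cholesky. Works out to n_outputs ** 2."""
    joint = (n_outputs * n_points) ** 3
    separate = n_outputs * n_points ** 3
    return joint / separate


print(suggest_model(outputs_correlated=True, block_design=True, noisy=False))  # FixedNoiseGP
print(suggest_model(outputs_correlated=True, block_design=True, noisy=True))   # MultiTaskGP
print(naive_joint_cost_ratio(n_points=100, n_outputs=4))                        # 16.0
```

For the setup in the original question (4 correlated outputs, block design), the answer therefore hinges on whether the observations are noisy.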

madhavkrishnan94 · 2023-03-28T07:44:48Z
Author

Thanks a lot for your inputs.

But how do you suggest we capture the correlation between our output quantities in this case, without using a MultiTaskGP?

Balandat · 2023-03-28T14:09:22Z
Collaborator

You won't get any benefit in this case from capturing the correlation due to the autokrigeability.

madhavkrishnan94 · 2023-03-31T06:33:11Z
Author

Thank you.

I also want to clarify one more thing about the observations being deterministic, as I believe I misunderstood the term. We actually have noise in the Y data: it contains 3 repeats at the same X. So I guess SingleTaskGP is more suitable, since it can infer the intrinsic noise.

esantorella · 2023-03-31T14:18:51Z
Collaborator

Yes, if you have multiple results on the same X with different values of Y, you have noise. Assuming you don't know the level of noise, it would be appropriate to use SingleTaskGP or HeteroskedasticSingleTaskGP rather than FixedNoiseGP so that the model can infer the noise.
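With replicates there is also a model-free sanity check for the noise level the model infers. A minimal sketch, assuming homoskedastic Gaussian noise and the repeats of one output arranged as an (n_locations, 3) array (the simulated data below is made up). The pooled within-location sample variance is an unbiased estimate of the noise variance and does not depend on the unknown f(X):

```python
import numpy as np


def pooled_replicate_variance(y_reps):
    """Pooled within-location sample variance from replicated observations.

    y_reps: array of shape (n_locations, n_repeats), e.g. (n, 3) for three
    repeats at each X. Every location has the same number of repeats here,
    so a plain mean of the per-location variances is the pooled estimate.
    """
    y_reps = np.asarray(y_reps, dtype=float)
    # ddof=1 gives an unbiased variance estimate at each location.
    return float(np.mean(np.var(y_reps, axis=1, ddof=1)))


rng = np.random.default_rng(0)
f = rng.normal(size=(200, 1))               # unknown latent values f(X)
y = f + 0.1 * rng.normal(size=(200, 3))     # 3 noisy repeats per location
print(pooled_replicate_variance(y))         # should be near 0.1**2 = 0.01
```

If the fitted model's noise estimate (e.g., the likelihood's noise parameter) is far from this number, that is worth investigating; if the per-location variances differ a lot, that hints at heteroskedastic noise.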

Balandat · 2023-03-31T14:29:33Z
Collaborator

And if there is indeed noise, then it may be beneficial to use a multi-task GP model (how large that benefit is will depend on the level of the noise / signal-to-noise ratio).
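A sketch of why noise changes the picture, assuming an intrinsic coregionalization setup (covariance B[s, t] * k(x, x') with one shared RBF kernel) and fixed, made-up hyperparameters. Once the observations are noisy, the multi-task posterior mean no longer equals the independent one, because the other outputs at the same X now carry information about each output's latent value. With a diagonal task covariance the two coincide again. Whether the difference is a real improvement depends on the signal-to-noise ratio and on how well B is estimated:

```python
import numpy as np


def rbf(a, b, lengthscale=0.3):
    # Squared-exponential kernel between 1-D input arrays a (n,) and b (m,).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def posterior_means(B, K, Ks, Y, noise_var):
    """Posterior means of a multi-task (ICM) GP and of independent GPs.

    B: (T, T) task covariance; K: (n, n) train kernel; Ks: (n, m) train/test
    kernel; Y: (T, n) block-design observations; noise_var: shared noise.
    """
    T, n = Y.shape
    m = Ks.shape[1]
    C = np.kron(B, K) + noise_var * np.eye(T * n)  # joint noisy covariance
    mt = (np.kron(B, Ks).T @ np.linalg.solve(C, Y.reshape(-1))).reshape(T, m)
    # Independent GP for output t: kernel B[t, t] * k with the same noise.
    ind = np.stack([
        B[t, t] * Ks.T @ np.linalg.solve(B[t, t] * K + noise_var * np.eye(n), Y[t])
        for t in range(T)
    ])
    return mt, ind


rng = np.random.default_rng(0)
T, n, m = 4, 8, 5
x_train, x_test = np.linspace(0.0, 1.0, n), rng.random(m)
W = rng.normal(size=(T, 2))
B_corr = W @ W.T + 0.1 * np.eye(T)     # correlated outputs (made up)
B_diag = np.diag(np.diag(B_corr))      # same variances, no correlation
K, Ks = rbf(x_train, x_train), rbf(x_train, x_test)
Y = rng.normal(size=(T, n))

mt, ind = posterior_means(B_corr, K, Ks, Y, noise_var=0.1)
print(np.max(np.abs(mt - ind)))      # nonzero: other outputs now inform the mean
mt0, ind0 = posterior_means(B_diag, K, Ks, Y, noise_var=0.1)
print(np.max(np.abs(mt0 - ind0)))    # round-off level
```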

madhavkrishnan94 · 2023-06-12T07:01:15Z
Author

Hello @Balandat ,

In continuation of the same problem, I was wondering if MultiTaskGP supports training with different dimensionality across tasks.

In the earlier query, I wanted to model each of the 4 outputs as a separate task. But currently, I am exploring an option where I group 3 outputs into 1 task and have the other output as another task.

I have defined the task indices using
i1, i2, = torch.zeros(total_points, 1), torch.ones(total_points, 1)

I then define the data and pass it to the model as:

X=train_x_normalized
Y=train_obj

a1=Y[:,0]
a2=torch.stack([Y[:,1],Y[:,2],Y[:,3]],-1)

train_X_MT = torch.cat([torch.cat([X, i1], -1), torch.cat([X, i2], -1)])
train_Y_MT = torch.cat([a1,a2]).unsqueeze(-1)

model_MT = MultiTaskGP(train_X_MT, train_Y_MT, task_feature=-1)

This gives me an error message which says: RuntimeError: Tensors must have same number of dimensions: got 1 and 2.

Any leads would be appreciated!

1 reply

Balandat Jun 19, 2023
Collaborator

Presumably this throws before even getting to the model when you do train_Y_MT = torch.cat([a1,a2]).unsqueeze(-1)? Here a1 and a2 are of incompatible shapes.

If you want to label those three outputs as a single task, then you'd have to stack things in the following way:

train_X_MT = torch.cat([torch.cat([X, i1], -1), *[torch.cat([X, i2], -1) for _ in range(3)]])
train_Y_MT = torch.cat(Y.unbind(-1), dim=-1).unsqueeze(-1)
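A NumPy stand-in for the torch.cat / Y.unbind(-1) stacking above, written as a hypothetical helper `to_multitask_format` so that the task label of each output column is explicit. With labels [0, 1, 1, 1] it reproduces the two-task layout; with [0, 1, 2, 3] each output gets its own task. Either way there is one row per (x, output) pair, so both arrays have k * n rows:

```python
import numpy as np


def to_multitask_format(X, Y, task_of_output):
    """Stack block-design data into the "long" format (hypothetical helper).

    X: (n, d) inputs, Y: (n, k) outputs, task_of_output[j]: task label for column j.
    Returns train_X of shape (k * n, d + 1), with the task label as the last
    column (matching task_feature=-1), and train_Y of shape (k * n, 1).
    """
    n, k = Y.shape
    blocks = [np.hstack([X, np.full((n, 1), float(task_of_output[j]))]) for j in range(k)]
    train_X = np.vstack(blocks)
    train_Y = np.concatenate([Y[:, j] for j in range(k)])[:, None]
    return train_X, train_Y


rng = np.random.default_rng(0)
X, Y = rng.random((5, 3)), rng.random((5, 4))
tx2, ty2 = to_multitask_format(X, Y, [0, 1, 1, 1])  # two tasks, as above
tx4, ty4 = to_multitask_format(X, Y, [0, 1, 2, 3])  # one task per output
print(tx2.shape, ty2.shape)                          # (20, 4) (20, 1)
print(np.unique(tx2[:, -1]), np.unique(tx4[:, -1]))  # [0. 1.] [0. 1. 2. 3.]
```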

madhavkrishnan94 · 2023-06-28T06:51:09Z
Author

Thank you!

So, I see that you have now defined train_X_MT of size [4 * total_points, 4] and train_Y_MT of size [4 * total_points, 1].
When I train the model

model_MT = MultiTaskGP(train_X_MT, train_Y_MT, task_feature=-1)
mll_MT = ExactMarginalLogLikelihood(model_MT.likelihood, model_MT)
fit_gpytorch_model(mll_MT)

and try to see the predictions using

with torch.no_grad():
  pred_MT = model_MT.posterior(train_x_normalized).mean

print("The GPR predictions: \n", pred_MT)

I only see two outputs, one per task. But ideally, I would like pred_MT to show predictions for each of the four output dimensions we started with.

Does multi-task prevent us from doing so?

3 replies

Balandat Jun 30, 2023
Collaborator

Looks like you're not passing in more than two task indices then? The model should give you predictions for all training tasks (if you don't specify output_tasks). Can you share a full repro of your code incl. data (possibly random)?

madhavkrishnan94 Jul 2, 2023
Author

I'm passing in two task indices only, through i1 and i2 in

train_X_MT = torch.cat([torch.cat([X, i1], -1), *[torch.cat([X, i2], -1) for _ in range(3)]])
train_Y_MT = torch.cat(Y.unbind(-1), dim=-1).unsqueeze(-1)

But pred_MT that predicts the output of the GP model has only two dimensions.

And yes, I can share my code (and a generalized version of my data). Shall I just copy-paste here or do I paste a link to the code?

Balandat Jul 2, 2023
Collaborator

But pred_MT that predicts the output of the GP model has only two dimensions.

I am not sure I see the problem if you're only passing in two tasks - what am I missing?

And yes, I can share my code (and a generalized version of my data). Shall I just copy-paste here or do I paste a link to the code?

Either way works.

madhavkrishnan94 · 2023-07-13T09:40:07Z
Author

The data typically looks like this:

I've shown a small but representative sample of the dataset and marked the columns containing the inputs (Xi) and outputs (Yi). My problem requires minimizing Y1, Y2, and Y3 and maximizing Y4. Since BoTorch maximizes by default, I negate Y1, Y2, and Y3. I also apply min-max normalization to the inputs to obtain a new input tensor (train_x_normalized in the code below). The four outputs (Y1 to Y4) are stored in a single tensor. Since Y1, Y2, and Y3 are measured together and Y4 is measured separately, I wanted to have two tasks in the GP model. I am interested in predictions for all four outputs.
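The two preprocessing steps described here can be sketched in NumPy as a hypothetical helper (not the author's actual code). It assumes the outputs to minimize are columns 0 to 2 and, by default, uses the observed column-wise min/max of X as bounds; for optimization, the known design bounds are usually the better choice so that new candidates map consistently. BoTorch also offers a Normalize input transform (botorch.models.transforms.input) that can apply this scaling inside the model:

```python
import numpy as np


def prepare_for_maximization(X, Y, minimize_cols, lower=None, upper=None):
    """Min-max normalize X to [0, 1] and negate the outputs to be minimized.

    Hypothetical helper. lower/upper default to the observed column-wise
    min/max of X.
    """
    X = np.asarray(X, dtype=float)
    Y = np.array(Y, dtype=float)  # copy, so the caller's data is untouched
    lower = X.min(axis=0) if lower is None else np.asarray(lower, dtype=float)
    upper = X.max(axis=0) if upper is None else np.asarray(upper, dtype=float)
    X_norm = (X - lower) / (upper - lower)
    # BoTorch maximizes, so minimizing Y_j becomes maximizing -Y_j.
    Y[:, minimize_cols] = -Y[:, minimize_cols]
    return X_norm, Y


X = np.array([[10.0, 0.5, 300.0], [20.0, 1.5, 350.0], [15.0, 1.0, 400.0]])
Y = np.array([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [0.0, 1.0, 2.0, 3.0]])
X_norm, Y_max = prepare_for_maximization(X, Y, minimize_cols=[0, 1, 2])
print(X_norm)     # each column now spans [0, 1]
print(Y_max[0])   # Y1-Y3 negated, Y4 unchanged
```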

Here's the relevant portion of the code:

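import torch
from botorch.fit import fit_gpytorch_mll  # replaces fit_gpytorch_model in newer BoTorch
from botorch.models import MultiTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood

X = train_x_normalized  # (n, 3) min-max-normalized inputs
Y = train_obj           # (n, 4) outputs, Y1-Y3 already negated

# Task labels as written: Y[:, 0] -> task 0, Y[:, 1:] -> task 1. If Y1-Y3
# should share a task, the column order or the labels need to match that.
i1 = torch.zeros(X.shape[0], 1, dtype=X.dtype)
i2 = torch.ones(X.shape[0], 1, dtype=X.dtype)

# One row per (x, output) pair, with the task index as the last column.
train_X_MT = torch.cat([torch.cat([X, i1], -1), *[torch.cat([X, i2], -1) for _ in range(3)]])
train_Y_MT = torch.cat(Y.unbind(-1), dim=-1).unsqueeze(-1)

model_MT = MultiTaskGP(train_X_MT, train_Y_MT, task_feature=-1)
mll_MT = ExactMarginalLogLikelihood(model_MT.likelihood, model_MT)
fit_gpytorch_mll(mll_MT)

with torch.no_grad():
    pred_MT = model_MT.posterior(train_x_normalized).mean

print("The GPR predictions: \n", pred_MT)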

pred_MT contains just two columns, which I guess is one per task. But ideally, I would like to see predictions for each of Y1, Y2, Y3, and Y4 when I print pred_MT, as I am interested in all four of them.

I hope this is clear enough. Please let me know if there's some other information/code that I may need to plug for you to be able to take a closer look.

Thanks!

2 replies

Balandat Jul 15, 2023
Collaborator

Hmm, I think I see what you want to do now. Unfortunately, this won't work with the standard multi-task models. For starters, the outputscale is shared across all observations of a task, but your Y1, Y2, Y3 are on vastly different scales. Also, you can see from glancing at the table that these outputs don't really seem to be highly correlated: e.g., we have Y1[0] < Y1[1] < Y1[2] but Y2[0] < Y2[1] > Y2[2].
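To see concretely why grouping three outputs under one task label cannot give three separate predictions: after stacking, the grouped outputs at a given x share an identical input row (x plus task index). A GP's prior depends only on its inputs, so it treats them as repeated observations of a single latent value, and any spread between them can only be explained as noise. A small sketch with made-up data, using the same column-to-task assignment as the code in this thread:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
n = 5
X = rng.random((n, 3))
# Made-up outputs on very different scales, standing in for the four Ys.
Y = rng.random((n, 4)) * np.array([1.0, 10.0, 100.0, 1000.0])

task_of_output = [0, 1, 1, 1]  # last three columns share one task label
train_X = np.vstack([np.hstack([X, np.full((n, 1), float(t))]) for t in task_of_output])
train_Y = np.concatenate([Y[:, j] for j in range(Y.shape[1])])

# Collect the targets observed at each distinct model input (x plus task label).
targets_at = defaultdict(list)
for row, y in zip(train_X, train_Y):
    targets_at[tuple(row)].append(y)

print(len(targets_at))                                # 10 distinct inputs for 20 rows
print(sorted({len(v) for v in targets_at.values()}))  # [1, 3]
# Within each task-1 group the three targets differ by orders of magnitude,
# yet the GP gives them one shared latent value, so the spread becomes noise.
spread = [np.ptp(v) for v in targets_at.values() if len(v) == 3]
```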

My recommendation here would be to (i) model the Ys independently, and (ii) try a multi-task model with 4 tasks, one for each Y. Then you can compare which of the models works better.
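One cheap way to compare the two options is leave-one-out (LOO) cross-validation, which for a Gaussian model with fixed hyperparameters has a standard closed form: the LOO residual of observation i is [C^-1 y]_i / [C^-1]_ii, where C is the kernel-plus-noise covariance. The sketch below assumes known, made-up hyperparameters and synthetic data drawn from a multi-task prior; in practice you would fit each model (e.g., independent SingleTaskGPs vs. a 4-task MultiTaskGP) and compare held-out errors. This LOO leaves out one (x, output) observation at a time, so the other outputs at the same x stay in the training set, which favors the multi-task model; holding out all outputs at an x is a stricter test:

```python
import numpy as np


def loo_residuals(C, y):
    """Closed-form leave-one-out residuals y_i - E[y_i | y_-i] for y ~ N(0, C).

    C: (N, N) covariance of the noisy observations (kernel plus noise).
    r_i = [C^-1 y]_i / [C^-1]_ii, so no refitting is needed.
    """
    C_inv = np.linalg.inv(C)
    return (C_inv @ y) / np.diag(C_inv)


def loo_rmse(C, y):
    return float(np.sqrt(np.mean(loo_residuals(C, y) ** 2)))


rng = np.random.default_rng(0)
T, n, noise_var = 4, 10, 0.05
x = np.linspace(0.0, 1.0, n)
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.3) ** 2)
w = rng.normal(size=(T, 1))
B = w @ w.T + 0.05 * np.eye(T)  # strongly correlated outputs (assumed)

# Synthetic block-design data drawn from the multi-task model itself.
C_mt = np.kron(B, K) + noise_var * np.eye(T * n)
y = rng.multivariate_normal(np.zeros(T * n), C_mt)

# Same per-output variances, but no cross-output covariance.
C_ind = np.kron(np.diag(np.diag(B)), K) + noise_var * np.eye(T * n)

print("multi-task LOO RMSE: ", loo_rmse(C_mt, y))
print("independent LOO RMSE:", loo_rmse(C_ind, y))
```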

madhavkrishnan94 Jul 18, 2023
Author

Thanks for your reply.

Also, you can see from glancing at the table that these outputs don't really seem to be highly correlated

Actually, I computed the Pearson correlation coefficients, and it turns out that Y1 and Y3 are strongly correlated with each other (~0.99), although the other pairwise coefficients are not that high. Also, I'm not sure you can compare values of the same Y across rows in the way you illustrated, because each Y value results from three input (X) values.
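For reference, the pairwise Pearson coefficients between outputs can be computed in one call; averaging the 3 repeats first reduces the influence of noise. The data below is synthetic, with Y3 constructed to track Y1 purely for illustration. Keep in mind that this measures how the outputs co-vary across the sampled inputs, which is related to, but not necessarily the same as, the task covariance a MultiTaskGP would learn:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 20 input locations x 3 repeats x 4 outputs.
Y_reps = rng.normal(size=(20, 3, 4))
# Make Y3 (column 2) track Y1 (column 0) closely, for illustration only.
Y_reps[:, :, 2] = 2.0 * Y_reps[:, :, 0] + 0.1 * rng.normal(size=(20, 3))

Y_mean = Y_reps.mean(axis=1)            # average the repeats at each X
R = np.corrcoef(Y_mean, rowvar=False)   # (4, 4) Pearson correlation matrix
print(np.round(R, 2))
```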

My recommendation here would be to (i) model the Ys independently, and (ii) try a multi-task model with 4 tasks, one for each Y. Then you can compare which of the models works better.

Yes, these are some approaches I have thought about and partially tried in the past. I think I will take a fresh look at the 4-task model.

Replies: 11 comments 6 replies
