[Bug] Implicit assumption of double precision can cause failures when single precision is used #2596

AVHopp · 2024-10-28T13:30:38Z

🐛 Bug

Within test_functions/base.py, the bounds are hard-coded to double precision. This makes it impossible to use single-precision which is necessary for e.g. MPS support on Mac.

To reproduce

import torch
from botorch.test_functions import Rastrigin

torch.set_default_device("mps")
torch.set_default_dtype(torch.float32)

test = Rastrigin()

This yields the following error message (paths shortened for better readability):

Traceback (most recent call last):
  File "[...]/test.py", line 7, in <module>
    test = Rastrigin()
  File "[...]/lib/python3.10/site-packages/botorch/test_functions/synthetic.py", line 664, in __init__
    super().__init__(noise_std=noise_std, negate=negate, bounds=bounds)
  File "[...]/lib/python3.10/site-packages/botorch/test_functions/synthetic.py", line 83, in __init__
    super().__init__(noise_std=noise_std, negate=negate)
  File "[...]/lib/python3.10/site-packages/botorch/test_functions/base.py", line 51, in __init__
    "bounds", torch.tensor(self._bounds, dtype=torch.double).transpose(-1, -2)
  File "[...]/lib/python3.10/site-packages/torch/utils/_device.py", line 79, in __torch_function__
    return func(*args, **kwargs)
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

Expected Behavior

The initialization of the BaseTestProblem class should not enforce double precision in the buffer for the bounds but probably use torch.get_default_dtype()

System information

Please complete the following information:

BoTorch Version 0.12.0
PyTorch Version 2.4.1
MacOS

Additional context

These seem to be the problematic lines:

botorch/botorch/test_functions/base.py

Lines 49 to 51 in 9d37e90

    
           self.register_buffer( 
        
               "bounds", torch.tensor(self._bounds, dtype=torch.double).transpose(-1, -2) 
        
           )

If the error is in fact just this one line of code, I'd be more than happy to create a mini Pull Request if it makes sense :)

The text was updated successfully, but these errors were encountered:

AVHopp · 2024-10-28T13:32:03Z

NOTE: I have verified locally that changing the line of code that I mentioned under "Additional context" solves the problem.

Balandat · 2024-10-28T13:41:54Z

The initialization of the BaseTestProblem class should not enforce double precision in the buffer for the bounds but probably use torch.get_default_dtype()

Yep, that makes sense. If self._bounds are python floats then this will automatically happen, but I guess some of these might be defined as int, so using torch.get_default_dtype() explicitly here seems good so we don't end up with int tensors.

If the error is in fact just this one line of code, I'd be more than happy to create a mini Pull Request if it makes sense :)

That would be very welcome!

AVHopp · 2024-10-28T14:04:51Z

Great, then I'll draft the PR :)

Balandat · 2024-10-28T14:13:01Z

Great. Btw, have you been using BoTorch more generally with MPS on Mac? I have tried this in the past but a lot of operations that we (and gpytorch) use under the hood weren't supported by MPS at the time, curious if that has changed more recently.

AVHopp · 2024-10-28T14:22:32Z

I am currently trying to bring GPU support to a package that I co-develop (happy to share the link if shameless self-advertisement is fine 😆 ) that uses BoTorch in the backend. This issue here was the first that I encountered, and I haven't investigated in more detail until now.
I can however share whatever I find here or at any other place in case that I find more issues/errors if you are interested 😄

AVHopp · 2024-10-28T15:38:47Z

@Balandat I found another occurrence of the same issue:

import numpy as np
import torch
from botorch.optim.utils.numpy_utils import set_tensors_from_ndarray_1d
from torch import tensor

torch.set_default_device("mps")
tensors = (tensor([-2.2532]),)

array = np.asarray([-2.25321913], dtype=np.float64)

set_tensors_from_ndarray_1d(tensors, array)

This fails since the to_tensor call in the function does not take the default dtype into account:

botorch/botorch/optim/utils/numpy_utils.py

Lines 113 to 128 in 9d37e90

    
           def set_tensors_from_ndarray_1d( 
        
               tensors: Iterator[Tensor] | dict[str, Tensor], 
        
               array: npt.NDArray, 
        
               as_tensor: Callable[[npt.NDArray], Tensor] = torch.as_tensor, 
        
           ) -> None: 
        
               r"""Sets the values of one more tensors based off of a vector of assignments.""" 
        
               named_tensors_iter = ( 
        
                   iter(tensors.items()) if isinstance(tensors, dict) else enumerate(tensors) 
        
               ) 
        
               with torch.no_grad(): 
        
                   index = 0 
        
                   for name, tnsr in named_tensors_iter: 
        
                       try: 
        
                           size = tnsr.numel() 
        
                           vals = array[index : index + size] if tnsr.ndim else array[index] 
        
                           tnsr.copy_(as_tensor(vals).to(tnsr).view(tnsr.shape).to(tnsr))

The error happens in the last line, and replacing this line either by torch.as_tensor(...) or the default value to a whacky lambda expression as_tensor: Callable[[ndarray], Tensor] = lambda x: torch.as_tensor(x, dtype=torch.get_default_dtype()) seems to solve the issue (at least on the BoTorich version I pointed out here).

It seems like there are several places where there is an implicit assumption on using double precision/not the default one if something is enforced. Should I just create small PRs whenever I find these or is there a better way to go forward?

Balandat · 2024-10-28T15:56:53Z

It seems like there are several places where there is an implicit assumption on using double precision/not the default one if something is enforced.

The main reason this implicit assumption is present in various places in the codebase is that we've found that when working with GPs we often end up with rather ill-conditioned matrices, and performing things like matrix decompositions or linear solves on those with FP32 precision can be quite hairy and result in poor accuracy and often also outright failures.

Should I just create small PRs whenever I find these or is there a better way to go forward?

That would be great and I think it makes sense to fix those as they come up.

Balandat · 2024-10-28T16:04:24Z

Actually, regarding the issue about set_tensors_from_ndarray_1d: we did change this in the past, IIRC in response to some flaky tests: https://github.com/pytorch/botorch/pull/1508/files - That doesn't mean the current setup is the right one though. cc @esantorella

The error happens in the last line, and replacing this line either by torch.as_tensor(...) or the default value to a whacky lambda expression as_tensor: Callable[[ndarray], Tensor] = lambda x: torch.as_tensor(x, dtype=torch.get_default_dtype()) seems to solve the issue (at least on the BoTorich version I pointed out here).

Looking at the code I don't actually ever see the as_tensor arg of NdarrayOptimizationClosure be used; seems this is always the default. Given that I would probably recommend just getting rid of that additional complexity and simply use torch.as_tensor() in the body of the function, and use the device and dtype of the input tensors as the arguments to it.

esantorella · 2024-10-28T20:17:28Z

Regarding the first issue, with the bounds in test functions, why use torch.get_default_dtype? Could we, say, pass a dtype argument to the function that defaults to torch.float64? The proposal to use torch.get_default_dtype will change the behavior and could potentially break some existing code, since torch.get_default_dtype is torch.float32 when not otherwise set.

esantorella · 2024-10-28T20:32:15Z

Regarding set_tensors_from_ndarray_1d, that cast to double-precision was added along with this test. Basically, when optimization failed, NaNs would be returned and generated some sort of dtype-related error. So I'd make sure that test passes when making changes to that function.

I don't actually ever see the as_tensor arg of NdarrayOptimizationClosure be used; seems this is always the default. Given that I would probably recommend just getting rid of that additional complexity

This makes sense to me.

AVHopp · 2024-10-29T07:48:44Z

@esantorella Regarding the first point: I'll try to design a solution for this in the currently existing PR #2597 and I think it makes sense to continue the discussion there.
For the second one, I'll also see if I can come up with a solution, either in the same PR or in another one (which I guess would be more favorable since this is a completely different part of the code?)

Balandat · 2024-10-29T13:22:11Z

For the second one, I'll also see if I can come up with a solution, either in the same PR or in another one (which I guess would be more favorable since this is a completely different part of the code?)

That would be great and yes this would best happen in a separate PR.

…2597) Summary: ## Motivation This PR replaces the hard-coded double precision that was used in the initialization of `test_functions/base.py` to use `torch.get_default_dtype()` instead. See #2596 for more details. ### Have you read the [Contributing Guidelines on pull requests](https://github.com/pytorch/botorch/blob/main/CONTRIBUTING.md#pull-requests)? Yes, I have read it. Pull Request resolved: #2597 Test Plan: I ran code formatting via `ufmt` and checked the code via `pytest -ra`. All tests related to the changes here were adjusted in the second commit of the branch. Locally, two tests failed for me, but it seems to me that these are not related to the fix implemented here. If it turns out they are, I'd be more than happy to further adjust. ## Related PRs None, but #2596 is related. Reviewed By: saitcakmak Differential Revision: D65066231 Pulled By: Balandat fbshipit-source-id: 4beac1fc9a1e5094fd4806958ac2441a12506eb7

Summary: ## Motivation This PR removes the `as_tensor` argument of the `set_tensors_from_ndarray_1d` function in `numpy_utils.py`. The reason for this change is that the previous implementation - messed up the float precision: That is, even when trying to use `float32` precision, this always transformed floats into `float64` precision - did only use the default function anyway. This change was discussed previously with Balandat and esantorella in #2596 . ### Have you read the [Contributing Guidelines on pull requests](https://github.com/pytorch/botorch/blob/main/CONTRIBUTING.md#pull-requests)? Yes, I have read it. Pull Request resolved: #2615 Test Plan: I checked the format using `ufmt format .` and verified that all tests are still running using `pytest -ra`. ## Related PRs This PR is related to #2597 and #2596, fixing a similar issue (`float32` being broken by some internal calculations). Reviewed By: Balandat Differential Revision: D65537491 Pulled By: saitcakmak fbshipit-source-id: f178a761691ce092a1b06d020158ea367fc6c99e

AVHopp added the bug Something isn't working label Oct 28, 2024

AVHopp mentioned this issue Oct 28, 2024

Fix hard-coded double precision in test_functions to default dtype #2597

Closed

esantorella changed the title ~~[Bug] Bounds in test_functions/base.py are hard-coded to double precision~~ [Bug] Implicit assumption of double precision can cause failures when single precision is used Nov 4, 2024

AVHopp mentioned this issue Nov 6, 2024

Remove as_tensor argument of set_tensors_from_ndarray_1d #2615

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Implicit assumption of double precision can cause failures when single precision is used #2596

[Bug] Implicit assumption of double precision can cause failures when single precision is used #2596

AVHopp commented Oct 28, 2024

AVHopp commented Oct 28, 2024

Balandat commented Oct 28, 2024

AVHopp commented Oct 28, 2024

Balandat commented Oct 28, 2024

AVHopp commented Oct 28, 2024

AVHopp commented Oct 28, 2024

Balandat commented Oct 28, 2024 •

edited

Loading

Balandat commented Oct 28, 2024

esantorella commented Oct 28, 2024

esantorella commented Oct 28, 2024

AVHopp commented Oct 29, 2024

Balandat commented Oct 29, 2024

[Bug] Implicit assumption of double precision can cause failures when single precision is used #2596

[Bug] Implicit assumption of double precision can cause failures when single precision is used #2596

Comments

AVHopp commented Oct 28, 2024

🐛 Bug

To reproduce

Expected Behavior

System information

Additional context

AVHopp commented Oct 28, 2024

Balandat commented Oct 28, 2024

AVHopp commented Oct 28, 2024

Balandat commented Oct 28, 2024

AVHopp commented Oct 28, 2024

AVHopp commented Oct 28, 2024

Balandat commented Oct 28, 2024 • edited Loading

Balandat commented Oct 28, 2024

esantorella commented Oct 28, 2024

esantorella commented Oct 28, 2024

AVHopp commented Oct 29, 2024

Balandat commented Oct 29, 2024

Balandat commented Oct 28, 2024 •

edited

Loading