Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internal sorting of parameter keys leads to unexpected behaviour when registering an array #530

Open
till-m opened this issue Oct 22, 2024 · 6 comments

Comments

@till-m
Copy link
Member

till-m commented Oct 22, 2024

Describe the bug
Internally, the target space sorts the keys of the pbounds dict before creating the target space bounds. This is presumably because python dicts were unordered before python 3.7 and ensured that there was a canonical order to the parameters. However, after python 3.7 dict keys are now ordered, making this sorting unnecessary. Moreover, the fact that the internal representation is changed can lead to unexpected behaviour when registering a numpy array (see example below). This seems very likely to happen when using a DataFrame representation of data.

To Reproduce

Ex:

from sklearn import datasets
import pandas as pd
from bayes_opt import BayesianOptimization

df = pd.DataFrame(datasets.load_iris().data, columns=datasets.load_iris().feature_names)
df.columns = [col.replace(' (cm)', '') for col in df.columns]
df.columns = [col.replace(' ', '_') for col in df.columns]

pbounds = {
    'sepal_length': (df['sepal_length'].min(), df['sepal_length'].max()),
    'sepal_width': (df['sepal_width'].min(), df['sepal_width'].max()),
    'petal_length': (df['petal_length'].min(), df['petal_length'].max()),
    'petal_width': (df['petal_width'].min(), df['petal_width'].max())
}
optimizer = BayesianOptimization(
    f=None,
    pbounds=pbounds,
    random_state=1,
)
optimizer.register(df.iloc[0].values, 0)
print(dict(optimizer.space.array_to_params(optimizer.space.params[0])))
print(dict(df.iloc[0]))

Output:

{'petal_length': np.float64(5.1), 'petal_width': np.float64(3.5), 'sepal_length': np.float64(1.4), 'sepal_width': np.float64(0.2)}
{'sepal_length': np.float64(5.1), 'sepal_width': np.float64(3.5), 'petal_length': np.float64(1.4), 'petal_width': np.float64(0.2)}

Expected behavior
The optimizer should not change the order of the keys provided by the user. Registering a numpy array instead of a dict should assume the same order as given by the pbounds.

@Nkehoe-QUB
Copy link

Nkehoe-QUB commented Oct 25, 2024

Hello, I believe this bug is affecting my usage of the package. I have the following instance of the Optimisation class and after the first iteration is seems to swap the second and third bounds, so the parameter 'thick' is meant to have bound of 0.1e-6 to 15.0e-6 and 'g' is meant to have 1.0 to 2.0 but they get swapped.

# Simulation Parameters min and max values
param_Min = [1.0, 0.1 * micron, 1.0]
param_Max = [20.0, 15.0 * micron, 2.0]

# Bayesian Optimization Class, random_state is the seed for the random number generator
acq = acquisition.UpperConfidenceBound(kappa=5.0)

optimizer = BayesianOptimization(
    f=None,
    acquisition_function=acq,
    pbounds={'den' : (param_Min[0], param_Max[0]),
             'thick' : (param_Min[1], param_Max[1]),
             'g' : (param_Min[2], param_Max[2])
             },
    verbose=2,
    random_state=1,
)

@till-m
Copy link
Member Author

till-m commented Oct 25, 2024

Hey @Nkehoe-QUB, if you give me a full example that reproduces the error I can have a look and see if this is really what happens.

@Nkehoe-QUB
Copy link

Hi, I'm sorry it turns out it's an issue how I'm handing the output of optimiser.suggest() and passing it to my simulation. I was doing:

next_points = optimizer.suggest()
    ID = str(Run_PIC(i, j, [values for values in next_points.values()]))

But I didn't realise that the keys of the suggest() output would be in alphabetical order so therefore it was passing it as den g and then thick.

@perezed00
Copy link
Contributor

I'm glad this has been noted. I agree that this is a wrinkle worth smoothing out. If nothing else, better compatibility with pandas seems worthwhile. I'll take a look and report back.

@bwheelz36
Copy link
Collaborator

bwheelz36 commented Nov 16, 2024

I do remember running into this and being tripped up by it. (example below 😝)

I agree it would be better if this didn't happen, but note that this would be a breaking change for many users of this library, who will be assuming that this library is sorting the parameters. so fixing this should be deferred until the next major release is planned in my opinion.

image

@till-m
Copy link
Member Author

till-m commented Nov 17, 2024

This should only affect people that register arrays, which hopefully is a small part of the userbase. One thing we could do is publish a minor version that raises a warning when someone registers an array, telling them that this operation will work differently in a future release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants