Internal sorting of parameter keys leads to unexpected behaviour when registering an array #530

till-m · 2024-10-22T08:48:17Z

Describe the bug
Internally, the target space sorts the keys of the pbounds dict before creating the target space bounds. This is presumably because python dicts were unordered before python 3.7 and ensured that there was a canonical order to the parameters. However, after python 3.7 dict keys are now ordered, making this sorting unnecessary. Moreover, the fact that the internal representation is changed can lead to unexpected behaviour when registering a numpy array (see example below). This seems very likely to happen when using a DataFrame representation of data.

To Reproduce

Ex:

from sklearn import datasets
import pandas as pd
from bayes_opt import BayesianOptimization

df = pd.DataFrame(datasets.load_iris().data, columns=datasets.load_iris().feature_names)
df.columns = [col.replace(' (cm)', '') for col in df.columns]
df.columns = [col.replace(' ', '_') for col in df.columns]

pbounds = {
    'sepal_length': (df['sepal_length'].min(), df['sepal_length'].max()),
    'sepal_width': (df['sepal_width'].min(), df['sepal_width'].max()),
    'petal_length': (df['petal_length'].min(), df['petal_length'].max()),
    'petal_width': (df['petal_width'].min(), df['petal_width'].max())
}
optimizer = BayesianOptimization(
    f=None,
    pbounds=pbounds,
    random_state=1,
)
optimizer.register(df.iloc[0].values, 0)
print(dict(optimizer.space.array_to_params(optimizer.space.params[0])))
print(dict(df.iloc[0]))

Output:

{'petal_length': np.float64(5.1), 'petal_width': np.float64(3.5), 'sepal_length': np.float64(1.4), 'sepal_width': np.float64(0.2)}
{'sepal_length': np.float64(5.1), 'sepal_width': np.float64(3.5), 'petal_length': np.float64(1.4), 'petal_width': np.float64(0.2)}

Expected behavior
The optimizer should not change the order of the keys provided by the user. Registering a numpy array instead of a dict should assume the same order as given by the pbounds.

The text was updated successfully, but these errors were encountered:

Nkehoe-QUB · 2024-10-25T10:26:00Z

Hello, I believe this bug is affecting my usage of the package. I have the following instance of the Optimisation class and after the first iteration is seems to swap the second and third bounds, so the parameter 'thick' is meant to have bound of 0.1e-6 to 15.0e-6 and 'g' is meant to have 1.0 to 2.0 but they get swapped.

# Simulation Parameters min and max values
param_Min = [1.0, 0.1 * micron, 1.0]
param_Max = [20.0, 15.0 * micron, 2.0]

# Bayesian Optimization Class, random_state is the seed for the random number generator
acq = acquisition.UpperConfidenceBound(kappa=5.0)

optimizer = BayesianOptimization(
    f=None,
    acquisition_function=acq,
    pbounds={'den' : (param_Min[0], param_Max[0]),
             'thick' : (param_Min[1], param_Max[1]),
             'g' : (param_Min[2], param_Max[2])
             },
    verbose=2,
    random_state=1,
)

till-m · 2024-10-25T11:20:32Z

Hey @Nkehoe-QUB, if you give me a full example that reproduces the error I can have a look and see if this is really what happens.

Nkehoe-QUB · 2024-10-25T12:08:27Z

Hi, I'm sorry it turns out it's an issue how I'm handing the output of optimiser.suggest() and passing it to my simulation. I was doing:

next_points = optimizer.suggest()
    ID = str(Run_PIC(i, j, [values for values in next_points.values()]))

But I didn't realise that the keys of the suggest() output would be in alphabetical order so therefore it was passing it as den g and then thick.

perezed00 · 2024-11-14T20:59:41Z

I'm glad this has been noted. I agree that this is a wrinkle worth smoothing out. If nothing else, better compatibility with pandas seems worthwhile. I'll take a look and report back.

bwheelz36 · 2024-11-16T08:18:14Z

I do remember running into this and being tripped up by it. (example below 😝)

I agree it would be better if this didn't happen, but note that this would be a breaking change for many users of this library, who will be assuming that this library is sorting the parameters. so fixing this should be deferred until the next major release is planned in my opinion.

till-m · 2024-11-17T14:11:56Z

This should only affect people that register arrays, which hopefully is a small part of the userbase. One thing we could do is publish a minor version that raises a warning when someone registers an array, telling them that this operation will work differently in a future release.

till-m added bug enhancement labels Oct 22, 2024

This was referenced Nov 24, 2024

deprecation warning for sorted parameters #532

Open

Typed Optimization #531

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Internal sorting of parameter keys leads to unexpected behaviour when registering an array #530

Internal sorting of parameter keys leads to unexpected behaviour when registering an array #530

till-m commented Oct 22, 2024

Nkehoe-QUB commented Oct 25, 2024 •

edited

Loading

till-m commented Oct 25, 2024

Nkehoe-QUB commented Oct 25, 2024

perezed00 commented Nov 14, 2024

bwheelz36 commented Nov 16, 2024 •

edited

Loading

till-m commented Nov 17, 2024 •

edited

Loading

Internal sorting of parameter keys leads to unexpected behaviour when registering an array #530

Internal sorting of parameter keys leads to unexpected behaviour when registering an array #530

Comments

till-m commented Oct 22, 2024

Nkehoe-QUB commented Oct 25, 2024 • edited Loading

till-m commented Oct 25, 2024

Nkehoe-QUB commented Oct 25, 2024

perezed00 commented Nov 14, 2024

bwheelz36 commented Nov 16, 2024 • edited Loading

till-m commented Nov 17, 2024 • edited Loading

Nkehoe-QUB commented Oct 25, 2024 •

edited

Loading

bwheelz36 commented Nov 16, 2024 •

edited

Loading

till-m commented Nov 17, 2024 •

edited

Loading